There are three big changes in the Tiger geocoder coming in 2.1, which of course you can enjoy now but I'll mostly focus on the ability
to install it as an extension since that's the one I think most people will appreciate.
- Upgraded to download and load Tiger 2012 data.
- Less greedy loading. In 2.0 it would just download all the tiger data for a state even for tables it did not use. In the 2.1 version we rewrote
to just download the relevant tables defined in the table
tiger.loader_tables
. So it is in essence a fairly table driven loader
that uses SQL to build the commandline script.
- Last but not least, ability to install it like a PostgreSQL extension if you are using PostgreSQL 9.1+. This is what we'll focus on for the rest of this article.
Brief whirlwind history of PostGIS Tiger Geocoder
PostGIS Tiger geocoder is a packaged PostGIS extra that utilizes US Census Tiger data for geocoding.
As such, it probably only has utility for people dealing with US data. Aside from being US-centric, its other key differentiator from most other PostGIS geocoders is that it's a pure
PostgreSQL play. Everything runs in the database, callable in queries, and only dependencies are PostGIS and fuzzystrmatch
(PostgreSQL fuzzy string match extension that comes with most PostgreSQL packages). This makes it easy to wrap anything around it and a drop in tool for most apps. We for example created a .NET web service with 5 lines of VB.NET code
that does nothing but connect to the database, passes the address to the geocode function, and gets back the result. For most other uses we just use it as an in-db batch geocoder.
It has a long history, first created by Refractions, then stagnant for a while when
US Census made major structural changes switching to ESRI shape format and restructuring things. Then it got upgraded by Stephen
Frost et. al to utilize Tiger 2008 new structure and ESRI shapefile. Then we picked it up from there and have been doing much of enhancements and maintenance since mostly
funded through client work.
- First facelift we did was changing the structure to use inheritance so each state's data could be loaded as needed and attached/detached from the hierarchy
for easier maintenance (like a tire replacement).
- Then redoing much of the query logic, adding indexes to improve speed and improve address normalizer.
- Adding how to use and descriptions and usage of functions in the official PostGIS docs.
- Adding reverse geocoder function
- Lastly including as part of it a loader generation and maintenance SQL functions that generates a platform specific loading commandline script to download the data from US Census, stage it,
and load it into final tables. For most Linux/Unix installs the unzip, wget, and shp2pgsql dependencies are already present and for Windows,
equivalent tools are an easy 10 minute download/install away. Maintenance functions scan the table heirarchy and add indexes where needed.
There has been a surprising amount of interest in it, perhaps because Google and others have started charging a non-trivial amount of money
for bulk geocoding services. When you've got to geocode thousands of US addresses to geocode, it's an appealing option. One plus means more
people stress testing it reporting bugs, which means (we get free testing for our consulting work). Getting good QA testers that catch issues before your clients do is not cheap.
This encourages us to refine and on the bad side means more people reporting bugs and expecting us to do something about them.
It's the classic curse of open source, you build something people want to use; give it away; and people expect you to make it better.
Getting the Extension
The extension will work fine for PostGIS 2.0 and above, so if you have PostGIS 2.0, you can still enjoy it. If you are on Windows and running PostgreSQL 9.1 (x64) or
PostgreSQL 9.2 (x32, x64), you can download the latest extension package from Winnie Experimental Builds. In theory
you can also use the buildbot extension files if you are on Unix since the extension is written in pure plpgsql, but I fear the windows line breaks may cause problems. I'd be interested to know if people on Unix/Linux
are successful. If not I'll force the buildbot line breaks so all camps can use the pre-built extension files.
If you are compiling PostGIS 2.1 yourself, as long as you compile with raster support and PostgreSQL 9.1+ then the tiger geocoder extension will build as well
and should install as well or you can manually copy the (sql/*
and postgis_tiger_geocoder.control
files from extensions/postgis_tiger_geocoder
folder.
The extension files for windows zip file are located in share/extension
folder and to make available as an extension just copy the postgis_tiger_geocoder*
files
into your PostgreSQL share/extension
install folder.
Install postgis_tiger_geocoder Extension
I've got this all documented in the Official PostGIS manual under
Installing PostGIS Tiger Geocoder Extension.
Some of this will be a rehash of the documentation.
- If you are using pgAdmin and you copied the extension files in your PostgreSQL install, you should see the extension listed along side the other growing postgis family of extensions.
.
- You need to install the postgis_tiger_geocoder extension in a database that already has
postgis
and fuzzystrmatch
extensions installed. If you simply
try to install the extension with these e.g. using CREATE EXTENSION postgis_tiger_geocoder
or selecting from the PgAdmin extension menu, you'll get a warning something like this:
.
- So in short if you are installing on a completely new database, you'll need:
CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
- The extension install automatically adds
tiger
schema to the database search_path. So to use you need to first disconnect and then reconnect
so you get the new search path settings.
- The normalizer should more or less work out of the gate except for some cases like when State names and street names are same. Then you'll need data.
To test run this.
SELECT 'foss4g2013' As event, n.*
FROM normalize_address('30 South 7th Street, Minneapolis, MN 55402') As n;
-- which outputs --
event | address | predirabbrev | streetname | streettypeabbrev | .. | .. | location | stateabbrev | zip | .
-----------+---------+--------------+------------+------------------+----+----+-------------+-------------+-------+---
foss4g2013 | 30 | S | 7th | St | .. | ..| Minneapolis | MN | 55402 | .
- To Load data you need to run Refer to Loading Tiger data
Basic steps
- generate national script to load basic state, zip, county boundaries -- Loader_Generate_Nation_Script.html. Then run the script on the server.
- Then for states you want to be able to geocode -- you build command line script using function: Loader_Generate_Script. Then run the generated script on commandline of server
- Since states are detachable and reconnectable, you can do all the loading on a separate server and then just restore the state tables you want by backing up and restoring on another server. The data by default is housed in a schema called
tiger_data
.
- From then you should be able to use the Geocode, Geocode_Intersection and Reverse_Geocode functions
We must say that the Geocode function is a poster child for the new lateral and particularly LEFT LATERAL functionality coming in PostgreSQL 9.3 we discussed. More on that later.
Final note: Future plans
We don't currently have in place the Update logic, so you can't yet do interim upgrades:
ALTER EXTENSION postgis_tiger_geocoder UPDATE TO "2.1.0SVNnext";
like you can with the postgis
and postgis_topology
extensions, but we plan to have these in place before 2.1 release.
I'm also hoping to experiment with Stephen Woodbridge's PAGC normalizer once I can get the darn thing to compile under my mingw environment and work with 9.1 and 9.2 PostgreSQL. If all goes well with that package that as a drop in replacement
for the built-in postgis_tiger_geocoder normalizer. The normalizer of course requires compiling, so I will only be able to provide binaries for windows and possibly Debian once we have our Debian build bot in shape. All that may not come in 2.1 though. More on that later as we make progress with that.