There are three big changes in the Tiger geocoder coming in PostGIS 2.1. You can of course enjoy all of them now, but I'll mostly focus on the ability
to install it as an extension, since that's the one I think most people will appreciate.
- Upgraded to download and load Tiger 2012 data.
- Less greedy loading. In 2.0 the loader downloaded all the Tiger data for a state, even for tables it did not use. In 2.1 we rewrote it to download only the relevant tables, as defined in the table tiger.loader_tables. So it is in essence a table-driven loader that uses SQL to build the command-line script.
- Last but not least, ability to install it like a PostgreSQL extension if you are using PostgreSQL 9.1+. This is what we'll focus on for the rest of this article.
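As a rough sketch of how the last two items fit together on PostgreSQL 9.1+ with PostGIS 2.1: the statements below install the geocoder as an extension, list the tables the loader is configured to load, and generate a load script. The state code 'MA' and the 'sh' platform value are just example arguments, and the columns selected from tiger.loader_tables are the ones we'd expect in a 2.1 install, so check them against your own copy.

```sql
-- Install the geocoder and its dependencies as extensions (PostgreSQL 9.1+).
CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;

-- See which tables the loader will download, in processing order.
-- Column names are assumed from a 2.1 install; verify with \d tiger.loader_tables.
SELECT lookup_name, process_order
FROM tiger.loader_tables
WHERE load = true
ORDER BY process_order;

-- Build the command-line load script for one state ('MA' is an example)
-- on a Unix-like platform ('sh'). The result is script text to save and run.
SELECT tiger.loader_generate_script(ARRAY['MA'], 'sh');
```

The generated script relies on the paths stored in tiger.loader_platform, so those usually need a look before the script will run cleanly on your machine.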
Brief whirlwind history of PostGIS Tiger Geocoder
PostGIS Tiger geocoder is a packaged PostGIS extra that utilizes US Census Tiger data for geocoding.
As such, it is probably only useful to people dealing with US data. Aside from being US-centric, its other key differentiator from most other PostGIS geocoders is that it's a pure
PostgreSQL play. Everything runs in the database and is callable in queries, and its only dependencies are PostGIS and fuzzystrmatch
(the PostgreSQL fuzzy string matching extension that comes with most PostgreSQL packages). This makes it easy to wrap anything around it and makes it a drop-in tool for most apps. For example, we created a .NET web service with 5 lines of VB.NET code
that does nothing but connect to the database, pass the address to the geocode function, and get back the result. For most other uses we just use it as an in-db batch geocoder.
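To give a feel for what "callable in queries" means, here is a minimal sketch of a single geocode call. The address is only an example, and the query assumes the tiger schema is on your search_path and that data for the relevant state has already been loaded.

```sql
-- Geocode one address, keeping only the best match (max_results = 1).
-- A lower rating means a better match.
SELECT g.rating,
       pprint_addy(g.addy) AS normalized_address,
       ST_X(g.geomout)     AS lon,
       ST_Y(g.geomout)     AS lat
FROM geocode('75 State Street, Boston, MA 02109', 1) AS g;
```

A thin web service wrapper needs little more than this one query with the address passed in as a parameter.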
It has a long history. It was first created by Refractions, then stagnated for a while when
the US Census made major structural changes, switching to ESRI shapefile format and restructuring the data. Then it was upgraded by Stephen
Frost et al. to use the new Tiger 2008 structure and ESRI shapefiles. We picked it up from there and have been doing most of the enhancements and maintenance since, mostly
funded through client work.
- The first facelift we did was changing the structure to use inheritance, so each state's data could be loaded as needed and attached to or detached from the hierarchy
for easier maintenance (like a tire replacement).
- Then we redid much of the query logic, added indexes to improve speed, and improved the address normalizer.
- Added how-to instructions and descriptions of the functions to the official PostGIS docs.
- Added a reverse geocoder function.
- Lastly, we included loader generation and maintenance SQL functions. The loader functions generate a platform-specific command-line script to download the data from the US Census, stage it,
and load it into the final tables. For most Linux/Unix installs the unzip, wget, and shp2pgsql dependencies are already present, and for Windows,
equivalent tools are an easy 10-minute download/install away. The maintenance functions scan the table hierarchy and add indexes where needed.
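For the reverse geocoder and maintenance pieces in the list above, a small sketch of what the calls look like. The coordinates are arbitrary example values near Boston, and as before this assumes the tiger schema is on your search_path and the state's data is loaded.

```sql
-- Reverse geocode a longitude/latitude point. SRID 4269 (NAD83) matches the Tiger data.
-- The second argument asks for street ranges to be included in the result.
SELECT pprint_addy(r.addy[1])         AS nearest_address,
       array_to_string(r.street, ',') AS cross_streets
FROM reverse_geocode(ST_SetSRID(ST_Point(-71.0939, 42.3594), 4269), true) AS r;

-- After loading new states, add any indexes the geocoder expects but that are missing.
SELECT install_missing_indexes();
```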
There has been a surprising amount of interest in it, perhaps because Google and others have started charging a non-trivial amount of money
for bulk geocoding services. When you've got thousands of US addresses to geocode, it's an appealing option. On the plus side, this means more
people stress testing it and reporting bugs, which means we get free testing for our consulting work. Getting good QA testers who catch issues before your clients do is not cheap.
This encourages us to refine it. On the bad side, it means more people reporting bugs and expecting us to do something about them.
It's the classic curse of open source: you build something people want to use, give it away, and people expect you to make it better.