I also thought "wouldn't it be nice if PostGIS had a Waiting for PostGIS ... " where you can uncover what's coming and more efficient ways of doing older work processes with newer PostGIS constructs. So here is my first experiment with that concept. Recall Paul was all excited about how he's made Geography faster. Specifically ST_Distance, ST_DWithin, and ST_Intersects for geography have gotten a facelift in upcoming PostGIS 2.1. So we thought we'd try it with a use case we had recently where the geography performance was less than spectacular. What timezone is each airport located in? so that we can later use PostgreSQL wealth of timezone aware functions. It is your classic point in huge multipolygon problem.
The datasets
For datasets we used:
Both are in WGS 1984 longlat aka (SRID:4326) so perfect for straight loading into geography. I've become even shorter on memory and patience in my old age, so I'm just using the postgis-gui to load these in. Remember if you use the GUI , click the Options and choose Load into geography as shown here:
The Geography speed test
We have PostgreSQL 9.1 32-bit running on windows. One database has PostGIS 2.1.0SVN r10217 and the other database has PostGIS 2.0.1.
CREATE DATABASE test_geog_postgis20;
After connecting to test_geog_postgis20, run
CREATE EXTENSION postgis VERSION 2.0.1;
CREATE DATABASE test_geog_postgis21;
After connecting to test_geog_postgis21, run
CREATE EXTENSION postgis VERSION 2.1.0SVN;
Since they are on the same instance, they share the same GEOS 3.4.0 svn and Proj 4.8.0. Now for the proof. I loaded both databases with the aforementioned datasets and ran this script.
ALTER TABLE airports ADD COLUMN tz_name varchar(30);
vacuum analyze verbose tz_world_mp;
vacuum analyze verbose airports;
-- (on 2.1.0SVN r10217) 67,004ms ~1.05 minutes (19,671 records updated)
-- on 2.0.1 hmm patiently waiting 7,721,566 ms (~2.14 hours) 19,671 records
UPDATE airports SET tz_name = w.tzid
FROM tz_world_mp AS w
WHERE ST_Intersects(w.geog, airports.geog);
You'll notice that though we have 19,922, not all of them got tagged. A quick spot check of those were SEA PLANES and HELIPORTS out in water or very close to water boundaries and the timezone data as stipulated in the info says it doesn't cover water. So most are expected. Some that actually were on land were missed but they were close to water boundaries so a simple ST_DWithin might have fixed those.
Using DbLink
Hopefully from the above, you can infer who the speed winner is, BUT do they yield same answer? Here dblink comes to the rescue.
I installed dblink in my PostGIS 2.1 db with:
CREATE EXTENSION dblink;
Then ran this query:
WITH ap20 AS (SELECT *
FROM dblink('dbname=test_geog_postgis20 host=localhost port=5432 user=postgres password=whatever'
,'SELECT locid, tz_name FROM airports WHERE tz_name is not null')
AS ap(locid varchar(4),tz_name varchar(30) ) )
SELECT count(ap20.locid) As tot, count(ap21.locid) As num_matches
from ap20 LEFT JOIN airports As ap21 ON (ap20.locid = ap21.locid AND ap20.tz_name = ap21.tz_name);
What about Geometry?
To test geometry we added geometry columns
ALTER TABLE airports ADD COLUMN geom geometry(POINT,4326) ;
ALTER TABLE airports ADD COLUMN tz_planar_name varchar(30);
UPDATE airports SET geom = geog::geometry;
CREATE INDEX airports_geom_gist ON airports USING gist (geom );
vacuum analyze verbose airports;
ALTER TABLE tz_world_mp ADD COLUMN geom geometry(MULTIPOLYGON,4326);
UPDATE tz_world_mp SET geom = geog::geometry;
CREATE INDEX tz_world_mp_geom_gist ON tz_world_mp USING gist (geom );
vacuum analyze verbose tz_world_mp;
Then ran our tests
-- geometry 22,182ms (3 times faster than 2.1 geography about same speed on 2.1 and 2.0.1), 19,671 records updated
UPDATE airports SET tz_planar_name = w.tzid
FROM tz_world_mp AS w
WHERE ST_Intersects(w.geom, airports.geom);
Do they match, well in this case we can Get away with using geometry. The responses for this dataset ended up being the same even if you squash the world into a planar plate carrée space so a bit anti-climactic.
Using ST_DWithin would have been much more interesting.
SELECT count(locid)
, COUNT(CASE WHEN tz_name = tz_planar_name THEN 1 ELSE NULL END) As num_matches
FROM airports
WHERE tz_name is not null;