One of the applications I work on deals with a lot of geographical data; 32Gb database, one table having 22 million rows. Certainly not the biggest I’ve heard of, but the biggest I’ve worked on.

Most of this data is point data, which is currently stored as separate Longitude and Latitude fields, which are floats, but we’re now on SQL 2008 and using the new spatial features has some advantages, noticeably speed, and to take advantage of that we need to add a column of type Geography. Next, this column needs to be updated with the existing data:

UPDATE Waypoint
SET Location = GEOGRAPHY::Point(Latitude, Longitude, 4326)
FROM Waypoint

On it’s own this doesn’t bring us much; sure we’re now using the new type and can use the built-in functions, such as working out if one point is within 100 metres of another:

DECLARE @MyLocation Geography = GEOGRAPHY::Point(StartLat, StartLong, 4326)
DECLARE @MyBufffer Geography = MyLocation.Buffer(100)

SELECT Location
FROM Waypoint
WHERE Location.Filter(@MyBuffer) = 1

On it’s own this is an improvement from our existing code, which has to calculate distances manually, but any calculations we now use would be hampered without a spatial index:

CREATE SPATIAL INDEX Location_Idx ON Waypoint(Location) USING GEOGRAPHY_GRID

Now the filtering can use the geospatial index, which brings massive performance increases. One procedure in our old code is a complex cursor based one, matching waypoints to agreed waypoints (think mountain biking competitions, where you have to follow a set route; matching points along the way allows us to ensure the route was followed, give or take those 100 metres). Given that this table has 22 million rows, the old cursor approach barely worked at all; very high CPU usage and an eternity to run meant we didn’t really use it. The new procedure, using the spatial functions and appropriate indexes, finishes in 9 seconds – an improvement of over 30,000% – and I’m pretty sure we can improve it even further; the first attempt was simply to use the spatial features. This is a monumental improvement and will revolutionise what we can do with our application, doubled by the fact that we’ll also be using the Silverlight Bing Map control.

And now to a downside of the spatial features, which is that if your define a route as a set of waypoints and use the LINESTRING feature to define the polyline of the route, the points cannot trace back over themselves. Cross yes, but not trace back. Consider the following simplistic set of points:

0, 0
0, 10
0, 5

This describes a trace back situation, which may be perfectly valid; cycle to a marker and cycle back. Depending upon how close the forward and back routes are, the GPS points might give the above results. In fact, we have the above data issue for three points of a stationary cycle, where the GPS drift has caused the data, albeit at the 6th decimal point, to give two different readings, but ones that case a trace back. For our application, and possibly any application using GPS units, we cannot therefore use a LINESTRING; you can’t event create the Geography type with that data, because the data is “an invalid geography type”.  Why would we need this? Creating polylines allows the entire set of points to be treated as a single type for calculations, such as seeing if a point is within X metres of the route, or if one route crosses another. Without the polyline we have to iterate the points individually.

The Geometry type allows the type to be created, and I’ve yet to work out whether whether we could use this type as a fake Geography type. The Geography type has built in knowledge of the Earth and takes into account the spherical (oblate spheriod in fact) nature of the Earth in these calculations. The question I need to answer is whether the difference between using the two types would be large enough to render our calculations invalid. It may be a route (pun intended) I’m forced to use. In the meantime, I’ve logged a suggestion on Microsoft Connect asking for the rules on LINESTRING usage to be relaxed, but even if they are relaxed, it won’t show up until the next version of the product. In the meantime, I can continue to improve our product by replacing the manual calculations with the spatial ones.