> For our geo-search API, we used PostgreSQL for many months, but once our Media entries were sharded, we moved over to Apache Solr. It has a simple JSON interface, so as far as our application is concerned, it’s just another API to consume.
Does anyone have particular insight to share on this? Last I checked, Solr's geospatial search methods were rather inefficient: haversine across all documents, bounding boxes that still rely on haversine, and Solr 4 was looking into geohashes (better, but with serious edge cases where they fall apart).
Meanwhile, PostgreSQL offers R-tree indexing for spatial queries and is blazing fast.
Am I missing some hidden power in Solr's geospatial lookups that makes it faster/better than an R-tree implementation?
It was probably the database sharding. If the Solr setup can handle the geo-search-related data without needing sharding, it can probably beat out a sharded Postgres.
Having this exposed through an API that is standardized and maintained by someone else is also nothing to sneeze at. I'd trade a bit of performance for that kind of standardization and turnkey use in the right scenario.
The reason we use Solr for this specific task is that PostgreSQL cannot efficiently and quickly merge two index queries (time & space). It can do this to a limited degree, but each of these dimensions potentially matches tens of millions of documents, and PG falls over at that scale.
So you make the R-tree three-dimensional (lat, lng, time). PostgreSQL supports this.
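To make the idea concrete, here's a minimal Python sketch (with made-up records, not the poster's actual data) of why treating time as a third dimension turns the combined query into a single box test; a real three-dimensional R-tree prunes whole subtrees by these same boxes instead of scanning:

```python
def in_box(point, box):
    """True if a (lat, lng, ts) point falls inside a 3-D box."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, box))

# Illustrative records: (latitude, longitude, unix timestamp)
records = [
    (40.7, -74.0, 1_300_000_000),
    (51.5,  -0.1, 1_310_000_000),
    (40.8, -73.9, 1_320_000_000),
]

# "Media near a point during a time window" as one box over all three axes:
box = [(40.5, 41.0), (-74.5, -73.5), (1_295_000_000, 1_305_000_000)]
matches = [r for r in records if in_box(r, box)]
# Only the first record satisfies both the spatial and temporal bounds.
```

The point is that space and time are answered by one index traversal rather than two separate index scans merged afterwards.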
I dunno, I can't envision Solr being more efficient than a properly designed RDBMS for these situations. If you were integrating full-text search I'd absolutely believe that to be the case, but...
We need independent time & geo searches as well. The indexes are vastly smaller in Solr. We use PostgreSQL extensively and prefer it, so it's not a matter of simply wanting to use something different.
That's very interesting. Could you share your story on the pgsql-hackers mailing list? The folks who work on indexing are quite active there.
Also, there's some new thing I don't understand super well, SP-GiST -- do you have any thoughts on that?
I'm no Solr expert, but issue SOLR-2155 has a patch [1] that does geospatial search using geohash prefixes [2].
As far as I can tell, you take the point's latitude and longitude and interleave the binary bits -- so if your record's latitude is 11111111 and your longitude is 10000000, your geohash is 1110101010101010. You index on that. Then, when you do a spatial search for the point nearest 11111110,10000011, you look up key 1110101010101101, and a prefix search finds that the closest value in the index is the record you inserted earlier. Presumably you then realize there could be an even closer record at 11111111,01111111, which would have ended up at 1011111111111111 in the index, so you look there too just in case, take the closer of the two search results, and bob's your mother's brother.
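The interleaving described above is easy to sketch (bit strings kept as text for readability; this mirrors the commenter's construction, not necessarily Solr's actual encoding):

```python
def interleave(lat_bits: str, lng_bits: str) -> str:
    """Interleave latitude and longitude bits, latitude first."""
    return "".join(a + b for a, b in zip(lat_bits, lng_bits))

# The record and query point from the example above:
record_key = interleave("11111111", "10000000")    # "1110101010101010"
query_key = interleave("11111110", "10000011")     # "1110101010101101"
# The wrap-around neighbour lands far away in key order:
neighbor_key = interleave("11111111", "01111111")  # "1011111111111111"
```

Note how a point one bit away in longitude produces a key that shares no useful prefix with the record -- exactly the edge case that forces the second lookup.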
Hrm, so for a proximity search it basically has to take the union of all potentially encompassing geohash prefixes and then do a second pass (over a substantially reduced data set) using a haversine approach or something.
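That second pass can be sketched like this, assuming the prefix lookup has already produced a small candidate set (the coordinates here are made up for illustration):

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    R = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# Candidates surviving the geohash-prefix first pass (illustrative values):
candidates = [(40.71, -74.00), (40.75, -73.99), (40.80, -73.96)]
query = (40.70, -74.01)

# Exact-distance refinement only runs over the reduced set:
nearest = min(candidates, key=lambda p: haversine_km(*query, *p))
```

The expensive trigonometry is confined to the handful of survivors, which is the whole point of the prefix pre-filter.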