Twofishes is a
- coarse (i.e., not a street geocoder; it goes down to the city/neighborhood/poi level)
- splitting (it can break up "pizza new york" into query: pizza, where: new york)
- geocoder (it translates from string queries to lat/lngs on the earth)
- based primarily on geonames data
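To make the "splitting" idea concrete, here is a minimal sketch in Python. It is not how twofishes itself is implemented: the tiny `KNOWN_PLACES` set and the `split_query` helper are hypothetical, and the approach shown (match the longest trailing run of words against a list of place names) is just one simple way to separate the "what" from the "where".

```python
# Hypothetical toy gazetteer; a real index would hold millions of place names.
KNOWN_PLACES = {"new york", "york", "paris", "san francisco"}


def split_query(query, places=KNOWN_PLACES):
    """Split a query into (what, where) using the longest trailing place name."""
    tokens = query.lower().split()
    # Try the longest suffix first so that "new york" wins over "york".
    for start in range(len(tokens)):
        candidate = " ".join(tokens[start:])
        if candidate in places:
            return " ".join(tokens[:start]), candidate
    # No known place at the end of the query: everything is the "what" part.
    return " ".join(tokens), None


print(split_query("pizza new york"))
```

Under these assumptions, "pizza new york" splits into the query "pizza" and the place "new york", while a query with no recognizable place is returned unsplit with `None` as the place.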
Please note that this is the unpolished developer debug interface for twofishes.
Features
- Coarse (city/poi) forward geocoding
- Coarse point reverse geocoding
- Coarse region reverse geocoding
- Query splitting
- Geocode autocomplete with highlighting
- Written in finagle + scala
- Speaks thrift + json
- Used extensively in production at foursquare
- Serves 1000s of qps, <10ms response time for most queries, faster for point reverse geocoding
Getting started
Twofishes requires a relatively beefy box to build an index from scratch. For that reason, prebuilt indexes and binaries exist!
Prebuilt indexes
All indexes contain data from geonames.org (light cc-by), flickr (public domain), and natural earth (public domain).
- Download server binary (version 0.82.14, 2013-05-01)
- Download latest index (updated 2014-03-21 -- includes 1000s of neighborhoods from click-that-hood, not including zillow sourced hoods)
- Download latest OSM based index (updated 2014-04-12 -- requires OSM attribution and license compliance, please see the OSM legal faq)
License: CC-BY, as required by geonames; please link to geonames.org somewhere on your site. For more information, see the geonames.org (light cc-by) license for this data. You are required to include this information on your site if you use the twofishes index.
Serving a prebuilt index
Download and unzip one of the indexes, download the server binary, and start the server:
java -jar server-assembly-0.81.9.jar --hfile_basepath INDEX_DIRECTORY
The server's debug/json interface will be accessible at http://localhost:8081/static/geocoder.html
You can also serve a prebuilt binary by following the instructions below and skipping the index build step -- run serve.py with an argument of the index directory.
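Once the server is running, you can also talk to it from a script instead of the debug page. The Python sketch below is an illustration under assumptions, not documented API: the port comes from the debug page URL, but the root path `/` and the `query` parameter name are assumed here, and the helper names `build_query_url` and `geocode` are made up for this example. Check the documentation on github for the actual request format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Port 8081 is the one used by the debug page; the "/" path is an assumption.
BASE_URL = "http://localhost:8081/"


def build_query_url(query, base_url=BASE_URL):
    """Build a forward-geocode URL (ASSUMPTION: the parameter is named "query")."""
    return base_url + "?" + urlencode({"query": query})


def geocode(query, base_url=BASE_URL, timeout=5):
    """Send the query to a locally running server and return the parsed JSON."""
    with urlopen(build_query_url(query, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call geocode(...) once a server is actually running.
    print(build_query_url("pizza new york"))
```

The response is returned as plain parsed JSON, without assuming any particular field names, so you can inspect its structure before relying on it.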
Building an index from scratch
See the documentation on github but, very very quickly:
git clone https://github.com/foursquare/twofishes.git
cd twofishes
./init.sh
./download-country.sh [COUNTRY_CODE] or ./download-world.sh
./parse.py -c [COUNTRY_CODE] or ./parse.py -w # maybe you also want --output_prefix_index and/or --output_revgeo_index
./serve.py latest
Mailing List
Sign up for the google group to get updates about new indexes, breaking changes, and, hopefully, discussion of features and development.
Why not street level geocoding?
Street level geocoding is hard. It's a lot of data (and as a result, the iteration speed is very slow -- a full osm geocoder build takes 1-2 weeks) and it requires a lot of per-country localization. Also, OSM data for address geocoding isn't quite there yet. Building a new OSM street geocoder is something I would like to do, but I don't have the time or need for it right now. Additionally, street level geocoding is a relatively solved problem, handled by various commercial services, but coarse geocoding strangely isn't. Many of the coarse geocoders I found had poor ranking, no splitting, no autocomplete, no stable identifiers, and no polygons either.
Where did the name come from?
Twofishes started life as a 100 line python script serving out of mongo that I began writing one night in December 2011. December was cold. Our stove was being repaired. Still, there were fish to cook. Two fishes. Two fishes wrapped in bacon a la Papa Hemingway.
The fish up top are from a very dated 1950s cookbook called "Recipes from the East" by Irma Walker Ross.