RethinkDB 1.15: Geospatial queries
Today, we’re happy to announce RethinkDB 1.15 (Lawrence of Arabia). Download it now!
The 1.15 release includes over 50 enhancements and introduces geospatial queries to RethinkDB. This has been by far the most requested feature by RethinkDB users. In addition, we’ve sped up many queries dramatically by lazily deserializing data from disk. This release also brings a new r.uuid command that allows server-side generation of UUIDs.
Thanks primarily to Daniel Mewes, RethinkDB now has rich geospatial features including:
- r.geojson and r.to_geojson for importing and exporting GeoJSON
- Commands to create points, lines, polygons and circles
- Geospatial queries:
- get_intersecting: finds all documents that intersect with a given geometric object
- get_nearest: finds the closest documents to a point
- Geospatial indexes to make
get_intersecting
andget_nearest
blindingly fast - Functions that operate on geometry:
- r.distance: gets the distance between a point and another geometric object
- r.intersects: determines whether two geometric objects intersect
- r.includes: tests whether one geometric object is completely contained in another
- r.fill: converts a line into a polygon
- r.polygon_sub: subtracts one polygon from another
Upgrading to RethinkDB 1.15?
- If you’re upgrading from version 1.12 or earlier, you will need to migrate your data one last time.
- If you’re coming from 1.13, you don’t need to migrate your data but you may need to recreate your indexes.
Upgrading on Ubuntu? If you’re upgrading from 1.12 or earlier, first set up the new RethinkDB PPA.
Using geospatial queries
Let’s insert a couple of locations into RethinkDB:
> r.table('geo').insert([
{
'id': 1,
'name': 'San Francisco',
'location': r.point(-122.423246, 37.779388)
},
{
'id': 2,
'name': 'San Diego',
'location': r.point(-117.220406, 32.719464)
}
]).run(conn)
Throughout RethinkDB, all coordinates are entered as longitude/latitude to be consistent with GeoJSON.
In order for geospatial queries to return these points as results, we need to create a geospatial index:
> r.table('geo').createIndex('location', geo=True).run(conn)
Now, let’s find which of these cities is nearest to a given point — for example, Santa Maria, CA:
> r.table('geo').get_nearest(
r.point(-120.4333, 34.9514), # Santa Maria's long/lat
index='location',
max_dist=300,
unit='mi',
max_results=1).run(conn)
[{"doc": {
"id": 1,
"name": "San Francisco",
"location": {
"$reql_type$": "GEOMETRY",
"type": "Point",
"coordinates": [-122.423246, 37.779388] }},
"dist": 224.34241555826364 }]
We see that Santa Maria is about 224 miles from San Francisco. Note that RethinkDB returns the matched document, as well as the distance to the original point.
We can also find all geometric shapes that intersect with a polygon. This is useful when you’re given a viewing window, and need to return all geometry that’s inside the window:
def query_view_window(top, bottom, left, right):
# top and bottom are latitudes, left and right are longitudes
bounding_box = r.polygon(
r.point(left, top),
r.point(right, top),
r.point(right, bottom),
r.point(left, bottom))
return r.table('geo').get_intersecting(bounding_box, index='location').run(conn)
Going further
For the full details, read the in-depth article on geospatial support by Watts Martin.
In addition, check out an example web application that uses RethinkDB to dynamically load street maps and points of interest.
Faster queries
Prior to the 1.15 release, every time a query touched a document RethinkDB would pull the entire document from disk and deserialize it into a full ReQL data structure in memory.
In RethinkDB 1.15, the database intelligently deserializes only portions of the
document when they become necessary. If a field isn’t required by the query,
RethinkDB no longer spends time looking at it. This speeds up queries that only
need part of a document, most notoriously count
.
You should see performance increases for:
- analytic queries which only need summary information
- queries which don’t touch every part of a document.
In our (unscientific) tests, we saw performance improvements of around 15% for
simple read queries, ×2 for analytic queries, and ×50 for count
queries on tables. We’ll be publishing scientific benchmarks soon, but in the
meantime, enjoy the better performance!
Server-side UUIDs
The new r.uuid
command lets you generate server-side UUIDs wherever you like.
Let’s say that when you create a new player they get a default item in their inventory. Additionally, each item needs a unique identifier:
> r.table("player").insert({
"name": player_name,
"inventory": [{
"item_type": "potion",
"item_id": r.uuid(),
}]
}, non_atomic=True, return_changes=True).run(conn)
This will return:
{ "inserted": 1,
"changes": [
{
"new_val": {
"id": "063ab596-543e-45a7-904f-c3fafa96bf42",
"name": "for_my_friends",
"inventory": [{
"item_type": "potion",
"item_id": "e985d732-c2ac-40a4-bf19-9b4946632859",
}]
},
"old_val": null
}
}
RethinkDB has always created a UUID automatically for the primary key if it
isn’t specified in the inserted document, but now we can generate UUIDs for
embedded documents as well. You can get the generated keys by using
return_changes=True
.
Since UUID generation is random (and therefore can’t be done atomically),
you’ll need to add the non_atomic=True
flag to any update or insert that uses
r.uuid
.
Next steps
See the full list of enhancements, and take the new release for a spin!
The team is already hard at work on the upcoming 1.16 release that will focus on more flexible changefeeds. As always, if there is something you’d like us to prioritize or if you have any feedback on the release, please let us know!
Help work on the 1.16 release: RethinkDB is hiring.