Building an earthquake map with RethinkDB and GeoJSON

RethinkDB 1.15 introduced new geospatial features that can help you plot a course for smarter location-based applications. The database has new geographical types, including points, lines, and polygons. Geospatial queries make it easy to compute the distance between points, detect intersecting regions, and more. RethinkDB stores geographical types in a format that conforms with the GeoJSON standard.

Developers can take advantage of the new geospatial support to simplify the development of a wide range of potential applications, from location-aware mobile experiences to specialized GIS research platforms. This tutorial demonstrates how to build an earthquake map using RethinkDB's new geospatial support and an open data feed hosted by the USGS.

Fetch and process the earthquake data

The USGS publishes a global feed that includes data about every earthquake detected over the past 30 days. The feed is updated with the latest earthquakes every 15 minutes. This tutorial uses a version of the feed that only includes earthquakes that have a magnitude of 2.5 or higher.

In the RethinkDB administrative console, use the r.http command to fetch the data:

r.http("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson")

The feed includes an array of geographical points that represent earthquake epicenters. Each point comes with additional metadata, such as the magnitude and time of the associated seismic event. You can see a sample earthquake record below:

{
  id: "ak11383733",
  type: "Feature",
  properties: {
    mag: 3.3,
    place: "152km NNE of Cape Yakataga, Alaska",
    time: 1410213468000,
    updated: 1410215418958,
    ...
  },
  geometry: {
    type: "Point",
    coordinates: [-141.1103, 61.2728, 6.7]
  }
}

The next step is transforming the data and inserting it into a table. In cases where you have raw GeoJSON data, you can typically just wrap it with the r.geojson command to convert it into native geographical types. The USGS earthquake data, however, uses a non-standard triple value for coordinates, which isn't supported by RethinkDB. In such cases, or in situations where you have coordinates that are not in standard GeoJSON notation, you will typically use commands like r.point and r.polygon to create geographical types.

Using the merge command, you can iterate over earthquake records from the USGS feed and replace the value of the geometry property with an actual point object. The output of the merge command can be passed directly to the insert command on the table where you want to store the data:

r.table("quakes").insert(
  r.http("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson")("features")
    .merge(function(quake) {
      return {
        geometry: r.point(
          quake("geometry")("coordinates")(0),
          quake("geometry")("coordinates")(1))
      }
    })
  )

The r.point command takes longitude as the first parameter and latitude as the second parameter, just like GeoJSON coordinate arrays. In the example above, the r.point command is passed the coordinate values from the earthquake object's geometry property.

As you can see, it's easy to load content from remote data sources into RethinkDB. You can even use the query language to perform relatively sophisticated data transformations on the fetched data before inserting it into a table.
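The coordinate handling described above can also be sketched outside the database. The following plain-Python example does not use the RethinkDB driver; it only illustrates how the longitude/latitude pair is pulled out of a feature while the third value is dropped. The function name feature_to_lon_lat is hypothetical, and the note that the third value is a depth comes from the USGS feed documentation rather than from this tutorial.

```python
# Plain-Python sketch (no RethinkDB driver involved): extract the
# longitude/latitude pair from a USGS GeoJSON feature.

def feature_to_lon_lat(feature):
    """Return (longitude, latitude) for a GeoJSON Point feature.

    USGS coordinates have three values; only the first two are kept,
    in the same longitude-first order that r.point expects.
    """
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("expected a Point geometry")
    longitude, latitude = geometry["coordinates"][:2]
    return longitude, latitude


# Abbreviated copy of the sample record shown earlier.
sample = {
    "id": "ak11383733",
    "type": "Feature",
    "properties": {"mag": 3.3, "place": "152km NNE of Cape Yakataga, Alaska"},
    "geometry": {"type": "Point", "coordinates": [-141.1103, 61.2728, 6.7]},
}

print(feature_to_lon_lat(sample))
```

Keeping longitude first at every step avoids the most common mistake with geospatial data, which is swapping the two values.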

Perform geospatial queries

The next step is to create an index on the geometry property. Use the indexCreate command with the geo option to create an index that supports geospatial queries:

r.table("quakes").indexCreate("geometry", {geo: true})

Now that there is an index, try querying the data. For the first query, try fetching a list of all the earthquakes that took place within 200 miles of Tokyo:

r.table('quakes').getIntersecting(
  r.circle([139.69, 35.68], 200,
    {unit: "mi"}), {index: "geometry"})

In the example above, the getIntersecting command will find all of the records in the quakes table that have a geographic object stored in the geometry property that intersects with the specified circle. The r.circle command creates a polygon that approximates a circle with the desired radius and center point. The unit option tells the r.circle command to use a particular unit of measurement (miles, in this case) to compute the radius. The coordinates used in the above example are the longitude and latitude of Tokyo, in that order.
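To get a feel for what "a polygon that approximates a circle" means, here is a rough plain-Python sketch that computes such a polygon by hand. It is not how RethinkDB implements r.circle: it assumes a perfectly spherical Earth with a mean radius of about 3958.76 miles, and the function name circle_polygon and the choice of 32 vertices are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.7613  # mean radius; spherical approximation


def circle_polygon(center_lon, center_lat, radius_miles, num_vertices=32):
    """Return [(lon, lat), ...] vertices approximating a circle."""
    lat1 = math.radians(center_lat)
    lon1 = math.radians(center_lon)
    delta = radius_miles / EARTH_RADIUS_MILES  # angular distance in radians
    vertices = []
    for k in range(num_vertices):
        bearing = 2 * math.pi * k / num_vertices  # 0 is north, clockwise
        # Standard "destination point" formula on a sphere.
        lat2 = math.asin(
            math.sin(lat1) * math.cos(delta)
            + math.cos(lat1) * math.sin(delta) * math.cos(bearing))
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(delta) * math.cos(lat1),
            math.cos(delta) - math.sin(lat1) * math.sin(lat2))
        vertices.append((math.degrees(lon2), math.degrees(lat2)))
    return vertices


# 200 miles around Tokyo, longitude first.
tokyo_circle = circle_polygon(139.69, 35.68, 200)
print(len(tokyo_circle), tokyo_circle[0])
```

More vertices give a closer approximation at the cost of a larger polygon to intersect against.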

Let's say that you wanted to get the largest earthquake for each individual day. To organize the earthquakes by day, use the group command on the date. To get the largest from each day, you can chain the max command and have it operate on the magnitude property.

r.table("quakes").group(r.epochTime(
    r.row("properties")("time").div(1000)).date())
  .max(r.row("properties")("mag"))

The USGS data uses timestamps that are counted in milliseconds since the UNIX epoch. In the query above, div(1000) is used to normalize the value so that it can be interpreted by the r.epochTime command. It's also worth noting that commands chained after a group operation will automatically be performed on the contents of each individual group.
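If the group-then-max pattern is unfamiliar, the same logic can be sketched in plain Python. This example is only an illustration and does not touch the database: the function name largest_quake_per_day is hypothetical, the two extra records are made up around the sample timestamp, and dates are computed in UTC.

```python
from datetime import datetime, timezone


def largest_quake_per_day(quakes):
    """Map each UTC date to the highest-magnitude quake on that date."""
    largest = {}
    for quake in quakes:
        props = quake["properties"]
        # USGS timestamps are in milliseconds; datetime expects seconds.
        day = datetime.fromtimestamp(
            props["time"] / 1000, tz=timezone.utc).date()
        if (day not in largest
                or props["mag"] > largest[day]["properties"]["mag"]):
            largest[day] = quake
    return largest


quakes = [
    {"properties": {"mag": 3.3, "time": 1410213468000}},  # sample record
    {"properties": {"mag": 4.1, "time": 1410209868000}},  # one hour earlier
    {"properties": {"mag": 2.6, "time": 1410224268000}},  # three hours later
]

for day, quake in sorted(largest_quake_per_day(quakes).items()):
    print(day, quake["properties"]["mag"])
```

The first two records fall on the same day, so only the stronger one survives; the third lands just after midnight UTC and forms its own group.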

Build a simple API backend

The earthquake map application has a simple backend built with node.js and Express. It implements several API endpoints that client applications can access to fetch data. Create a /quakes endpoint, which returns a list of earthquakes ordered by magnitude:

var r = require("rethinkdb");
var express = require("express");

var app = express();
app.use(express.static(__dirname + "/public"));

var configDatabase = {
  db: "quake",
  host: "localhost",
  port: 28015
}

app.get("/quakes", function(req, res) {
  // Keep the connection in a local variable so that concurrent
  // requests never share (or close) each other's connection.
  var conn;

  r.connect(configDatabase).then(function(c) {
    conn = c;

    return r.table("quakes").orderBy(
      r.desc(r.row("properties")("mag"))).run(conn);
  })
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(result) { res.json(result); })
  .finally(function() {
    if (conn)
      conn.close();
  });
});

app.listen(8081);

Add an endpoint called /nearest, which will take latitude and longitude values passed as URL query parameters and return the earthquake that is closest to the provided coordinates:

app.get("/nearest", function(req, res) {
  var latitude = req.query.latitude;
  var longitude = req.query.longitude;
  // Local to this request, so concurrent requests stay independent.
  var conn;

  if (!latitude || !longitude)
    return res.json({err: "Invalid Point"});

  r.connect(configDatabase).then(function(c) {
    conn = c;

    return r.table("quakes").getNearest(
      r.point(parseFloat(longitude), parseFloat(latitude)),
      { index: "geometry", unit: "mi" }).run(conn);
  })
  .then(function(result) { res.json(result); })
  .finally(function() {
    if (conn)
      conn.close();
  });
});

The r.point command in the code above is given the longitude and latitude values (in that order) that the user included in the URL query. Because URL query parameters are strings, you need to use the parseFloat function (or a plus sign prefix) to coerce them into numbers. The query is performed against the geometry index.

In addition to returning the closest item, the getNearest command also returns the distance. When using the unit option in the getNearest command, the distance is converted into the desired unit of measurement.
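The string-to-number coercion mentioned above is worth guarding with a range check, since a value can parse as a number and still not be a valid coordinate. Here is a minimal sketch of that validation in plain Python. The helper name parse_point and the decision to reject out-of-range values are assumptions, not part of the example application.

```python
def parse_point(params):
    """Return (longitude, latitude) as floats, or raise ValueError.

    `params` is a dict of query-string values, which arrive as strings.
    """
    try:
        latitude = float(params["latitude"])
        longitude = float(params["longitude"])
    except (KeyError, TypeError, ValueError):
        raise ValueError("Invalid Point")
    # NaN and infinity fail these comparisons and are rejected as well.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("Invalid Point")
    return longitude, latitude


print(parse_point({"latitude": "35.68", "longitude": "139.69"}))
```

The function returns longitude first so that its result can be passed along in the same order that r.point expects.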

Build a frontend with AngularJS and Leaflet

The earthquake application's frontend is built with AngularJS, a popular JavaScript MVC framework. The map is implemented with the Leaflet library and uses tiles provided by the OpenStreetMap project.

Using the AngularJS $http service, retrieve the JSON quake list from the node.js backend, create a map marker for each earthquake, and assign the array of earthquake objects to a variable in the current scope:

$scope.fetchQuakes = function() {
  $http.get("/quakes").success(function(quakes) {
    for (var i in quakes)
      quakes[i].marker = L.circleMarker(L.latLng(
        quakes[i].geometry.coordinates[1],
        quakes[i].geometry.coordinates[0]), {
        radius: quakes[i].properties.mag * 2,
        fillColor: "#616161", color: "#616161"
      });

    $scope.quakes = quakes;
  });
};

To display the points on the map, use Angular's $watchCollection to apply or remove markers as needed when a change is observed in the contents of the quakes array.

$scope.map = L.map("map").setView([0, 0], 2);
$scope.map.addLayer(L.tileLayer(mapTiles, {attribution: mapAttrib}));

$scope.$watchCollection("quakes",
  function(addItems, removeItems) {
    if (removeItems && removeItems.length)
      for (var i in removeItems)
        $scope.map.removeLayer(removeItems[i].marker);

    if (addItems && addItems.length)
      for (var i in addItems)
        $scope.map.addLayer(addItems[i].marker);
  }
);

You could just call $scope.map.addLayer in the fetchQuakes method to add markers directly as they are created, but using $watchCollection is more idiomatically appropriate for AngularJS. If the application adds or removes items from the array later, it will dynamically add or remove the corresponding place markers on the map.

The application also displays a sidebar with a list of earthquakes. Clicking on an item in the list will focus the associated point on the map. That part of the application was relatively straightforward, built with a simple ng-repeat that binds to the quakes array.

To complete the application, the last feature to add is support for plotting the user's own location on the map and indicating which earthquake in the list is the closest to their position.

The HTML5 Geolocation standard introduced a browser method called geolocation.getCurrentPosition that provides coordinates of the user's current location. In the callback for that method, assign the received coordinates to the userLocation variable in the current scope. Next, use the $http service to send the coordinates to the /nearest endpoint.

$scope.updateUserLocation = function() {
  navigator.geolocation.getCurrentPosition(function(position) {
    $scope.userLocation = position.coords;

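    // Copy the values explicitly: in some browsers the properties of
    // position.coords are not own properties and are not serialized.
    $http.get("/nearest", {params: {
        latitude: position.coords.latitude,
        longitude: position.coords.longitude
      }})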
      .success(function(output) {
        if (output.length)
          $scope.nearest = output[0].doc;
      });
  });
};

To display the user's position on the map, use $watch to observe for changes to the value of userLocation. When it changes, create a new place marker at the user's coordinates.

$scope.$watch("userLocation", function(newVal, oldVal) {
  if (!newVal) return;

  if ($scope.userMarker)
    $scope.map.removeLayer($scope.userMarker);

  var point = L.latLng(newVal.latitude, newVal.longitude);
  $scope.userMarker = L.marker(point, {
    icon: L.icon({iconUrl: "mark.png"})
  });

  $scope.map.addLayer($scope.userMarker);
});

Put a pin in it

To view the complete source code, you can check out the repository on GitHub. To try the example, run npm install in the root directory and then execute the application by running node app.js.

To learn more about using geospatial queries in RethinkDB, check out the documentation. Geospatial support is only one of the great new features introduced in RethinkDB 1.15. Be sure to read the release announcement to get the whole story.

RethinkDB 1.15: Geospatial queries

Today, we're happy to announce RethinkDB 1.15. Download it now!

The 1.15 release includes over 50 enhancements and introduces geospatial queries to RethinkDB. This has been by far the most requested feature by RethinkDB users. In addition, we've sped up many queries dramatically by lazily deserializing data from disk. This release also brings a new r.uuid command that allows server-side generation of UUIDs.

Thanks primarily to Daniel Mewes, RethinkDB now has rich geospatial features including:

  • r.geojson and r.to_geojson for importing and exporting GeoJSON
  • Commands to create points, lines, polygons and circles
  • Geospatial queries:
    • get_intersecting: finds all documents that intersect with a given geometric object
    • get_nearest: finds the closest documents to a point
  • Geospatial indexes to make get_intersecting and get_nearest blindingly fast
  • Functions that operate on geometry:
    • r.distance: gets the distance between a point and another geometric object
    • r.intersects: determines whether two geometric objects intersect
    • r.includes: tests whether one geometric object is completely contained in another
    • r.fill: converts a line into a polygon
    • r.polygon_sub: subtracts one polygon from another

If you're upgrading from version 1.12 or earlier, you will need to migrate your data one last time.

If you're coming from 1.13, you don't need to migrate your data but you may need to recreate your indexes.

Upgrading on Ubuntu? If you're upgrading from 1.12 or earlier, first set up the new RethinkDB PPA.

Using geospatial queries

Let's insert a couple of locations into RethinkDB:

> r.table('geo').insert([
  {
    'id': 1,
    'name': 'San Francisco',
    'location': r.point(-122.423246, 37.779388)
  },
  {
    'id': 2,
    'name': 'San Diego',
    'location': r.point(-117.220406, 32.719464)
  }
]).run(conn)

Throughout RethinkDB, all coordinates are entered as longitude/latitude to be consistent with GeoJSON.

In order for geospatial queries to return these points as results, we need to create a geospatial index:

> r.table('geo').index_create('location', geo=True).run(conn)

Now, let's find which of these cities is nearest to a given point — for example, Santa Maria, CA:

> r.table('geo').get_nearest(
    r.point(-120.4333, 34.9514),  # Santa Maria's long/lat
    index='location',
    max_dist=300,
    unit='mi',
    max_results=1).run(conn)

[{"doc": {
    "id": 1,
    "name": "San Francisco",
    "location": {
      "$reql_type$": "GEOMETRY",
      "type": "Point",
      "coordinates": [-122.423246, 37.779388] }},
  "dist": 224.34241555826364 }]

We see that Santa Maria is about 224 miles from San Francisco. Note that RethinkDB returns the matched document, as well as the distance to the original point.
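As a sanity check on that distance, here is a brute-force version of the same nearest-point search in plain Python. It is only a sketch: it uses the haversine formula on a spherical Earth (mean radius of about 3958.76 miles), so its result can differ slightly from the database's answer, and the function names haversine_miles and nearest are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.7613  # mean radius; spherical approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest(places, lon, lat, max_dist):
    """Return the closest place within max_dist miles, as a list."""
    candidates = [
        {"doc": p, "dist": haversine_miles(lon, lat, *p["location"])}
        for p in places]
    within = [c for c in candidates if c["dist"] <= max_dist]
    return sorted(within, key=lambda c: c["dist"])[:1]


cities = [
    {"id": 1, "name": "San Francisco", "location": (-122.423246, 37.779388)},
    {"id": 2, "name": "San Diego", "location": (-117.220406, 32.719464)},
]

# Santa Maria's long/lat, as in the query above.
print(nearest(cities, -120.4333, 34.9514, max_dist=300))
```

A linear scan like this is fine for two cities; a geospatial index exists so that the database does not have to compare the query point against every document.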

We can also find all geometric shapes that intersect with a polygon. This is useful when you're given a viewing window, and need to return all geometry that's inside the window:

def query_view_window(top, bottom, left, right):
    # top and bottom are latitudes, left and right are longitudes
    bounding_box = r.polygon(
        r.point(left, top),
        r.point(right, top),
        r.point(right, bottom),
        r.point(left, bottom))
    return r.table('geo').get_intersecting(bounding_box, index='location').run(conn)
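For point data, the effect of the viewing-window query can be approximated with simple comparisons. The sketch below is plain Python, not a ReQL query. It assumes the window does not cross the 180th meridian, and it ignores the fact that the edges of a polygon on a globe are not lines of constant latitude. The function name in_view_window and the window values are made up for illustration.

```python
def in_view_window(lon, lat, top, bottom, left, right):
    """True if the point lies inside the latitude/longitude window."""
    return bottom <= lat <= top and left <= lon <= right


cities = [
    {"name": "San Francisco", "location": (-122.423246, 37.779388)},
    {"name": "San Diego", "location": (-117.220406, 32.719464)},
]

# A window roughly covering the San Francisco Bay Area.
visible = [
    city["name"] for city in cities
    if in_view_window(*city["location"],
                      top=38.5, bottom=37.0, left=-123.0, right=-121.5)]
print(visible)
```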

Going further

For the full details, read the in-depth article on geospatial support by Watts Martin.

In addition, check out an example web application that uses RethinkDB to dynamically load street maps and points of interest.

Faster queries

Prior to the 1.15 release, every time a query touched a document RethinkDB would pull the entire document from disk and deserialize it into a full ReQL data structure in memory.

In RethinkDB 1.15, the database intelligently deserializes only portions of the document when they become necessary. If a field isn't required by the query, RethinkDB no longer spends time looking at it. This speeds up queries that only need part of a document, most notably count.

You should see performance increases for:

  • analytic queries which only need summary information
  • queries which don't touch every part of a document.

In our (unscientific) tests, we saw performance improvements of around 15% for simple read queries, ×2 for analytic queries, and ×50 for count queries on tables. We'll be publishing scientific benchmarks soon, but in the meantime, enjoy the better performance!

Server-side UUIDs

The new r.uuid command lets you generate server-side UUIDs wherever you like.

Let's say that when you create a new player they get a default item in their inventory. Additionally, each item needs a unique identifier:

> r.table("player").insert({
      "name": player_name,
      "inventory": [{
          "item_type": "potion",
          "item_id": r.uuid(),
      }]
  }, non_atomic=True, return_changes=True).run(conn)

This will return:

{ "inserted": 1,
  "changes": [
    {
      "new_val": {
        "id": "063ab596-543e-45a7-904f-c3fafa96bf42",
        "name": "for_my_friends",
        "inventory": [{
          "item_type": "potion",
          "item_id": "e985d732-c2ac-40a4-bf19-9b4946632859",
        }]
      },
      "old_val": null
    }
  ]
}

RethinkDB has always created a UUID automatically for the primary key if it isn't specified in the inserted document, but now we can generate UUIDs for embedded documents as well. You can get the generated keys by using return_changes=True.

Since UUID generation is random (and therefore can't be done atomically), you'll need to add the non_atomic=True flag to any update or insert that uses r.uuid.
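For comparison, this is roughly what the client-side alternative looks like when the identifier is generated in application code before the insert. It is a sketch using Python's standard uuid module, not the r.uuid command, and the helper name new_player is hypothetical.

```python
import uuid


def new_player(player_name):
    """Build a player document with one default inventory item."""
    return {
        "name": player_name,
        "inventory": [{
            "item_type": "potion",
            # Generated locally; r.uuid would generate this on the server.
            "item_id": str(uuid.uuid4()),
        }],
    }


player = new_player("for_my_friends")
print(player["inventory"][0]["item_id"])
```

Generating the identifier on the server keeps the whole operation in a single query, which matters when the documents are themselves built by a query rather than by the client.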

Next steps

See the full list of enhancements, and take the new release for a spin!

The team is already hard at work on the upcoming 1.16 release that will focus on more flexible changefeeds. As always, if there is something you'd like us to prioritize or if you have any feedback on the release, please let us know!

Help work on the 1.16 release: RethinkDB is hiring.

Publish and subscribe entirely in RethinkDB

With RethinkDB's changefeeds, it's easy to create a publish-subscribe message exchange without going through a third-party queue. Josh Kuhn (@deontologician) has written a small library, repubsub, that shows you how to build topic exchanges—and he's written it in all three of our officially-supported languages. He's put together a terrific tutorial article demonstrating how to use it. You can simply create a topic and publish messages to it:

topic = exchange.topic('fights.superheroes.batman')
topic.publish({'opponent': 'Joker', 'victory': True})

Then subscribe to just the messages that match your interest.

filter_func = lambda topic: topic.match(r'fights\.superheroes.*')
queue = exchange.queue(filter_func)
for topic, payload in queue.subscription:
    print(topic, payload)

Josh describes how to implement tags, nested topics and more, so check out the publish-subscribe tutorial.
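To illustrate the filtering idea without a database, here is a tiny in-memory exchange in plain Python. It is not repubsub and does not use changefeeds; the class InMemoryExchange and its methods are hypothetical stand-ins, and Python's re.search is used as an assumed equivalent of the regular expression match in the filter above.

```python
import re


class InMemoryExchange:
    """Toy topic exchange: delivers messages to matching subscribers."""

    def __init__(self):
        self.subscribers = []  # list of (filter_func, inbox) pairs

    def queue(self, filter_func):
        """Register a subscriber; returns the list that receives messages."""
        inbox = []
        self.subscribers.append((filter_func, inbox))
        return inbox

    def publish(self, topic, payload):
        """Append (topic, payload) to every inbox whose filter accepts it."""
        for filter_func, inbox in self.subscribers:
            if filter_func(topic):
                inbox.append((topic, payload))


exchange = InMemoryExchange()
superhero_fights = exchange.queue(
    lambda topic: re.search(r'fights\.superheroes.*', topic) is not None)

exchange.publish('fights.superheroes.batman',
                 {'opponent': 'Joker', 'victory': True})
exchange.publish('fights.villains.joker',
                 {'opponent': 'Batman', 'victory': False})

for topic, payload in superhero_fights:
    print(topic, payload)
```

Only the first message reaches the subscriber, because the second topic does not match the filter.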

September Events in San Francisco

Join the RethinkDB team at the following events for September 2014:

Meetup: RethinkDB SF Group at Heavybit Industries

Thursday, September 11th at 6pm, Heavybit Industries, 325 Ninth Street (map)

RethinkDB co-founder, Slava Akhmechet, will be talking about the latest RethinkDB advances and where it's headed; come meet the founders & engineers!
Get architectural advice, improve your code, give the RethinkDB team product feedback, and catch a sneak peek of upcoming features. Talks will start at 7pm. Food and drinks provided.

RSVP here

Office Hours with Slava Akhmechet

Tuesday, September 16th from 11am - 4pm, Workshop Cafe, 180 Montgomery Street #100 (map)

Sign up to get one-on-one RethinkDB support with Slava Akhmechet during our office hours in San Francisco.
Learn how to get up and running with RethinkDB, get individual support on your project, or just enjoy a cup of coffee with us!
We have five (5) 45 minute time slots available (11am, 12pm, 1pm, 2pm, & 3pm).
Please contact christina@rethinkdb.com to reserve your time.

Meetup event here

Meetup: Building realtime apps with RethinkDB

Monday, September 22nd at 6:30pm, PubNub, 725 Folsom St (map)

RethinkDB has teamed up with DevBrill for their September meetup.
Slava will demo RethinkDB, show how to get started with storing and querying JSON data in RethinkDB, and how to scale and parallelize queries across multiple machines.
The talk will also include highlights of some of the more distinctive features like 'r.http' and changefeeds, an overview of the tradeoffs involved in building a distributed architecture, and a discussion on the future of realtime applications.
Talks will start at 7pm. Food and drinks provided.

RSVP here

If you have questions or would like to speak at any of our events, please contact christina@rethinkdb.com.

RethinkDB 1.14: binary data, seamless migration, and Python 3 support

Today, we're happy to announce RethinkDB 1.14. Download it now!

The 1.14 release brings over 50 enhancements, including:

  • Seamless migration (read more below)
  • Simple binary data support
  • Python 3 support
  • Support for returning changes from multiple writes
  • Better documentation
  • New options for handling conflicts on inserts
  • Dozens of stability and performance improvements

Upgrading to 1.14? You no longer need to migrate your data between point releases! Read below for more information.

If you're upgrading from version 1.12 or earlier, you will need to migrate your data one last time.

Upgrading on Ubuntu? If you're upgrading from 1.12 or earlier, first set up the new RethinkDB PPA.

Upgrading from 1.13 with seamless migration

1.14 is the first RethinkDB release that doesn't require you to migrate your data. Just upgrade the package and restart your RethinkDB processes. Your 1.14 cluster will be ready to go immediately after restarting! This is something people have been asking for since our first release, and we're happy to finally be able to provide it. Making upgrades easier is a big step towards production readiness.

If you have secondary indexes on your data, the web UI may show an issue for those indexes after upgrading. This means that there was a bug fix affecting those indexes, and they need to be recreated to get the new behavior. You can learn how to do that on the troubleshooting page.

Binary data support

Support for storing small chunks of binary data has been one of our most-requested features. Starting with 1.14, you can insert binary data directly with r.binary, and retrieve it like any other part of a row.

Binary data works with everything: it can be stored anywhere in your document structure, and you can index on it like any other data type. (That means you can use binary data as the primary key of a row, or as the value of a secondary index.)

> r.table('users').insert({
    'name': 'Sam Lowry',
    'avatar': r.binary(open('sam_lowry.png', 'rb').read()),
    }).run(conn)
{'replaced': 0, 'inserted': 1, 'skipped': 0, 'deleted': 0, 'unchanged': 0, 'errors': 0}
# In Python 3, the 'bytes' type will be used to represent binary data
> r.table('users').filter({'name': 'Sam Lowry'}).run(conn)
{'avatar': b'...', 'name': 'Sam Lowry'}

The r.http command has also gained the ability to return binary data. r.http will try to detect whether it is downloading binary data and return the appropriate type. You can also request that it return binary data with the result_format argument:

> r.table('users').insert({
    'name': 'Jill Layton',
    'avatar': r.http('http://example.com/jill_layton.jpg', result_format='binary')
    }).run(conn)

Binary data is stored inline in your rows, so it's well-suited to storing small images and files, but isn't a good fit for 10GB movies.

Python 3 driver

Thanks to contributions from @grandquista and @barosl, the RethinkDB Python driver has added support for Python 3.0 through 3.4.

Now you can use awesome Python 3 features like yield from with RethinkDB:

def query_twice(reql_query, conn):
    yield from reql_query.run(conn)
    yield from reql_query.run(conn)

Previously, Python 3 support was incomplete because there was no official Protocol Buffers implementation for Python 3. The previous release of RethinkDB added a JSON driver protocol, and Python 3 support was made possible by that work.
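If you want to see what yield from does in query_twice without connecting to a server, the following sketch swaps the query for a stub. The FakeQuery class is a hypothetical stand-in whose run method simply returns an iterator over canned rows; it is not part of the RethinkDB driver.

```python
class FakeQuery:
    """Hypothetical stand-in for a ReQL query object."""

    def __init__(self, rows):
        self.rows = rows

    def run(self, conn):
        # A real query would go to the server; this one replays its rows.
        return iter(self.rows)


def query_twice(reql_query, conn):
    # Each `yield from` drains one full run of the query.
    yield from reql_query.run(conn)
    yield from reql_query.run(conn)


rows = list(query_twice(FakeQuery([{'id': 1}, {'id': 2}]), conn=None))
print(rows)
```

The generator yields every row from the first run, then every row from the second, so the result contains each row twice.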

Returning changes from queries with multiple writes

This was another much-requested feature. In 1.13 and earlier, we allowed users to return the old and new values of a row when updating a single document. We've changed this interface to be consistent with the changes API and added support for returning changes on any query that does a write.

For example, if you have a table of users where every user has a score:

> r.table('users').run(conn)
[{'id': 'Buttle', 'score': 20},
 {'id': 'Tuttle', 'score': 7},
 ...]

Then you can atomically increment-and-return Buttle and Tuttle's scores like so:

> r.table('users') \
   .get_all('Buttle', 'Tuttle') \
   .update(lambda row: {'score': row['score'] + 1},
           return_changes=True) \
   .run(conn)
{'changes':
  [{'new_val': {'id': 'Buttle', 'score': 21},
    'old_val': {'id': 'Buttle', 'score': 20}},
   {'new_val': {'id': 'Tuttle', 'score': 8},
    'old_val': {'id': 'Tuttle', 'score': 7}}],
 'deleted': 0,
 'errors': 0,
 'inserted': 0,
 'replaced': 2,
 'skipped': 0,
 'unchanged': 0}
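Once you have a result like the one above, the changes list is easy to post-process on the client. As a small sketch, the plain-Python helper below (score_deltas is a hypothetical name) works on a copy of that result and reports how much each score moved.

```python
# The write result shown above, abbreviated to the fields used here.
write_result = {
    'changes': [
        {'new_val': {'id': 'Buttle', 'score': 21},
         'old_val': {'id': 'Buttle', 'score': 20}},
        {'new_val': {'id': 'Tuttle', 'score': 8},
         'old_val': {'id': 'Tuttle', 'score': 7}},
    ],
    'replaced': 2,
}


def score_deltas(result):
    """Map each changed document's id to the change in its score."""
    return {
        change['new_val']['id']:
            change['new_val']['score'] - change['old_val']['score']
        for change in result['changes']
    }


print(score_deltas(write_result))
```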

Improved documentation

@chipotle has been improving our documentation for the last three months. You can see his work in the greatly expanded map-reduce docs, as well as new pages on importing your data, using nested fields, and database limitations.

Overall, there have been hundreds of improvements to the docs since the last release. Excellent docs have always been something we've strived for, and having someone working on them full time ensures they'll always be high quality and up to date.

Handling conflicts on insert

Previously, the insert command supported the upsert optional argument. This allowed you to insert or replace a document. In 1.14 we replaced the upsert argument with the conflict argument, and added the ability to update an existing document, rather than overwrite it completely.

As an example, let's assume you have two web crawlers that get ratings for movies from both IMDB and Rotten Tomatoes. In general, we don't know which crawler will get to a particular movie first. In this case, the IMDB crawler already inserted its document for the movie Brazil:

> r.table('movies').get('Brazil (1985)').run(conn)
{'id': 'Brazil (1985)',
 'imdb_rating': 8.0 }

By default, if the Rotten Tomatoes crawler tries to do an insert with the key "Brazil (1985)", we'll get an error, since the document already exists. But if instead it uses conflict='update', the document will simply be updated:

> r.table('movies').insert({
    'id': 'Brazil (1985)',
    'rt_rating': 98,
    },
    conflict='update').run(conn)
> r.table('movies').get('Brazil (1985)').run(conn)
{'id': 'Brazil (1985)',
 'imdb_rating': 8.0,
 'rt_rating': 98}

You can get the previous upsert behavior with conflict='replace'.
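The three conflict behaviors can be summarized with a small in-memory model. This is a plain-Python sketch with a dict standing in for the table, not the driver's insert command: the function name insert is reused only for readability, the 'update' branch does a shallow merge, and the 'error' case raises an exception where the real command reports the error in its result.

```python
def insert(table, doc, conflict="error"):
    """Insert doc into table, a dict keyed by the primary key 'id'."""
    key = doc["id"]
    existing = table.get(key)
    if existing is None:
        table[key] = dict(doc)
    elif conflict == "error":
        raise KeyError("Duplicate primary key: %r" % key)
    elif conflict == "update":
        merged = dict(existing)
        merged.update(doc)  # keep old fields, add or overwrite new ones
        table[key] = merged
    elif conflict == "replace":
        table[key] = dict(doc)  # discard the old document entirely
    else:
        raise ValueError("unknown conflict option: %r" % conflict)
    return table[key]


movies = {"Brazil (1985)": {"id": "Brazil (1985)", "imdb_rating": 8.0}}
print(insert(movies, {"id": "Brazil (1985)", "rt_rating": 98},
             conflict="update"))
```

With conflict="update" both ratings end up in the document, while conflict="replace" would leave only the Rotten Tomatoes rating.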

Next steps

See the full list of enhancements, and take the new release for a spin!

The team is already hard at work on the upcoming 1.15 release that will likely include geospatial query support. As always, if there is something you'd like us to prioritize or if you have any feedback on the release, please let us know!

Help work on the 1.15 release: RethinkDB is hiring.