A tasty RethinkDB video roundup for Thanksgiving

Thanksgiving is almost here: it is time to configure your dinner tables for family clusters and prepare some turkey for batch insertion of stuffing. To show how thankful we are for our amazing community, we put together this tasty video playlist with our best leftovers from October and November. It will help keep you entertained while you try not to succumb to the inevitable post-turkey tryptophan coma. Enjoy!


RethinkDB on FLOSS Weekly

RethinkDB co-founder Slava Akhmechet participated in a recent episode of TWiT's FLOSS Weekly video podcast. The hour-long interview includes a lengthy discussion about RethinkDB's origins, open source values, and suitability for real-time application development. Slava also shared lessons learned during RethinkDB development and talked about some future plans for the project.


Scale up RethinkDB apps on AWS with Docker

Climb Amazon's Elastic Beanstalk with RethinkDB co-founder Michael Glukhovsky in a presentation filmed at this month's Docker meetup. During the 20-minute talk, Michael demonstrated how to deploy a RethinkDB application on AWS with Docker. Learn how Docker containers and RethinkDB changefeeds make it easy to scale real-time apps in the cloud.


Build realtime location-aware apps with RethinkDB

RethinkDB Developer Evangelist Ryan Paul shook up a crowd last month with a short presentation about earthquake mapping. Ryan demonstrated how to use geospatial queries in RethinkDB to plot earthquake data on a map. Ryan also demonstrated how location-aware applications can take advantage of RethinkDB changefeeds to deliver real-time updates.


Pub/Sub made easy with RethinkDB

RethinkDB engineer Josh Kuhn made Gotham's streets a little safer during a five-minute presentation at last month's RethinkDB meetup. He demonstrated how to track comic book superhero match-ups in real-time using repubsub, a lightweight pub/sub library that uses RethinkDB as a message exchange.


RethinkDB hosting webinar with Compose

Our friends at Compose now offer a managed RethinkDB hosting service in the cloud. In a webinar last month, RethinkDB co-founder Slava Akhmechet and Compose CEO Kurt Mackey demonstrated how to use the service and discussed how it works.

CatThink: see the cats of Instagram in realtime with RethinkDB and Socket.io

Modern frameworks and standards make it easy for developers to build web applications that support realtime updates. You can push the latest data to your users, offering a seamless experience that results in higher engagement and better usability. With the right architecture on the backend, you can put polling out to pasture and liberate your users from the tyranny of the refresh button.

In this tutorial, I'll show you how I built a realtime Instagram client for the web. The application, which is called CatThink, displays a live feed of new Instagram pictures that have the #catsofinstagram tag. Why cats of Instagram? Because it's one of the photo service's most popular and beloved tags. People on the internet really, really like cats. Or maybe we just think we do because our feline companions have reprogrammed us with brain parasites.

The cat pictures appear in real time, as they are posted by their respective users. CatThink shows the pictures in a grid, accompanied by captions and other relevant metadata. In a secondary view, the application uses geolocation info to plot the cat pictures on a map.

CatThink's architecture

The CatThink backend is built with Node.js and Express on top of RethinkDB. The HTML frontend uses jQuery and Handlebars to display the latest cat pictures. The frontend map view is built with Leaflet, a popular map library that uses tiles from OpenStreetMap. The application uses Socket.io to facilitate communication between the frontend and backend.

CatThink takes advantage of Instagram's realtime APIs to determine when new images are available. Instagram offers a webhook-based system that allows a backend application to subscribe to updates on a given tag. When there are new posts with the #catsofinstagram tag, Instagram's servers send an HTTP POST request to a callback URL on your server. The POST request doesn't include the new content; it carries only a timestamp and the name of the updated tag, so your application has to fetch the new records using Instagram's conventional REST API endpoints.

When the CatThink backend receives a POST request from Instagram, it performs a RethinkDB query that uses the r.http command to fetch the latest records from the Instagram REST API and add them directly to the database. The database itself performs the HTTP GET request and parses the returned data.

Because the operation is performed entirely with ReQL, the backend application isn't responsible for fetching or processing any of the new Instagram pictures. Of course, the backend application will still need to know about new cat pictures so that it can send them to the frontend with Socket.io. CatThink accomplishes that with changefeeds, a RethinkDB feature that lets applications subscribe to changes on a table. Whenever the database adds, removes, or changes a document in the table, it will notify subscribed applications.

CatThink subscribes to a changefeed on the table where the cat records are stored. Whenever the database inserts a new cat record, CatThink receives the data through the changefeed and then broadcasts it to all of the Socket.io connections.

Connect to the Instagram realtime API

To use the Instagram API, you will have to register an application key on the Instagram developer site. You will need to use the client ID and client secret provided by Instagram in order to hit the API endpoints. You don't need to configure the key with a redirect URI, however, as you won't be using authentication.

To subscribe to a tag with Instagram's realtime API, make an HTTP POST request to the api.instagram.com/v1/subscriptions/ endpoint. In the form data attached to the request, you will need to provide the application key data, the name of the tag, a verification token, and the callback URL where you want Instagram to send new data. The verification token is an arbitrary string that Instagram will pass back to your application when it hits the callback URL.

Note: the callback URL that you provide to Instagram must be publicly-accessible to outside networks. For development purposes, it can be helpful to use a tool like ngrok that exposes a local port to the public internet.

In CatThink, I use the request library to perform the initial request to the Instagram server:

var request = require("request");

// Base URL of the Instagram REST API
var api = "https://api.instagram.com/v1/";

var params = {
  client_id: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  client_secret: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  verify_token: "somestring",
  object: "tag", aspect: "media",
  object_id: "catsofinstagram",
  callback_url: "http://mycatapp.ngrok.com/publish/photo"
};

request.post({url: api + "subscriptions", form: params},
  function(err, response, body) {
    if (err) console.log("Failed to subscribe:", err);
    else console.log("Successfully subscribed.");
});
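If you would rather not depend on the request library, the same form-encoded body can be produced with Node's built-in URLSearchParams. The following is only a sketch of the serialization step: the helper name buildSubscriptionBody and the placeholder credentials are assumptions for illustration and are not part of CatThink.

```javascript
// Hypothetical helper: serialize the subscription parameters as
// application/x-www-form-urlencoded data.
function buildSubscriptionBody(params) {
  return new URLSearchParams(params).toString();
}

var body = buildSubscriptionBody({
  client_id: "XXXX",          // placeholder, use your own client ID
  client_secret: "XXXX",      // placeholder, use your own client secret
  verify_token: "somestring",
  object: "tag",
  aspect: "media",
  object_id: "catsofinstagram",
  callback_url: "http://mycatapp.ngrok.com/publish/photo"
});

console.log(body);
```

The resulting string can be sent as the body of a POST request with a Content-Type of application/x-www-form-urlencoded using whichever HTTP client you prefer.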

If the subscription API call is properly formed, Instagram will immediately attempt to make an HTTP GET request to the callback URL. It will send several query parameters, including the verification token and a challenge key. To complete the subscription, the GET handler has to respond with the provided challenge key. With Express, create a GET handler for the callback URL:

app.get("/publish/photo", function(req, res) {
  // Instagram passes the token and the challenge as query-string parameters
  if (req.query["hub.verify_token"] == "somestring")
    res.send(req.query["hub.challenge"]);
  else res.status(500).json({err: "Verify token incorrect"});
});
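The decision that the handler makes can be separated from Express, which makes it easy to exercise without a running server. This is a minimal sketch under that assumption; verifySubscription is a hypothetical helper name, and the challenge value is made up for the example.

```javascript
// Hypothetical helper: decide how to answer Instagram's verification GET.
// `query` is the parsed query string; `expectedToken` is the verify_token
// that was sent when the subscription was created.
function verifySubscription(query, expectedToken) {
  if (query["hub.verify_token"] === expectedToken) {
    // Echo the challenge back to complete the subscription
    return {status: 200, body: query["hub.challenge"]};
  }
  return {status: 500, body: {err: "Verify token incorrect"}};
}

var accepted = verifySubscription(
  {"hub.verify_token": "somestring", "hub.challenge": "abc123"},
  "somestring");
var rejected = verifySubscription(
  {"hub.verify_token": "wrong", "hub.challenge": "abc123"},
  "somestring");

console.log(accepted, rejected);
```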

Fetch the latest cats

The next step is to implement the POST handler for the callback URL. When Instagram sends the application a POST request to inform it of new content on the subscribed tag, it includes several bits of information in the request body:

[{
        "changed_aspect": "media",
        "object": "tag",
        "object_id": "catsofinstagram",
        "time": 1414995025,
        "subscription_id": 14185203,
        "data": {}
}]

The object_id property is the name of the subscribed tag. The time property is a UNIX timestamp that reflects when the event occurred. The subscription_id property is a value that uniquely identifies the individual subscription.

Whenever the application receives a POST request at the callback URL, it will tell the database to fetch the latest cat records from Instagram's REST API. The application also provides a response so that Instagram knows that the POST request didn't fail. If the POST requests that Instagram sends to the application start to fail, Instagram will automatically taper off requests and eventually cancel the tag subscription.

// Timestamp of the last notification that triggered a fetch
var lastUpdate = 0;

app.post("/publish/photo", function(req, res) {
  var update = req.body[0];
  res.json({success: true, kind: update.object});

  // Throttle: skip notifications that arrive within the same second
  if (update.time - lastUpdate < 1) return;
  lastUpdate = update.time;

  var path = "https://api.instagram.com/v1/tags/" +
             "catsofinstagram/media/recent?client_id=" +
             instagramClientId;

  // Keep the connection in a local variable so that it can be closed later
  var conn;
  r.connect(config.database).then(function(c) {
    conn = c;
    return r.table("instacat").insert(
      r.http(path)("data").merge(function(item) {
        return {
          time: r.now(),
          place: r.point(
            item("location")("longitude"),
            item("location")("latitude")).default(null)
        }
      })).run(conn)
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (conn)
      conn.close();
  });
});

In the code above, the ReQL query uses the r.point command in a merge operation to turn the geographical coordinates for each cat photo into a native geolocation point object. CatThink itself only reads the point's coordinates to place map markers, but a native point would also be useful later if you wanted to create a geospatial index and query for cat pictures based on location.

In order to avoid hitting the Instagram API limit, the application checks the timestamp provided with each POST request and does some basic throttling to ensure that new cat records aren't typically going to be fetched more than once per second.
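The throttling check in the POST handler can also be written as a small standalone function. The sketch below is one way to do it, assuming timestamps in whole seconds as in Instagram's notifications; createThrottle is a hypothetical name.

```javascript
// Hypothetical throttle: allow a fetch only when at least `minGap` seconds
// have passed since the last accepted notification timestamp.
function createThrottle(minGap) {
  var lastUpdate = 0;
  return function shouldFetch(time) {
    if (time - lastUpdate < minGap) return false;
    lastUpdate = time;
    return true;
  };
}

var shouldFetch = createThrottle(1);

// Two notifications in the same second, followed by one a second later
var decisions = [1414995025, 1414995025, 1414995026].map(function(time) {
  return shouldFetch(time);
});

console.log(decisions);
```

Only the first and third notifications pass the check, so a burst of notifications within one second results in a single fetch.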

The path variable in the handler code is the URL of the Instagram REST API endpoint that the application uses to fetch the latest cats. In this example, the "catsofinstagram" tag is hard-coded into the URL path. It's worth noting, however, that you could use the name of the subscribed tag from the object_id property if you wanted to use the same POST handler to deal with multiple tag subscriptions.
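If you do build the URL from object_id, it is safer to accept only the tags that you actually subscribed to. The following sketch illustrates the idea; the allowedTags list and the recentMediaUrl helper are assumptions for the example and do not appear in CatThink.

```javascript
// Hypothetical allowlist of tags that the application has subscribed to
var allowedTags = ["catsofinstagram"];

// Hypothetical helper: build the recent-media URL for a notification,
// or return null when the tag is not one that the application expects.
function recentMediaUrl(update, clientId) {
  if (allowedTags.indexOf(update.object_id) === -1) return null;
  return "https://api.instagram.com/v1/tags/" +
         encodeURIComponent(update.object_id) +
         "/media/recent?client_id=" + encodeURIComponent(clientId);
}

console.log(recentMediaUrl({object_id: "catsofinstagram"}, "abc"));
console.log(recentMediaUrl({object_id: "dogsofinstagram"}, "abc"));
```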

Verify the request origin

In cases where you rely on the object_id property, you'd probably also want to validate the source of the POST request to make sure that it actually came from Instagram. If you don't verify the origin, somebody might figure out your URL endpoint and send you malicious POST requests that include an object_id for a rogue tag that you don't want to appear in your application. You wouldn't want some nefarious anti-cat vigilante to trick your application into showing dogs, for example.

Every POST request from Instagram will have an X-Hub-Signature header with a hash that you can validate using your secret key and the request body. The bodyParser middleware provides a verify option that is specifically intended for such purposes:

app.use("/publish/photo", bodyParser.json({
  verify: function(req, res, buf) {
    var hmac = crypto.createHmac("sha1", "XXXXXXXXXXXXXXX");
    var hash = hmac.update(buf).digest("hex");

    if (req.header("X-Hub-Signature") == hash)
      req.validOrigin = true;
  }
}));

At the beginning of your POST handler, you would simply check the value of req.validOrigin and make sure that it's true before continuing.
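The signature comparison itself can be sketched with nothing but Node's crypto module. This version is an illustration rather than CatThink's code: isValidSignature is a hypothetical helper, and it swaps the == comparison for crypto.timingSafeEqual so that the check takes the same time whether or not the signature matches.

```javascript
var crypto = require("crypto");

// Hypothetical helper: recompute the HMAC-SHA1 of the raw request body and
// compare it with the value of the X-Hub-Signature header.
function isValidSignature(secret, rawBody, headerValue) {
  var expected = crypto.createHmac("sha1", secret).update(rawBody).digest("hex");
  var a = Buffer.from(expected, "utf8");
  var b = Buffer.from(String(headerValue || ""), "utf8");
  // timingSafeEqual throws on a length mismatch, so check the lengths first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Made-up secret and payload for the example
var secret = "XXXXXXXXXXXXXXX";
var payload = Buffer.from('[{"object":"tag","object_id":"catsofinstagram"}]');
var goodHeader = crypto.createHmac("sha1", secret).update(payload).digest("hex");

console.log(isValidSignature(secret, payload, goodHeader));
console.log(isValidSignature(secret, payload, "not-a-real-signature"));
```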

Use changefeeds to handle new cats

The CatThink backend uses RethinkDB changefeeds to detect when the database adds new records to the cat table. In a ReQL query, the changes command returns a cursor that exposes every modification that is made to the specified table. The following code shows how to consume the data emitted by the changefeed and broadcast each new item with Socket.io:

r.table("instacat").changes().run(this.conn).then(function(cursor) {
  cursor.each(function(err, item) {
    if (item && item.new_val)
      io.sockets.emit("cat", item.new_val);
  });
})
.error(function(err) {
  console.log("Error:", err);
});

CatThink broadcasts every cat to every user, so you don't need to worry about tracking individual Socket.io connections or routing messages to the right users.
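Each changefeed notification carries an old_val and a new_val field. The handler above forwards any change that has a new_val, which includes updates to existing documents as well as inserts. If you need to tell the cases apart, a small classifier like the following sketch would do; changeKind is a hypothetical helper name.

```javascript
// Hypothetical helper: classify a changefeed notification by looking at
// which of old_val and new_val is present.
function changeKind(change) {
  if (change.old_val === null && change.new_val !== null) return "insert";
  if (change.old_val !== null && change.new_val === null) return "delete";
  return "update";
}

var kinds = [
  {old_val: null, new_val: {id: "cat1"}},
  {old_val: {id: "cat1"}, new_val: {id: "cat1", likes: 5}},
  {old_val: {id: "cat1", likes: 5}, new_val: null}
].map(changeKind);

console.log(kinds);
```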

In addition to broadcasting new cats, it's also a good idea to pass the user a modest backlog of cats when they first establish their connection with the server so that their initial view of the application is populated with some data. In a Socket.io connection event handler, CatThink performs a ReQL query that fetches the 60 most recent cats and then sends the result set back to the user. The query orders by a secondary index on the time field, so that index has to exist on the table before the query will run:

io.sockets.on("connection", function(socket) {
  // Keep the connection in a local variable so that it can be closed later
  var conn;
  r.connect(config.database).then(function(c) {
    conn = c;
    return r.table("instacat").orderBy({index: r.desc("time")})
            .limit(60).run(conn)
  })
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(result) {
    socket.emit("recent", result);
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (conn)
      conn.close();
  });
});

Implement the frontend

The CatThink frontend has a very simple user interface: it displays the grid of cats and the accompanying map view. A full-blown JavaScript MVC framework would likely be overkill, so the frontend uses a light dependency stack: Leaflet for the map, jQuery for the UI logic, and Handlebars templating to generate the markup for each new cat picture.

After some initial setup for the tab switching and map view, the bulk of the frontend code is housed in a single addCat function that applies the template to the cat data, inserts the new markup into the grid, and then creates the location marker for cats with geolocation data:

var map = L.map("map").setView([0, 0], 2);
map.addLayer(L.tileLayer(mapTiles, {attribution: mapAttrib}));

var template = Handlebars.compile($("#cat-template").html());
var markers = [];

function addCat(cat) {
  cat.date = moment.unix(cat.created_time).format("MMM DD, h:mm a");
  $("#cats").prepend(template(cat));

  if (cat.place) {
    var count = markers.unshift(L.marker(L.latLng(
        cat.place.coordinates[1],
        cat.place.coordinates[0])));

    map.addLayer(markers[0]);
    markers[0].bindPopup(
        "<img src=\"" + cat.images.thumbnail.url + "\">",
        {minWidth: 150, minHeight: 150});

    markers[0].openPopup();

    if (count > 100)
      map.removeLayer(markers.pop());
  }
}

The map markers are stored in an array so that the application can easily remove old markers as it adds new ones. The marker cap is set to 100 in the code above, but you could likely raise it considerably if desired. It's important to have some kind of cap, however, because Leaflet can sometimes exhibit odd behavior if you have too many markers on the map.
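The capping logic is independent of Leaflet, so it can be sketched on its own. In this illustration, createCappedList is a hypothetical helper and the strings stand in for marker objects; the unshift and pop calls mirror the ones in addCat.

```javascript
// Hypothetical capped list: newest entries go to the front, and the oldest
// entry is returned for removal once the cap is exceeded.
function createCappedList(cap) {
  var items = [];
  return {
    add: function(item) {
      var count = items.unshift(item);
      return count > cap ? items.pop() : undefined;
    },
    items: items
  };
}

var list = createCappedList(3);
var evicted = ["a", "b", "c", "d"].map(function(marker) {
  return list.add(marker);
});

console.log(list.items, evicted);
```

In the real application, the value returned by add would be the marker to pass to map.removeLayer.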

The Handlebars template that the application applies to the cat data is embedded in the HTML page itself, using a script tag with a custom type:

<script id="cat-template" type="text/x-handlebars-template">
  <div class="cat">
    <div class="user"></div>
    <div class="meta">
      <div class="time">Posted at {{date}}</div>
      <div class="caption"></div>
    </div>
    <img class="thumb" src="">
  </div>
</script>

The last piece of the puzzle is implementing Socket.io on the client side. The application needs to establish a Socket.io connection with the server and then provide event handlers for the backlog and new cats. Both handlers will simply use the addCat function shown above.

var socket = io.connect();

socket.on("cat", addCat);
socket.on("recent", function(data) {
  data.reverse().forEach(addCat);
});

The handler for the "cat" event receives a single cat object, which is immediately passed into the addCat function. The handler for the "recent" event receives an array of cat objects from the server. It reverses the array before adding the cats so that the images will display in reverse-chronological order, consistent with how they are added in real time.
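To see why the reversal is needed, it helps to model the grid as a plain array. The sketch below assumes a stand-in createGrid object whose prepend method behaves like jQuery's prepend, and made-up cat objects; neither is part of CatThink.

```javascript
// Stand-in for the grid: prepend() puts each new entry first, in the same
// way that jQuery's prepend does for DOM nodes.
function createGrid() {
  var shown = [];
  return {
    prepend: function(cat) { shown.unshift(cat); },
    shown: shown
  };
}

var grid = createGrid();

// The backlog arrives newest-first, as returned by the server query
var recent = [{id: "newest"}, {id: "middle"}, {id: "oldest"}];
recent.reverse().forEach(function(cat) { grid.prepend(cat); });

// A cat that arrives later through the "cat" event
grid.prepend({id: "live"});

var order = grid.shown.map(function(cat) { return cat.id; });
console.log(order);
```

Because every cat is prepended, adding the backlog oldest-first leaves the newest cat at the top, and cats that arrive later continue to stack above it.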

Next steps

Although CatThink is not particularly complex, changefeeds helped to simplify the application and reduce the total amount of necessary code. Without changefeeds, the CatThink backend would have to fetch, parse, and process all of the cat records on its own instead of offloading that work to the database with a simple ReQL query.

In larger realtime applications, changefeeds can potentially offer more profound architectural advantages. You can increase the modularity of your application by decoupling the parts that handle and process data from the parts that convey updates to the frontend. There are also cases where you can use changefeeds to eliminate the need for dedicated message queue systems.

In the current version of RethinkDB, changefeeds offer a useful way to monitor changes on individual tables. In future versions, changefeeds will support a richer set of capabilities. Users will be able to monitor filtered data sets and detect change events on complex aggregations, like a player leader board or realtime moving averages. You can look forward to seeing the first round of new changefeed features in an upcoming release.

Install RethinkDB and try the ten-minute guide to experience the database in action.

Deploying RethinkDB applications with Docker using Dokku

Dokku is a simple application deployment system built on Docker. It gives you a Heroku-like PaaS environment on your own Linux system, enabling you to deploy your applications with git. Dokku automatically configures the proper application runtime environment, installs all of the necessary dependencies, and runs each application in its own isolated container. You can easily run Dokku on your own server or an inexpensive Linux VPS.

The RethinkDB Dokku plugin, created by Stuart Bentley, lets developers create containerized RethinkDB instances for their Dokku-deployed apps. I've found that Dokku is a really convenient way to share my RethinkDB demos while I'm prototyping without having to manually deploy and configure each one. In this short tutorial, I'll show you how you can set up Dokku and install the plugin on a Digital Ocean droplet.

Set up a Digital Ocean droplet

If you want to set up Dokku somewhere other than Digital Ocean, you can use the Dokku project's official install script to get it running on any conventional Ubuntu 14.04 system.

Digital Ocean provides a selection of base images that make it easy to create new droplets that come with specific applications or development stacks. Dokku is among the applications that Digital Ocean supports out of the box. When you create a new droplet, simply select the Dokku image from the Applications tab.

You can configure the droplet with the size, region, and hostname of your choice. Be sure to add an SSH key---it will be used later to identify you when you deploy to the system.

After Digital Ocean finishes creating the new droplet, navigate to the droplet's IP address in your browser. The server will display a Dokku configuration panel. The page will prompt you for a public key and a hostname. The key that you selected during droplet creation will automatically appear in the public key field. In the hostname box, you can either put in a domain or the IP address of the droplet.

If you use an IP address, Dokku will simply assign a unique port to each of your deployed applications. If you configure Dokku with a domain, it will automatically create a virtual host configuration with a subdomain for each application that you deploy. For example, if you set apps.mydomain.com as the hostname, an app called demo1 will be available at demo1.apps.mydomain.com. After you fill in the form, click the Finish Setup button to complete the Dokku configuration.

If you chose to use a domain, you also have to set up corresponding DNS records. In your DNS configuration system, add two A records---one for the domain itself and a wildcard record for the subdomains. Both records should use the IP address of your droplet.

A   apps.mydomain.com     xxx.xxx.xxx.xxx
A   *.apps.mydomain.com   xxx.xxx.xxx.xxx

Install the RethinkDB Dokku plugin

The next step is installing the plugin. Use ssh to log into the droplet as root. After logging into the system, navigate to the Dokku plugin folder:

$ cd /var/lib/dokku/plugins

Inside of the Dokku plugin folder, use the git clone command to obtain the plugin repository and put it in a subdirectory called rethinkdb. When the repository finishes downloading, use the dokku plugins-install command to install the plugin.

$ git clone https://github.com/stuartpb/dokku-rethinkdb-plugin rethinkdb
$ dokku plugins-install

Configure your application for deployment

Before you deploy an application, you will need to use Dokku to set up a linked RethinkDB container. While you are logged into the droplet as root, use the following command to set up a new RethinkDB instance:

$ dokku rethinkdb:create myapp

You can replace myapp with the name that you want to use for your application. When you deploy an application, Dokku will automatically link it with the RethinkDB container that has the same name. Now that you have created a RethinkDB container, it is time to deploy your first application.

Dokku supports a number of different programming languages and development stacks. It uses certain files in the project root directory to determine what dependencies to install and how to run the application. For a Ruby demo that I built with Sinatra, all I needed was a Gemfile and a config.ru. For a Node.js application built with Express, I used a package.json that included the dependencies and a start script.

You can also optionally use a Heroku-style Procfile to specify how to start the app. Dokku is largely compatible with Heroku, so you can refer to the Heroku docs to see what you need to do for other programming language stacks.

In the source code for your application, you will need to specify the host and port of the RethinkDB instance in the linked container. The RethinkDB Dokku plugin exposes those through environment variables called RDB_HOST and RDB_PORT. In my Ruby application, for example, I used the following code to connect to the database:

DBHOST = ENV["RDB_HOST"] || "localhost"
DBPORT = ENV["RDB_PORT"] || 28015

conn = r.connect :host => DBHOST, :port => DBPORT
...
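For a Node.js application, the equivalent configuration might look like the following sketch. The databaseConfig helper is a hypothetical name; the only parts taken from the plugin are the RDB_HOST and RDB_PORT environment variables.

```javascript
// Hypothetical helper: read the connection settings that the plugin exposes,
// falling back to a local RethinkDB instance on the default driver port.
function databaseConfig(env) {
  return {
    host: env.RDB_HOST || "localhost",
    // Environment variables are strings, so convert the port to a number
    port: parseInt(env.RDB_PORT, 10) || 28015
  };
}

var config = databaseConfig(process.env);
console.log(config);
```

The resulting object has the host and port fields that the RethinkDB JavaScript driver accepts in r.connect.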

After you finish configuring your application so that it will run in Dokku, be sure to commit your changes to your local git repository. To deploy the application, you will need to create a new remote:

$ git remote add dokku dokku@apps.mydomain.com:myapp

In the example above, use the domain or IP address of the droplet. Replace the word myapp with the name of your application. The name should match the one that you used when you created the RethinkDB container earlier.

Deploy your application

When you are ready to deploy the application, simply push to dokku:

$ git push dokku master

When you push the application, Dokku will automatically create a new container for it on the droplet, install the necessary dependencies, and start running the application. After the deployment process is complete, you will see the address in your output. If you used an IP address, it will just be the IP and port. If you used a domain, it will be a subdomain like myapp.apps.mydomain.com. Visit the site in a web browser to see if it worked correctly.

If your application didn't start correctly, you can log into the droplet to troubleshoot. Use the following command to see the logs emitted by the deploy process:

$ dokku logs myapp

Replace myapp with the name that you used for your application. That command will show you the log output, which should help you determine if there were any errors. If you want to delete the deployed application, perform the following command:

$ dokku delete myapp

You can type dokku help to see the full list of available commands. I also recommend looking at the advanced usage examples for the RethinkDB Dokku plugin to learn about other capabilities that it provides. You can, for example, expose the web console for a specific containerized RethinkDB instance through a public port on the host.

Although the initial setup process is a little bit involved, Dokku makes it extremely easy to deploy and run your RethinkDB applications. Be sure to check out our example projects if you are looking for a sample RethinkDB application to try deploying with Dokku.

Upcoming RethinkDB events for October and November

Join the RethinkDB team at the following upcoming events:

RethinkDB at HTML5DevConf

October 21-22, Moscone Center

RethinkDB will be in San Francisco next week for HTML5DevConf, a popular event for frontend web developers. Conference attendees will be able to find us at table 29 in the conference expo hall. You can see our latest demo apps and meet RethinkDB co-founder Slava Akhmechet. We will also have some fun RethinkDB goodies on hand to give away, including shirts and stickers.

Webinar with Compose

Wednesday, October 22 at 1:30PM PST

Our friends at Compose recently introduced a new service that provides managed RethinkDB hosting in the cloud. They have published several guides to help new users get started with the service. If you would like to learn more, be sure to catch our joint webinar with Compose next week. The live video event will feature Slava Akhmechet and Compose co-founder Kurt Mackey.

RethinkDB at DevCon5

November 18-19, San Jose Convention Center

RethinkDB will be at the HTML5 Communications Summit next month in San Jose. Slava will present a talk about real-time web application development with RethinkDB. We will also have a booth where you can see RethinkDB demos, meet members of the team, and get some nice RethinkDB goodies to bring home.

Move Fast and Break Things meetup

Wednesday, November 19 at 6:30 PM, Heavybit Industries, 325 Ninth Street

RethinkDB will give a presentation for the Move Fast and Break Things meetup group in San Francisco. Learn how the RethinkDB team works, including details about the tools and collaborative processes that we use to deliver new RethinkDB releases. More details about the event will be available as it approaches.

Hosted RethinkDB deployments in the cloud now available from Compose

We are pleased to announce that our friends at Compose now offer RethinkDB hosting in the cloud. Their new service lets you get a managed RethinkDB deployment in a matter of seconds, providing a fast and easy way to start working on your RethinkDB project without the overhead of managing your own infrastructure or provisioning your own cluster.

Compose, formerly known as MongoHQ, is a dedicated Database as a Service (DBaaS) company. RethinkDB is the third database in their product lineup, launching alongside their existing support for MongoDB and Elasticsearch. Available today as a public beta, their hosted RethinkDB deployments come with automatic scaling and backups.

Each deployment provided by Compose is configured as a high-availability cluster with full redundancy. Their elastic provisioning service manages the entire environment, scaling deployments as needed to accommodate user workloads. Pricing starts at $45 per month for a three-node cluster with 2GB of storage capacity.

Migrate data from a MongoDB deployment

In addition to elastic scaling, Compose also offers a data migration system called a Transporter. If you have data in an existing MongoDB deployment managed by Compose, you can seamlessly import it into a RethinkDB deployment.

The import can be a one-time event or maintained on an ongoing basis with continuous updates—regularly pulling the latest changes into RethinkDB from your MongoDB deployment. If you have an existing MongoDB application that you would like to consider migrating to RethinkDB, Compose makes it really easy to get started.

Get started with Compose

To create a hosted RethinkDB instance, click the Add Deployment button in the Compose admin panel and select RethinkDB. Simply enter a name for the deployment—Compose handles the rest. You will need to input billing information for your Compose account if you have not done so previously.

Each RethinkDB deployment hosted by Compose has its own private network. Compose uses SSH tunneling to provide secure access to a hosted cluster. When you create a RethinkDB deployment in the Compose admin console, it will give you the host and port information that you need to connect.

Once you set up the SSH tunnel on your client system, you can work with the hosted RethinkDB instance in much the same way you would work with a local installation of the database. Even the RethinkDB admin console and Data Explorer operate as expected.

Building your next application with RethinkDB couldn't be easier. Register an account at Compose.io and get started right away.