CatThink: see the cats of Instagram in realtime with RethinkDB and Socket.io

Modern frameworks and standards make it easy for developers to build web applications that support realtime updates. You can push the latest data to your users, offering a seamless experience that results in higher engagement and better usability. With the right architecture on the backend, you can put polling out to pasture and liberate your users from the tyranny of the refresh button.

In this tutorial, I'll show you how I built a realtime Instagram client for the web. The application, which is called CatThink, displays a live feed of new Instagram pictures that have the #catsofinstagram tag. Why cats of Instagram? Because it's one of the photo service's most popular and beloved tags. People on the internet really, really like cats. Or maybe we just think we do because our feline companions have reprogrammed us with brain parasites.

The cat pictures appear in real time, as they are posted by their respective users. CatThink shows the pictures in a grid, accompanied by captions and other relevant metadata. In a secondary view, the application uses geolocation info to plot the cat pictures on a map.

CatThink's architecture

The CatThink backend is built with Node.js and Express on top of RethinkDB. The HTML frontend uses jQuery and Handlebars to display the latest cat pictures. The frontend map view is built with Leaflet, a popular map library that uses tiles from OpenStreetMap. The application uses Socket.io to facilitate communication between the frontend and backend.

CatThink takes advantage of Instagram's realtime APIs to determine when new images are available. Instagram offers a webhook-based system that allows a backend application to subscribe to updates on a given tag. When there are new posts with the #catsofinstagram tag, Instagram's servers send an HTTP POST request to a callback URL on your server. The POST request doesn't actually include the new content; it carries only a timestamp and the name of the updated tag, so your application has to fetch the new records itself using Instagram's conventional REST API endpoints.

When the CatThink backend receives a POST request from Instagram, it performs a RethinkDB query that uses the r.http command to fetch the latest records from the Instagram REST API and add them directly to the database. The database itself performs the HTTP GET request and parses the returned data.

Because the operation is performed entirely with ReQL, the backend application isn't responsible for fetching or processing any of the new Instagram pictures. Of course, the backend application will still need to know about new cat pictures so that it can send them to the frontend with Socket.io. CatThink accomplishes that with changefeeds, a RethinkDB feature that lets applications subscribe to changes on a table. Whenever the database adds, removes, or changes a document in the table, it will notify subscribed applications.

CatThink subscribes to a changefeed on the table where the cat records are stored. Whenever the database inserts a new cat record, CatThink receives the data through the changefeed and then broadcasts it to all of the Socket.io connections.

Connect to the Instagram realtime API

To use the Instagram API, you will have to register an application key on the Instagram developer site. You will need to use the client ID and client secret provided by Instagram in order to hit the API endpoints. You don't need to configure the key with a redirect URI, however, as you won't be using authentication.

To subscribe to a tag with Instagram's realtime API, make an HTTP POST request to the https://api.instagram.com/v1/subscriptions/ endpoint. In the form data attached to the request, you will need to provide the application key data, the name of the tag, a verification token, and the callback URL where you want Instagram to send new data. The verification token is an arbitrary string that Instagram will pass back to your application when it hits the callback URL.

Note: the callback URL that you provide to Instagram must be publicly accessible to outside networks. For development purposes, it can be helpful to use a tool like ngrok, which exposes a local port to the public internet.
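
For example, with recent versions of ngrok you can forward a public URL to a local server with a single command (the port number below is just an example; use whatever port your Express app listens on):

$ ngrok http 8000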

In CatThink, I use the request library to perform the initial request to the Instagram server:

var params = {
  client_id: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  client_secret: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  verify_token: "somestring",
  object: "tag", aspect: "media",
  object_id: "catsofinstagram",
  callback_url: "http://mycatapp.ngrok.com/publish/photo"
};

request.post({url: api + "subscriptions", form: params},
  function(err, response, body) {
    if (err) console.log("Failed to subscribe:", err);
    else console.log("Successfully subscribed.");
});

If the subscription API call is properly formed, Instagram will immediately attempt to make an HTTP GET request at the callback URL. It will send several query parameters, including the verification token and a challenge key. In order to complete the subscription, you have to make the GET request return the provided challenge key. With Express, create a GET handler for the callback URL:

app.get("/publish/photo", function(req, res) {
  if (req.param("hub.verify_token") == "somestring")
    res.send(req.param("hub.challenge"));
  else res.status(500).json({err: "Verify token incorrect"});
});

Fetch the latest cats

The next step is to implement the POST handler for the callback URL. When Instagram sends the application a POST request to inform it of new content on the subscribed tag, it includes several bits of information in the request body:

[{
  "changed_aspect": "media",
  "object": "tag",
  "object_id": "catsofinstagram",
  "time": 1414995025,
  "subscription_id": 14185203,
  "data": {}
}]

The object_id property is obviously the name of the subscribed tag. The time property is a UNIX timestamp that reflects when the event occurred. The subscription_id property is a value that uniquely identifies the individual subscription.

Whenever the application receives a POST request at the callback URL, it will tell the database to fetch the latest cat records from Instagram's REST API. The application also provides a response so that Instagram knows that the POST request didn't fail. If the POST requests that Instagram sends to the application start to fail, Instagram will automatically taper off requests and eventually cancel the tag subscription.

app.post("/publish/photo", function(req, res) {
  var update = req.body[0];
  res.json({success: true, kind: update.object});

  if (update.time - lastUpdate < 1) return;
  lastUpdate = update.time;

  var path = "https://api.instagram.com/v1/tags/" +
             "catsofinstagram/media/recent?client_id=" +
             instagramClientId;


  r.connect(config.database).then(function(conn) {
    this.conn = conn;
    return r.table("instacat").insert(
      r.http(path)("data").merge(function(item) {
        return {
          time: r.now(),
          place: r.point(
            item("location")("longitude"),
            item("location")("latitude")).default(null)
        }
      })).run(conn)
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (this.conn)
      this.conn.close();
  });
});

In the code above, the ReQL query uses the r.point command in a merge operation to turn the geographical coordinates for each cat photo into a native geolocation point object. The application doesn't currently run any geospatial queries against those points, but they could be useful later if you wanted to create a geospatial index and query for cat pictures based on location.
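
As a rough sketch of that follow-up, you could create a geospatial index on the place field and then query for photos near a given location (the coordinates and distance below are arbitrary examples):

r.table("instacat").indexCreate("place", {geo: true})

// Find cat photos posted within 10 km of downtown San Francisco
r.table("instacat").getNearest(r.point(-122.42, 37.77),
  {index: "place", maxDist: 10, unit: "km"})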

In order to avoid hitting the Instagram API limit, the application checks the timestamp provided with each POST request and does some basic throttling so that new cat records aren't fetched more than once per second.

The path variable in the handler code is the URL of the Instagram REST API endpoint that the application uses to fetch the latest cats. In this example, the "catsofinstagram" tag is hard-coded into the URL path. It's worth noting, however, that you could use the name of the subscribed tag from the object_id property if you wanted to use the same POST handler to deal with multiple tag subscriptions, as sketched below.
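
A minimal sketch of that variation (not part of CatThink itself) would build the path from the update object:

// Build the REST API path from the tag name reported by Instagram so that
// a single POST handler can service several tag subscriptions
var path = "https://api.instagram.com/v1/tags/" +
           encodeURIComponent(update.object_id) +
           "/media/recent?client_id=" + instagramClientId;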

Verify the request origin

In cases where you rely on the object_id property, you'd probably also want to validate the source of the POST request to make sure that it actually came from Instagram. If you don't verify the origin, somebody might figure out your URL endpoint and send you malicious POST requests that include an object_id for a rogue tag that you don't want to appear in your application. You wouldn't want some nefarious anti-cat vigilante to trick your application into showing dogs, for example.

Every POST request from Instagram will have an X-Hub-Signature header with a hash that you can validate using your secret key and the request body. The bodyParser middleware provides a verify option that is specifically intended for such purposes:

app.use("/publish/photo", bodyParser.json({
  verify: function(req, res, buf) {
    var hmac = crypto.createHmac("sha1", "XXXXXXXXXXXXXXX");
    var hash = hmac.update(buf).digest("hex");

    if (req.header("X-Hub-Signature") == hash)
      req.validOrigin = true;
  }
}));

At the beginning of your POST handler, you would simply check the value of req.validOrigin and make sure that it's true before continuing.
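
A minimal sketch of that check might look like the following; the 403 status code and error message are just illustrative choices:

app.post("/publish/photo", function(req, res) {
  // Ignore requests whose X-Hub-Signature didn't match the computed hash
  if (!req.validOrigin)
    return res.status(403).json({err: "Invalid signature"});

  // ...continue handling the update as shown earlier
});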

Use changefeeds to handle new cats

The CatThink backend uses RethinkDB changefeeds to detect when the database adds new records to the cat table. In a ReQL query, the changes command returns a cursor that exposes every modification that is made to the specified table. The following code shows how to consume the data emitted by the changefeed and broadcast each new item with Socket.io:

r.table("instacat").changes().run(this.conn).then(function(cursor) {
  cursor.each(function(err, item) {
    if (item && item.new_val)
      io.sockets.emit("cat", item.new_val);
  });
})
.error(function(err) {
  console.log("Error:", err);
});

CatThink broadcasts every cat to every user, so you don't need to worry about tracking individual Socket.io connections or routing messages to the right users.

In addition to broadcasting new cats, it's also a good idea to pass the user a modest backlog of cats when they first establish their connection with the server so that their initial view of the application is populated with some data. In a Socket.io connection event handler, CatThink performs a ReQL query that fetches the 60 most recent cats and then sends the result set back to the user:

io.sockets.on("connection", function(socket) {
  r.connect(config.database).then(function(conn) {
    this.conn = conn;
    return r.table("instacat").orderBy({index: r.desc("time")})
            .limit(60).run(conn)
  })
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(result) {
    socket.emit("recent", result);
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (this.conn)
      this.conn.close();
  });
});

Implement the frontend

The CatThink frontend has a very simple user interface: It displays the grid of cats and the accompanying map view. A full-blown JavaScript MVC framework would likely be overkill, so it uses a pretty light dependency stack. It uses Leaflet for the map, jQuery for the UI logic, and Handlebars templating to generate the markup for each new cat picture.

After some initial setup for the tab switching and map view, the bulk of the frontend code is housed in a single addCat function that applies the template to the cat data, inserts the new markup into the grid, and then creates the location marker for cats with geolocation data:

// mapTiles and mapAttrib (tile URL template and attribution text) are defined elsewhere
var map = L.map("map").setView([0, 0], 2);
map.addLayer(L.tileLayer(mapTiles, {attribution: mapAttrib}));

var template = Handlebars.compile($("#cat-template").html());
var markers = [];

function addCat(cat) {
  cat.date = moment.unix(cat.created_time).format("MMM DD, h:mm a");
  $("#cats").prepend(template(cat));

  if (cat.place) {
    var count = markers.unshift(L.marker(L.latLng(
        cat.place.coordinates[1],
        cat.place.coordinates[0])));

    map.addLayer(markers[0]);
    markers[0].bindPopup(
        "<img src=\"" + cat.images.thumbnail.url + "\">",
        {minWidth: 150, minHeight: 150});

    markers[0].openPopup();

    if (count > 100)
      map.removeLayer(markers.pop());
  }
}

The map markers are stored in an array so that the application can easily remove old markers as it adds new ones. The marker cap is set to 100 in the code above, but you could likely raise it considerably if desired. It's important to have some kind of cap, however, because Leaflet can sometimes exhibit odd behavior if you have too many markers.

The Handlebars template that the application applies to the cat data is embedded in the HTML page itself, using a script tag with a custom type:

<script id="cat-template" type="text/x-handlebars-template">
  <div class="cat">
    <div class="user"></div>
    <div class="meta">
      <div class="time">Posted at </div>
      <div class="caption"></div>
    </div>
    <img class="thumb" src="">
  </div>
</script>

The last piece of the puzzle is implementing Socket.io on the client side. The application needs to establish a Socket.io connection with the server and then provide event handlers for the backlog and new cats. Both handlers will simply use the addCat function shown above.

var socket = io.connect();

socket.on("cat", addCat); 
socket.on("recent", function(data) {
  data.reverse().forEach(addCat);
});

The handler for the "cat" event receives a single cat object, which is immediately passed into the addCat function. The handler for the "recent" event receives an array of cat objects from the server. It reverses the array before adding the cats so that the images will display in reverse-chronological order, consistent with how they are added in real time.

Next steps

Although CatThink is not particularly complex, changefeeds helped to simplify the application and reduce the total amount of necessary code. Without changefeeds, the CatThink backend would have to fetch, parse, and process all of the cat records on its own instead of offloading that work to the database with a simple ReQL query.

In larger realtime applications, changefeeds can potentially offer more profound architectural advantages. You can increase the modularity of your application by decoupling the parts that handle and process data from the parts that convey updates to the frontend. There are also cases where you can use changefeeds to eliminate the need for dedicated message queue systems.

In the current version of RethinkDB, changefeeds offer a useful way to monitor changes on individual tables. In future versions, changefeeds will support a richer set of capabilities. Users will be able to monitor filtered data sets and detect change events on complex aggregations, like a player leader board or realtime moving averages. You can look forward to seeing the first round of new changefeed features in an upcoming release.

Install RethinkDB and try the ten-minute guide to experience the database in action.


Deploying RethinkDB applications with Docker using Dokku

Dokku is a simple application deployment system built on Docker. It gives you a Heroku-like PaaS environment on your own Linux system, enabling you to deploy your applications with git. Dokku automatically configures the proper application runtime environment, installs all of the necessary dependencies, and runs each application in its own isolated container. You can easily run Dokku on your own server or an inexpensive Linux VPS.

The RethinkDB Dokku plugin, created by Stuart Bentley, lets developers create containerized RethinkDB instances for their Dokku-deployed apps. I've found that Dokku is a really convenient way to share my RethinkDB demos while I'm prototyping without having to manually deploy and configure each one. In this short tutorial, I'll show you how you can set up Dokku and install the plugin on a Digital Ocean droplet.

Set up a Digital Ocean droplet

If you want to set up Dokku somewhere other than Digital Ocean, you can use the Dokku project's official install script to get it running on any conventional Ubuntu 14.04 system.

Digital Ocean provides a selection of base images that make it easy to create new droplets that come with specific applications or development stacks. Dokku is among the applications that Digital Ocean supports out of the box. When you create a new droplet, simply select the Dokku image from the Applications tab.

You can configure the droplet with the size, region, and hostname of your choice. Be sure to add an SSH key---it will be used later to identify you when you deploy to the system.

After Digital Ocean finishes creating the new droplet, navigate to the droplet's IP address in your browser. The server will display a Dokku configuration panel. The page will prompt you for a public key and a hostname. The key that you selected during droplet creation will automatically appear in the public key field. In the hostname box, you can either put in a domain or the IP address of the droplet.

If you use an IP address, Dokku will simply assign a unique port to each of your deployed applications. If you configure Dokku with a domain, it will automatically create a virtual host configuration with a subdomain for each application that you deploy. For example, if you set apps.mydomain.com as the hostname, an app called demo1 will be available at demo1.apps.mydomain.com. After you fill in the form, click the Finish Setup button to complete the Dokku configuration.

If you chose to use a domain, you also have to set up corresponding DNS records. In your DNS configuration system, add two A records---one for the domain itself and a wildcard record for the subdomains. Both records should use the IP address of your droplet.

A   apps.mydomain.com     xxx.xxx.xxx.xxx
A   *.apps.mydomain.com   xxx.xxx.xxx.xxx

Install the RethinkDB Dokku plugin

The next step is installing the plugin. Use ssh to log into the droplet as root. After logging into the system, navigate to the Dokku plugin folder:

$ cd /var/lib/dokku/plugins

Inside of the Dokku plugin folder, use the git clone command to obtain the plugin repository and put it in a subdirectory called rethinkdb. When the repository finishes downloading, use the dokku plugins-install command to install the plugin.

$ git clone https://github.com/stuartpb/dokku-rethinkdb-plugin rethinkdb
$ dokku plugins-install

Configure your application for deployment

Before you deploy an application, you will need to use Dokku to set up a linked RethinkDB container. While you are logged into the droplet as root, use the following command to set up a new RethinkDB instance:

$ dokku rethinkdb:create myapp

You can replace myapp with the name that you want to use for your application. When you deploy an application, Dokku will automatically link it with the RethinkDB container that has the same name. Now that you have created a RethinkDB container, it is time to deploy your first application.

Dokku supports a number of different programming languages and development stacks. It uses certain files in the project root directory to determine what dependencies to install and how to run the application. For a Ruby demo that I built with Sinatra, all I needed was a Gemfile and a config.ru. For a Node.js application built with Express, I used a package.json that included the dependencies and a start script.
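
As a rough illustration, a minimal package.json for an Express app might look something like this (the module versions and the app.js entry point are assumptions rather than requirements):

{
  "name": "myapp",
  "version": "0.0.1",
  "dependencies": {
    "express": "4.x",
    "rethinkdb": "1.x"
  },
  "scripts": {
    "start": "node app.js"
  }
}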

You can also optionally use a Heroku-style Procfile to specify how to start the app. Dokku is largely compatible with Heroku, so you can refer to the Heroku docs to see what you need to do for other programming language stacks.
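
For reference, a Procfile for a Node app is typically a single line naming the web process (again assuming app.js is the entry point):

web: node app.js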

In the source code for your application, you will need to specify the host and port of the RethinkDB instance in the linked container. The RethinkDB Dokku plugin exposes those through environment variables called RDB_HOST and RDB_PORT. In my Ruby application, for example, I used the following code to connect to the database:

DBHOST = ENV["RDB_HOST"] || "localhost"
DBPORT = ENV["RDB_PORT"] || 28015

conn = r.connect :host => DBHOST, :port => DBPORT
...
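
For a Node application, the equivalent connection code might look something like this sketch, assuming the official rethinkdb driver:

var r = require("rethinkdb");

// The Dokku plugin exposes the linked container through these variables
var dbhost = process.env.RDB_HOST || "localhost";
var dbport = parseInt(process.env.RDB_PORT, 10) || 28015;

r.connect({host: dbhost, port: dbport}, function(err, conn) {
  if (err) throw err;
  // use conn to run queries...
});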

After you finish configuring your application so that it will run in Dokku, be sure to commit your changes to your local git repository. To deploy the application, you will need to create a new remote:

$ git remote add dokku dokku@apps.mydomain.com:myapp

In the example above, use the domain or IP address of the droplet. Replace the word myapp with the name of your application. The name should match the one that you used when you created the RethinkDB container earlier.

Deploy your application

When you are ready to deploy the application, simply push to dokku:

$ git push dokku master

When you push the application, Dokku will automatically create a new container for it on the droplet, install the necessary dependencies, and start running the application. After the deployment process is complete, you will see the address in your output. If you used an IP address, it will just be the IP and port. If you used a domain, it will be a subdomain like myapp.apps.mydomain.com. Visit the site in a web browser to see if it worked correctly.

If your application didn't start correctly, you can log into the droplet to troubleshoot. Use the following command to see the logs emitted by the deploy process:

$ dokku logs myapp

Replace myapp with the name that you used for your application. That command will show you the log output, which should help you determine if there were any errors. If you want to delete the deployed application, perform the following command:

$ dokku delete myapp

You can type dokku help to see the full list of available commands. I also recommend looking at the advanced usage examples for the RethinkDB Dokku plugin to learn about other capabilities that it provides. You can, for example, expose the web console for a specific containerized RethinkDB instance through a public port on the host.

Although the initial setup process is a little bit involved, Dokku makes it extremely easy to deploy and run your RethinkDB applications. Be sure to check out our example projects if you are looking for a sample RethinkDB application to try deploying with Dokku.


Upcoming RethinkDB events for October and November

Join the RethinkDB team at the following upcoming events:

RethinkDB at HTML5DevConf

October 21-22, Moscone Center

RethinkDB will be in San Francisco next week for HTML5DevConf, a popular event for frontend web developers. Conference attendees will be able to find us at table 29 in the conference expo hall. You can see our latest demo apps and meet RethinkDB co-founder, Slava Akhmechet. We will also have some fun RethinkDB goodies on hand to give away, including shirts and stickers.

Webinar with Compose

Wednesday, October 22 at 1:30PM PST

Our friends at Compose recently introduced a new service that provides managed RethinkDB hosting in the cloud. They have published several guides to help new users get started with the service. If you would like to learn more, be sure to catch our joint webinar with Compose next week. The live video event will feature Slava Akhmechet and Compose co-founder Kurt Mackey.

RSVP Here »

RethinkDB at DevCon5

November 18-19, San Jose Convention Center

RethinkDB will be at the HTML5 Communications Summit next month in San Jose. Slava will present a talk about real-time web application development with RethinkDB. We will also have a booth where you can see RethinkDB demos, meet members of the team, and get some nice RethinkDB goodies to bring home.

Move Fast and Break Things meetup

Wednesday, November 19 at 6:30 PM, Heavybit Industries, 325 Ninth Street (map)

RethinkDB will give a presentation for the Move Fast and Break Things meetup group in San Francisco. Learn how the RethinkDB team works, including details about the tools and collaborative processes that we use to deliver new RethinkDB releases. More details about the event will be available as it approaches.

RSVP Here »

Hosted RethinkDB deployments in the cloud now available from Compose

We are pleased to announce that our friends at Compose now offer RethinkDB hosting in the cloud. Their new service lets you get a managed RethinkDB deployment in a matter of seconds, providing a fast and easy way to start working on your RethinkDB project without the overhead of managing your own infrastructure or provisioning your own cluster.

Compose, formerly known as MongoHQ, is a dedicated Database as a Service (DBaaS) company. RethinkDB is the third database in their product lineup, launching alongside their existing support for MongoDB and Elasticsearch. Available today as a public beta, their hosted RethinkDB deployments come with automatic scaling and backups.

Each deployment provided by Compose is configured as a high-availability cluster with full redundancy. Their elastic provisioning service manages the entire environment, scaling deployments as needed to accommodate user workloads. Pricing starts at $45 per month for a three-node cluster with 2GB of storage capacity.

Migrate data from a MongoDB deployment

In addition to elastic scaling, Compose also offers a data migration system called a Transporter. If you have data in an existing MongoDB deployment managed by Compose, you can seamlessly import it into a RethinkDB deployment.

The import can be a one-time event or maintained on an ongoing basis with continuous updates—regularly pulling the latest changes into RethinkDB from your MongoDB deployment. If you have an existing MongoDB application that you would like to consider migrating to RethinkDB, Compose makes it really easy to get started.

Get started with Compose

To create a hosted RethinkDB instance, click the Add Deployment button in the Compose admin panel and select RethinkDB. Simply enter a name for the deployment—Compose handles the rest. You will need to input billing information for your Compose account if you have not done so previously.

Each RethinkDB deployment hosted by Compose has its own private network. Compose uses SSH tunneling to provide secure access to a hosted cluster. When you create a RethinkDB deployment in the Compose admin console, it will give you the host and port information that you need to connect.

Once you set up the SSH tunnel on your client system, you can work with the hosted RethinkDB instance in much the same way you would work with a local installation of the database. Even the RethinkDB admin console and Data Explorer operate as expected.
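
As an illustration only, such a tunnel is a standard SSH local port forward; the actual user, host, and port values come from your Compose deployment details:

$ ssh -N -L 28015:<rethinkdb-host>:28015 <user>@<compose-ssh-host>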

Building your next application with RethinkDB couldn't be easier. Register an account at Compose.io and get started right away.

BeerThink: infinite scrolling in a mobile app with Ionic, Node.js, and RethinkDB

Developers often use pagination to display large collections of data. An application can fetch content in batches as needed, presenting a fixed number of records at a time. On the frontend, paginated user interfaces typically provide something like "next" and "previous" navigation buttons so that users can move through the data set. In modern mobile apps, it is increasingly common to implement an infinite scrolling user interface on top of paginated data. As the user scrolls through a list, the application fetches and appends new records.

To demonstrate the use of pagination in RethinkDB applications, I made a simple mobile app called BeerThink. It displays a list of beers and breweries, providing a detailed summary when the user taps an item. The app uses a data dump from the Open Beer Database, which contains information about roughly 4,400 beers and 1,200 breweries. I converted the data to JSON so that it is easy to import into RethinkDB. There are two tables, one for beers and one for breweries. The application uses RethinkDB's support for table joins to correlate the beers with their respective breweries.

BeerThink's backend is built with Node.js and Express. It exposes beer and brewery data retrieved from a RethinkDB database, providing a paginated API that returns 50 records at a time.

The BeerThink frontend is built with Ionic, a popular AngularJS-based JavaScript framework designed for mobile web apps. BeerThink uses an infinite scrolling list to present the beers in alphabetical order.

BeerThink's architecture aligns with the API-first approach used by many modern mobile web applications. The backend is solely an API layer, completely decoupled from the frontend. The frontend is a single-page web application designed to consume the backend API. This particular approach makes it easy to build multiple frontend experiences on top of the same backend. You could, for example, easily make native desktop and mobile applications that consume the same backend API.

This tutorial demonstrates how BeerThink's pagination works at each layer of the stack: the RethinkDB database, the Node backend, and the Ionic client application.

Efficient pagination in RethinkDB

If you'd like to follow along and try the pagination queries yourself, create a table and then use the r.http command to add the beer list to a database:

r.table("beers").insert(r.http("https://raw.githubusercontent.com/rethinkdb/beerthink/master/data/beers.json", {result_format: "json"}))

To efficiently alphabetize and paginate the beer list, you first need to create an index on the name property:

r.table("beers").indexCreate("name")

After creating the index, you can use it in the orderBy command to fetch an alphabetized list of names:

r.table("beers").orderBy({index: "name"})

When paginating records from a database, you want to be able to obtain a subset of ordered table records. In a conventional SQL environment, you might accomplish that by using OFFSET and LIMIT. RethinkDB's skip and limit commands are serviceable equivalents, but the skip command doesn't offer optimal performance.

The between command, which is commonly used to fetch all documents that are between two keys in a table, is a much more efficient way to get the start position of a table subset. You can optionally specify a secondary index when using the between command, which means that it can operate on the indexed name property of the beers table.

The following example shows how to use the between command on the name index to get all of the beers between "Petrus Speciale" and "Plank Road Pale Ale" in alphabetical order:

r.table("beers")
  .between("Petrus Speciale", "Plank Road Pale Ale", {index: "name"})
  .orderBy({index: "name"})

When the BeerThink application starts, it uses orderBy and limit to fetch the first page of data. To get subsequent pages, it uses the between and limit commands. The value that the program supplies for the between command's start position is simply the indexed name of the last item that was fetched on the previous page.

r.table("beers")
  .between("Petrus Speciale", null, {leftBound: "open", index: "name"})
  .orderBy({index: "name"}).limit(50)

The example above shows how to fetch 50 records, starting from a particular beer. Because the program doesn't actually know what beer will be at the end of the new page of data, the between command is given null as its closing index value. That will cause the between command to return everything from the start index to the end of the table. The query uses the limit command to get only the desired number of records.

Setting the value of the leftBound option to open tells the between command to omit the first record, the one that we use to define the start index. That's useful because the item is one that you already have at the end of your list---you don't want to add it again.

The slice command

The between command is a good way to implement pagination in many cases, but it isn't universally applicable. There are cases where you won't have the last item of the previous page to use as a starting point.

Consider a situation where you want the user to be able to visit an arbitrary page without first iterating through the entire set. You might, for example, want to build a web application that accepts an arbitrary page number as a URL path segment and returns the relevant results. In such cases, the best approach is to use the slice command.

The slice command takes a start index and an end index. To get 50 records that are 3000 records down from the top of the table, simply pass 3000 and 3050 as the parameters:

r.table("beers").orderBy({index: "name"}).slice(3000, 3050)

When the user requests an arbitrary page, you simply multiply by the number of items per page to determine the slice command's start and end positions:

query.slice((pageNumber - 1) * perPage, pageNumber * perPage)

In the example above, use the desired values for pageNumber and perPage. Although the slice command isn't as fast as using between and limit, it is still much more efficient than using the skip command.

Pagination in BeerThink's API backend

The BeerThink backend is built with Node and Express. It provides simple API endpoints that are consumed by the frontend client application. The /beers endpoint provides the list of beers, 50 records at a time. The application also has a /breweries endpoint that similarly provides a list of breweries.

For pagination, the user can optionally pass a last URL query parameter with the name of the most recently-fetched item. Both API endpoints support the same pagination mechanism. Taking advantage of the ReQL query language's composability, I generalized the operation that I use for pagination into a function that I can apply to any table index:

// Return up to `limit` documents from `table` ordered by `index`, starting
// just after the document whose index value is `last` (or from the top of
// the table if `last` isn't provided)
function paginate(table, index, limit, last) {
  return (!last ? table : table
    .between(last, null, {leftBound: "open", index: index}))
  .orderBy({index: index}).limit(limit)
}

The table parameter takes a RethinkDB expression that references a table. The index parameter is the name of the table index on which to operate. The limit parameter is the total number of desired items. The last parameter is the item to use to find the start of the page. If the last parameter is null or undefined, the application will fetch the first page of data instead of applying the between command.

In the /breweries endpoint, apply the paginate function to the breweries table. Use the req.param method provided by Express to get the URL query parameter that has the value of the last list item. If the user didn't provide the URL query parameter, the value will be undefined. All you have to do is run the query and give the user the JSON results:

app.get("/breweries", function(req, res) {
  var last = req.param("last");

  paginate(r.table("breweries"), "name", 50, last).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

The /beers endpoint is implemented the exact same way as the /breweries endpoint, using the same paginate function that I defined above. The query is a little more complex, however, because it has to use an eqJoin operation to get the brewery for each beer:

app.get("/beers", function(req, res) {
  var last = req.param("last");

  paginate(r.table("beers"), "name", 50, last)
    .eqJoin("brewery_id", r.table("breweries"))
    .map(function(item) {
      return item("left").merge({"brewery": item("right")})
    }).without("brewery_id").run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

Even though the two endpoints used different queries, the same pagination function worked well on both. Abstracting common ReQL patterns into reusable functions can greatly simplify your code. If you wanted to make it possible for the client to specify how many records are returned for each page, you could easily achieve that by taking another request variable and passing it to the paginate function as the value of the limit parameter.
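
Here is a rough sketch of what that might look like for the /breweries endpoint, using a hypothetical count query parameter that defaults to 50 and is capped at 100:

app.get("/breweries", function(req, res) {
  var last = req.param("last");
  // "count" is a hypothetical parameter; clamp it to a sensible range
  var count = Math.min(parseInt(req.param("count"), 10) || 50, 100);

  paginate(r.table("breweries"), "name", count, last).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});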

Slice-style pagination on the backend

Although the between command is the best approach to use for pagination in the BeerThink application, the slice command is also easy to implement on the backend. I've included a brief explanation here for those who would like to see an example.

When you define a URL handler in Express, you can use a colon to signify that a particular URL segment is a variable. If you define the breweries endpoint as /breweries/:page, the page number passed by the user in the URL segment will be assigned to the request's page parameter.

In the handler for the endpoint, use parseInt or a plus sign to coerce the page number into an integer that can be passed into the ReQL query. Next, use the orderBy command to alphabetize the breweries. Finally, use the slice command with the page number and item count to fetch the desired subset of items.

app.get("/breweries/:page", function(req, res) {
  var pageNum = parseInt(req.params.page) || 1;

  r.table("breweries").orderBy({index: "name"})
    .slice((pageNum - 1) * 50, pageNum * 50).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

If the user browses to /breweries/3, the application will give them the third page of brewery data formatted in JSON. In the example above, you might notice that the code assigns a default value of 1 to the pageNum variable when the page segment can't be parsed as a number. If you also make the segment optional by defining the route as /breweries/:page?, visiting /breweries by itself will return the first page of data.

Consuming the paginated API in Ionic

Now that the endpoint is defined, the client can simply iterate through the pages as the user scrolls, adding each page of data to a continuous list. It's especially easy to accomplish with Ionic, because the framework includes an AngularJS directive called ion-infinite-scroll that you can use alongside any list view to easily implement infinite scrolling:

<ion-content>
  <ion-list>
    <ion-item collection-repeat="beer in items" ...>
      ...
    </ion-item>
  </ion-list>

  <ion-infinite-scroll on-infinite="fetchMore()" distance="25%">
  </ion-infinite-scroll>
</ion-content>

In the markup above, the framework will execute the code in the on-infinite attribute whenever the user scrolls to the position described in the distance attribute. In this case, the application will call the fetchMore method on the active scope whenever the user scrolls within 25% of the list's bottom.

In the associated AngularJS controller, the fetchMore method uses the $http service to retrieve the next page of data. It passes the name property of the most recently-fetched list item as the last URL query parameter, telling the backend which page to return.

app.controller("ListController", function($scope, $http) {
  $scope.items = [];
  var end = false;

  $scope.fetchMore = function() {
    if (end) return;

    var count = $scope.items.length;
    // Pass the name of the last item already in the list as the "last"
    // parameter; the very first request sends no parameters at all
    var params = count ? {"last": $scope.items[count-1].name} : {};

    $http.get("/beers", {params: params}).success(function(items) {
      if (items.length)
        Array.prototype.push.apply($scope.items, items);
      else end = true;
    }).error(function(err) {
      console.log("Failed to download list items:", err);
      end = true;
    }).finally(function() {
      $scope.$broadcast("scroll.infiniteScrollComplete");
    });
  };
});

Each time that the fetchMore function retrieves data, it appends the new records to the items scope variable. If the backend returns no data, the application assumes that it has reached the end of the list and will stop fetching additional pages. Similarly, it will stop fetching if it encounters an error. In a real-world application, you might want to handle errors more gracefully and make it so that the user can force a retry.

The ion-item element in the HTML markup is bound to the items array, which means that new records will automatically display in the list. When I first built the application, I implemented the repeating list item with Angular's ng-repeat directive. I soon discovered that ng-repeat doesn't scale very well to lists with thousands of items: scrolling performance wasn't very good, and switching back from the beer detail view was positively glacial.

I eventually switched to Ionic's relatively new collection-repeat directive, which is modeled after the cell reuse techniques found in native mobile frameworks. Adopting collection-repeat substantially improved scrolling performance and eliminated detail view lag. If you are building mobile web apps with infinite scrolling lists that will house thousands of items, I highly recommend collection-repeat.

Going further

The application has a number of other features that are beyond the scope of this article, but you can get the source code from GitHub and have a look if you would like to learn more.

Install RethinkDB and check out the 10-minute intro guide to start building your first project.