RethinkDB 1.5: secondary indexes, batched inserts performance improvements, soft durability mode

We are pleased to announce RethinkDB 1.5 (), so go download it now!

This release includes the long-awaited support for secondary indexes, a new algorithm for batched inserts that results in an ~18x performance improvement, support for soft durability (don't worry -- off by default), and over 180 bug fixes, features, and enhancements.

Upgrading to 1.5? Make sure to migrate your data before upgrading to RethinkDB 1.5. →

Secondary indexes

Support for secondary indexes has been the most requested feature since we launched RethinkDB, and has been in development for over six months. It required a massive amount of server work and involved modifying almost every part of the codebase: the BTree code, the concurrency subsystem, the distribution layer, the query language, the client drivers, and even the web UI.

We worked hard to make sure secondary indexes are extremely easy to use. Here's how you'd create a secondary index on the last_name attribute:

r.table('users').indexCreate('last_name')

Then getting all users with the last name Smith would be:

r.table('users').getAll('Smith', { index: 'last_name' })

Or you could retrieve arbitrary ranges of the index. For example, all users whose last names are between Smith and Wade:

r.table('users').between('Smith', 'Wade', { index: 'last_name' })

Listing and dropping indexes is also a part of the API:

r.table('users').indexList()             // list indexes on table 'users'
r.table('users').indexDrop('last_name')  // drop index 'last_name' on table 'users'

In addition to manipulating secondary indexes via ReQL, you can perform these from the admin UI:

You can define compound indexes, and even indexes based on arbitrary ReQL expressions. Secondary indexes can be used to do efficient table joins, and are sharded and replicated along with the rest of the database. Learn more about how to use secondary indexes in the FAQ.

Batched inserts performance improvements

Since the initial release, RethinkDB has supported inserting batches of documents. Instead of inserting multiple documents one at a time:

r.table('users').insert({ name: 'Michael' })
r.table('users').insert({ name: 'Bill' })

it was always possible to insert a batch of documents in one command with a single network roundtrip:

r.table('users').insert([{ name: 'Michael' },
                         { name: 'Bill' }])

However, the server had always executed batched inserts by translating them to a series of single insert commands. While sending the data in a single network roundtrip reduced the network latency, it still had very poor performance because the server would have to flush each document to disk before moving on to the next document.

The 1.5 release includes a new insert algorithm that flushes changes to disk in batches while maintaining the guarantee of consistency in case of power failure. This algorithm drastically improves performance of batched inserts. While not something we'd call a benchmark, inserting 100 medium-sized documents went from 2.8 seconds to 160 milliseconds on a development system, and the ~18x performance improvement is consistently reproducible on different batched insert workloads and types of hardware.

Soft durability mode

Early in the development of RethinkDB, we made the design decision to be extremely conservative about durability and safety of users's data. In this respect RethinkDB is like any other traditional database system -- the server does not acknowledge the write until it's safely committed to disk. While this is a sensible default for a database system, many of our users pointed out that they're using RethinkDB for storing access logs or clickstream data, or as a persistent, replicated cache of JSON documents. These scenarios might require trading off some durability guarantees for higher performance.

The 1.5 release includes support for relaxed durability (off by default, of course). For tables in this mode, the server acknowledges writes as soon as it receives them, and flushes data to disk in the background. Flushes normally occur every few seconds, or if there are too many dirty blocks cached in memory.

Hard durability can be turned off when creating a table via the advanced settings in the web UI or in ReQL:

db.tableCreate('http_logs', { hard_durability: false })

You'll notice that write performance on tables with hard durability turned off is about ~30x faster than normal tables. Note that the data still gets flushed to disk in the background and is consistent in case of failures. For those familiar with MySQL's InnoDB engine, RethinkDB's soft durability is similar to setting InnoDB's innodb_flush_log_at_trx_commit to 0 (more info on this setting in InnoDB).

You can change the durability mode for an existing table via the admin CLI:

$ rethinkdb admin -j localhost:29015
localhost:29015> set durability http_logs --hard

Many more enhancements

The 1.5 release includes many more enhancements -- check out the changelog for a complete list. Here are a few more notable changes:

  • For super-fast insert performance, RethinkDB now supports noreply writes that allow writing data over the network without waiting for a server acknowledgment. This mode allows for an apples-to-apples benchmark comparison with MongoDB.
  • The Data Explorer in the web UI now supports electric punctuation, as well as a toggle to turn it off. This can make typing queries in the data explorer much more pleasant.
  • The web UI will now check if there is a new version of RethinkDB and inform you if you need to upgrade (the check is done as a simple AJAX request from the browser, and can be turned off).

Looking forward to 1.6

We are already hard at work on the 1.6 release. This release should be quick and easy, and will include many ReQL enhancements. We will also focus on continuously improving performance with each version. However, the next release isn't set in stone. If a feature is important to you, let us know.

A tool for RethinkDB driver developers

The main focus of the 1.4 release was to refactor the client-server wire protocol to simplify future development of the query language and also to make writing client drivers for new languages significantly easier.

It's been a bit over a month since the release and we've already heard of a couple of new drivers being developed. Erlang, C#, Scala, and the just-announced PHP are the ones we know about, but we hope to hear from you about even more of them!

This seemed like a confirmation that a refactored, simplified and source documented protobuf definition and the rewritten official drivers were indeed what was needed to enable users to start working on new drivers. But we realized, and our users helped us with this, that while being a good start, these do not (and can not) provide all the details needed while developing new drivers.

To make things even simpler, Michel wrote a Python library that exposes all the details of the RethinkDB client server protocol. Basically you can write a query and this library will show:

  • the Protobuf client message
  • the serialized message
  • the serialized response
  • the Protobuf server response

All the details of setting it up and using it are explained in the readme file. Have fun with it and thanks for bringing RethinkDB to other languages!

RethinkDB 1.4: improved wire protocol, updated drivers, data explorer history

RethinkDB 1.4 ("Some Like it Hot") is out. This release includes an improved wire protocol, updated drivers, support for query history in the data explorer, and a brand-new build system. It contains well over 100 bug fixes, features, and improvements (you can see the complete list of closed issues on Github).

Upgrading to 1.4? Make sure to migrate your data before upgrading to RethinkDB 1.4. →

The improved wire protocol and updated drivers

The improved wire protocol has been in the works for over two months, and makes writing client drivers for new languages significantly easier. RethinkDB supports Ruby, Python, and Javascript, and there are community contributed drivers for Haskell, C, and Go. The new protocol will make it even easier to add new languages–so far the most requested include PHP, .NET and Java. If you're interested in helping, contact @mlucy or @wmrowan.

In the process of implementing the new protocol we've also rewritten the official Python, Ruby and Node.js drivers. We used this opportunity to incorporate the feedback we've received from our users.

Read more about the many changes to ReQL for the Ruby, Python, and Javascript drivers for all the details.

Data explorer: query history and suggestion improvements

The easiest way to talk about the new features of the data explorer is a screenshot:

Starting with this version, the data explorer maintains a history of the queries run. Past queries can be re-run with just two clicks, and the history is saved in the browser storage (and is persistent across sessions).

The code completion popup got a couple of improvements as well: API parameters are showed at the top, database and table names are more readable, and the suggester doesn't pop up in unexpected places. Oh, and you no longer need to type .run() at the end of your queries.

Building from source and Homebrew builds

While we continue to work on providing more binary distributions, for some users building from source remains the only solution for installing RethinkDB on their platform. Many people pointed out that the RethinkDB build system had a lot to be desired, so @atnnn rewrote it from scratch. Take a look at the improved build instructions and give the build system a try!

The Mac OS X got a binary distribution with the previous release, but we've also worked on improving the Homebrew formula.

There are many other improvements that we would like to brag about, but we think that downloading the new version would provide more useful and pleasant details about RethinkDB 1.4.