Rethink and Rails together? A NoBrainer!

Have you been looking for a tutorial on using RethinkDB with Ruby on Rails? RethinkDB's Josh Kuhn (@deontologician) has contributed a new integration article for our documentation on using NoBrainer, a RethinkDB ORM that's close to a drop-in replacement for ActiveRecord.

If you have a little experience with Rails, NoBrainer will feel familiar and natural right away. You get model generation, scaffolding, validation, and belongs_to and has_many associations. And you get a lightweight wrapper around ReQL that lets you execute queries like this:

# Find users with 'bob' in their name, sorted by name in descending order.
# Note: NoBrainer will use the :name index from User by default
User.where(:name => /bob/).order_by(:name => :desc).to_a

Go read the full "Using RethinkDB with Ruby on Rails" guide!

Building realtime apps with RethinkDB and Firebase

We're co-hosting a RethinkDB + Firebase meetup on realtime sync architectures for web and mobile apps. We'll be talking about:

  • designing backend realtime sync architectures
  • scaling those architectures when it's time
  • new features in RethinkDB and Firebase to make building realtime apps easier

Come hang out with the RethinkDB and Firebase teams -- or even better, give a lightning talk on how you're using RethinkDB!

We're meeting at Firebase's office in SF (22 4th Street, Suite 1000, 10th floor) on Tuesday, July 1st at 6pm. Space is limited, so make sure to RSVP on our meetup page.

Want to give a lightning talk? Send an email to Christina (christina@rethinkdb.com) to get a speaking spot.

RethinkDB 1.13: pull data via HTTP, push data via changefeeds

Today, we're happy to announce RethinkDB 1.13. Download it now!

The 1.13 release includes over 150 enhancements, including:

  • New http command for seamlessly pulling data from external APIs into RethinkDB
  • New changes command for subscribing to document changes on tables
  • Full promises support in the JavaScript driver
  • A high performance JSON driver protocol
  • Dozens of performance and stability improvements

Upgrading to 1.13? Make sure to migrate your data before upgrading to RethinkDB 1.13.

Upgrading on Ubuntu? We've moved to our own PPA, so please add the RethinkDB PPA to upgrade.

Pull data via HTTP

Since many APIs accept and return JSON, RethinkDB is a convenient platform for manipulating and analyzing API data. In this release we've added a new http command to make this process even easier (see the API reference and the tutorial). You can now access external APIs directly from the database with a clean and seamless experience!

For example, let's use the GitHub API to get the first ten pages of users who starred the RethinkDB GitHub repository:

r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers',
       page='link-next', page_limit=10)

The http command returns a stream of JSON values that you can manipulate just like the output of any other ReQL command:

# Count the number of values returned by the GitHub API. Pagination is
# off by default, so we're only getting the first page of users.
r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers')
 .count()

# Grab the login and user ID, and then sort by ID
r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers')
 .pluck('login', 'id').orderBy('id')

# Store the results in a table
r.table('stargazers')
 .insert(r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers'))

You can tack on additional ReQL commands just like you would with any other query, store the results in a table, make additional HTTP API calls to pull in more data for each document, control API pagination, and much more! See the API reference and the tutorial for the http command for more details and examples.
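
For instance, here's a rough sketch of the per-document API call idea (not from the original post; it assumes each stargazer object carries a url field pointing at the user's profile, along with name and public_repos fields, as GitHub's API returns): make a second r.http call per document and merge a couple of profile fields into the result before storing it.

# A hedged sketch: enrich each stargazer with fields from their profile.
# The 'url', 'name', and 'public_repos' fields are assumptions based on the
# shape of GitHub's API responses.
r.table('stargazers').insert(
    r.http('https://api.github.com/repos/rethinkdb/rethinkdb/stargazers')
     .map(lambda user: user.merge(r.http(user['url']).pluck('name', 'public_repos')))
)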

Push data via changefeeds

Over the last few months we've had many requests to make it easier to integrate RethinkDB with other systems. We've now added a new changes command (see the API reference and the tutorial). Any time a document in a table is inserted, updated, or deleted, the client driver can be notified of the change. Changefeeds offer a convenient way to perform tasks like these:

  • Integrate with other databases or middleware such as ElasticSearch or RabbitMQ.
  • Write applications where clients are notified of changes in realtime.

The changes command returns a stream of changes in a regular cursor, and is very powerful and easy to use:

feed = r.table('users').changes().run(conn)
for change in feed:
    print change

Every time you insert, update, or delete a document in a table, an object describing the change will be added to relevant changefeeds. For example, if you insert a user { 'id': 1, 'name': 'Slava', 'age': 31 } into the users table, RethinkDB will post the following document into the feeds subscribed to users:

{
  'old_val': None,
  'new_val': { 'id': 1, 'name': 'Slava', 'age': 31 }
}
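
Updates and deletes produce analogous objects. The shapes below are illustrative rather than output reproduced from the post: an update fills in both fields, while a delete sets new_val to None.

# After r.table('users').get(1).update({'age': 32}).run(conn)
{
  'old_val': { 'id': 1, 'name': 'Slava', 'age': 31 },
  'new_val': { 'id': 1, 'name': 'Slava', 'age': 32 }
}

# After r.table('users').get(1).delete().run(conn)
{
  'old_val': { 'id': 1, 'name': 'Slava', 'age': 32 },
  'new_val': None
}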

In each of these objects, old_val is the old version of the document and new_val is the new version. Because changes returns a regular stream, you can tack on RethinkDB queries to do transformations or filter for specific changes:

# Only get changes where a user's age increases
r.table('users').changes().filter(
    lambda change: change['new_val']['age'] > change['old_val']['age']
).run(conn)
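
Along the same lines, here's a minimal sketch (not from the original post) that watches the table for deletions only, relying on new_val being None for deleted documents:

# Only get changes where a document was deleted
r.table('users').changes().filter(
    lambda change: change['new_val'].eq(None)
).run(conn)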

See the API reference and the tutorial for the changes command for more details and examples.

Support for promises in the JavaScript driver

As of this release the RethinkDB JavaScript driver has full support for promises. If you take advantage of promises, new code that interacts with the database can be much cleaner and more convenient.

Here is an example of old JavaScript code to connect to the database:

r.table('posts').run(connection, function(err, cursor) {
  if (err) return console.log(err);
  cursor.toArray(function(err, results) {
    if (err) return console.log(err);
    console.log(results);
  });
});

In the new 1.13 release this code will continue to work, but you can also rewrite it to take advantage of promises:

r.table('posts').run(connection).then(function(cursor) {
  return cursor.toArray();
}).then(function(results) {
  console.log(results);
}).error(console.log);

See the API documentation for connect and next for more details.

JSON driver protocol

Traditionally RethinkDB has used Protocol Buffers to communicate between the drivers and the database server. As of this release, we've added a native JSON driver protocol, and migrated the official drivers to the new implementation.

This change has the following advantages:

  • Almost every language has a well-supported JSON library, but many languages still lack a high-quality, high-performance Protocol Buffers implementation.
  • RethinkDB drivers can now be written in languages that don't have a good Protocol Buffers port (e.g. Python 3).
  • For deeply nested objects, the new serialization protocol can be more efficient in terms of CPU utilization and network traffic.
  • The driver installation process no longer requires special steps for a fast native backend.

The server still has full support for the Protocol Buffer interface, so community drivers will continue to work without interruption.

If you're a driver developer, check out the new specification for details and hop on the driver developers group with any questions!

Next steps

See the full list of enhancements, and take the new release for a spin!

The team is already hard at work on the upcoming 1.14 release, which will likely include support for binary data, geospatial indexing, and a cluster administration and monitoring API. As always, if there's something you'd like us to prioritize or you have any feedback on the release, please let us know!

Help work on the 1.14 release: RethinkDB is hiring.

@antirez, thank you for Redis!

Yesterday, Salvatore posted an amazing write-up on implementing the really neat HyperLogLog data structure in Redis. We keep being amazed by Redis and Salvatore — he's taught us a great deal about good APIs, usability, and software development in general. We learned so much from Redis, we drew this to celebrate his work. Thanks for your hard work, @antirez!

RethinkDB 1.12: simplified map/reduce, ARM port, new caching infrastructure

Today, we're happy to announce RethinkDB 1.12. Download it now!

With over 200 enhancements, the 1.12 release is one of the biggest releases to date. This release includes:

  • Dramatically simplified map/reduce and aggregation commands.
  • Big improvements to caching that do away with long-standing stability and performance limitations.
  • A port to the ARM architecture.
  • Four new ReQL commands for object and string manipulation.
  • Dozens of bug fixes, stability enhancements, and performance improvements.

Upgrading to 1.12? Make sure to migrate your data before upgrading to RethinkDB 1.12.

Please note a breaking change: the 1.12 release replaces the commands group_by and grouped_map_reduce with a single new command group. You will have to adapt your applications to this change when you upgrade. See the 1.12 migration guide for details.

Simplified map/reduce and aggregation

Let's say you have a table plays where you keep track of gameplay outcomes for users of your game:

[{ play_id: 1, player: 'coffeemug', score: 100 },
 { play_id: 2, player: 'mlucy', score: 1000 },
 { play_id: 3, player: 'mlucy', score: 1200 },
 { play_id: 4, player: 'coffeemug', score: 200 }]

In RethinkDB, you could always count the number of games in the table by running a count command:

> r.table('plays').count().run(conn)
4

The built-in count command is a shortcut for a map/reduce query:

> r.table('plays').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
4

The new release removes the old group_by and grouped_map_reduce commands and replaces them with a single, much more powerful command called group. This command breaks a sequence of documents up into groups. Any commands chained after group are called on each group individually, rather than on all the documents in the sequence.

Let's say we want to count the number of games for each player:

> r.table('plays').group('player').count().run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }

Of course instead of using the shortcut, you could write out the full map/reduce query with the group command:

> r.table('plays').group('player').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }

In addition to the already available aggregators like count, sum, and avg, the 1.12 release adds new aggregators min and max. You can now run all five aggregators on any sequence of documents or on groups, resulting in a unified, powerful API for data aggregation.
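
For instance, using the plays table from above, the other aggregators chain after group in exactly the same way. A quick sketch, with results worked out from the sample data:

# Total score per player
> r.table('plays').group('player').sum('score').run(conn)
{ 'mlucy': 2200, 'coffeemug': 300 }

# Highest-scoring game per player (max returns the whole document)
> r.table('plays').group('player').max('score').run(conn)
{ 'mlucy': { 'play_id': 3, 'player': 'mlucy', 'score': 1200 },
  'coffeemug': { 'play_id': 4, 'player': 'coffeemug', 'score': 200 } }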

Chaining after group isn't limited to built-in aggregators. We can chain any command, or series of commands, after group. For example, let's get a random sample of two games from each player:

> r.table('plays').group('player').sample(2).run(conn)
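
You can even chain past the grouping itself. The sketch below (not from the post) uses ungroup, which is part of the same new grouping infrastructure, to turn the grouped result back into an array of {'group': ..., 'reduction': ...} pairs, so that ordinary commands like order_by apply to the groups themselves:

# Rank players by total score
> r.table('plays').group('player').sum('score').ungroup().order_by(r.desc('reduction')).run(conn)
[{'group': 'mlucy', 'reduction': 2200}, {'group': 'coffeemug', 'reduction': 300}]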

These examples only scratch the surface of what's possible with group. Read more about the group command and the new map/reduce infrastructure.

Big improvements to caching

1.12 includes a lot of improvements to the caching infrastructure. The biggest user-facing change is that you no longer have to manually specify cache sizes for your tables to keep the server from running out of memory and going into swap. Instead, RethinkDB adjusts cache sizes on the fly, based on usage statistics for different tables and the amount of memory available on your system.

We've also made a lot of changes under the hood to help with various stability problems users have been reporting. Strenuous workloads and exotic cluster configurations are much less likely to cause stability problems in 1.12.

A port to ARM

Four months ago David Thomas (@davidthomas426 on GitHub) contributed a pull request with the changes necessary to compile and run RethinkDB on ARM. After months of testing and various additional fixes, the ARM port has been merged into RethinkDB mainline.

You shouldn't have to do anything special. Just run ./configure and make as you normally would:

$ ./configure --allow-fetch
$ make

Note that ARM support is experimental, and there are still some issues (such as #239) to work out.

Special thanks to David for the port, and to the many folks who did the testing that made the merge possible!

Object and string manipulation commands

The 1.12 release includes new commands for string manipulation and object creation.

Firstly, ReQL now includes commands for changing the case of strings:

> r.expr('Hello World').downcase().run(conn)
'hello world'

> r.expr('Hello World').upcase().run(conn)
'HELLO WORLD'

We also added a split command for breaking up strings, which behaves similarly to the native Python split:

> r.expr('Hello World').split().run(conn)
['Hello', 'World']

> r.expr('Hello, World').split(',').run(conn)
['Hello', ' World']
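
split also accepts an optional limit on the number of splits, mirroring Python's behavior; a quick sketch:

> r.expr('a,b,c,d').split(',', 2).run(conn)
['a', 'b', 'c,d']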

Finally, the 1.12 release includes an object command that allows programmatically creating JSON objects from key-value pairs:

> r.object('a', 1, 'b', 2).run(conn)
{ 'a': 1, 'b': 2 }
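
Because the keys can be ReQL expressions, r.object comes in handy when field names are computed at query time. Here's a small sketch (not from the post) that builds one {player: score} object per game in the plays table from the earlier example; results are shown as a list for illustration and the order is not guaranteed:

> r.table('plays').map(lambda play: r.object(play['player'], play['score'])).run(conn)
[{'coffeemug': 100}, {'mlucy': 1000}, {'mlucy': 1200}, {'coffeemug': 200}]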

You can learn more about the commands in the API documentation.

Performance and stability improvements

In addition to stability work by almost everyone on the RethinkDB team, @danielmewes has dedicated his time almost entirely to stability and performance improvements for the past four months. He uncovered and fixed dozens of latency and memory problems, stability issues with long-running clusters, and slowdowns under highly concurrent workloads.

For a sample of the stability fixes that ship with the 1.12 release, see the full list of enhancements, and take the new release for a spin!

Help work on the 1.13 release: RethinkDB is hiring.