RethinkDB 1.6: regex matching, new array operations, random sampling

We are happy to announce RethinkDB 1.6 (Fargo). Go download it now!

This release includes regex string matching, fourteen new array operations, support for random sampling, error handling improvements, support for authentication keys, and over 65 bug fixes, features, and enhancements.

Upgrading to RethinkDB 1.6? Make sure to migrate your data before you upgrade.

Regex string matching

RethinkDB 1.6 introduces regular expressions into the query language (ReQL). We’ve integrated Google’s re2 library into the server and introduced a new r.match command.

The match command returns a JSON object containing the matched string, its starting and ending positions, and any capture groups. It returns null if there is no match.

You can use it to match regular expressions in strings as follows:

r.expr('My name is Slava').match('name is (.*)')
// returns {"str":"name is Slava","start":3,"groups":[{"str":"Slava","start":11,"end":16}],"end":16} 
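To make the shape of that result easier to explore without a running server, here is a minimal plain-JavaScript sketch that builds the same kind of object with the built-in RegExp engine. It is only an illustration: `matchLike` is a hypothetical helper (not part of any RethinkDB driver), JavaScript's regex dialect is not identical to re2's, and reporting a non-participating group as null is an assumption of this sketch. It relies on the RegExp `d` flag, so it needs Node.js 16 or newer.

```javascript
// Hypothetical helper: models the shape of the object described above.
// Uses the RegExp `d` flag (match indices) to get start/end offsets.
function matchLike(str, pattern) {
  const m = new RegExp(pattern, 'd').exec(str);
  if (m === null) return null; // pattern not found anywhere in the string

  const groups = [];
  for (let i = 1; i < m.length; i++) {
    // Assumption: a group that took no part in the match becomes null.
    groups.push(m[i] === undefined
      ? null
      : { str: m[i], start: m.indices[i][0], end: m.indices[i][1] });
  }
  return {
    str: m[0],
    start: m.indices[0][0],
    groups: groups,
    end: m.indices[0][1]
  };
}

console.log(JSON.stringify(matchLike('My name is Slava', 'name is (.*)')));
// {"str":"name is Slava","start":3,"groups":[{"str":"Slava","start":11,"end":16}],"end":16}

console.log(matchLike('My name is Slava', '^name'));
// null
```

Note that the first match starts at offset 3: the pattern is searched for anywhere in the string. To require a match at the very beginning, anchor the pattern with `^`, as the second call shows.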

Here’s an example of using regular expressions to find every user whose first name begins with the letter S:

r.table('users').filter(r.row('first_name').match('^S'))
// returns all JSON documents where `first_name` begins with `S`

You can even use match in a secondary index for efficient access:

// Create a secondary index for all users whose first names begin with `S`
r.table('users').indexCreate('starts_with_S', r.row('first_name').match('^S').ne(null))

// Efficiently get all the users where `first_name` begins with `S`
r.table('users').getAll(true, {index: 'starts_with_S'})

New array operations

Over the past few months we’ve discovered that many of our users are using arrays in ways we didn’t anticipate – as sets, or as containers for deeply nested elements. In the 1.6 release we’ve added fourteen new array operations to make these use cases easier:

  • prepend: prepends an element to an array
  • append: appends an element to an array
  • insertAt: inserts an element at the specified index
  • spliceAt: splices a list into another list at the specified index
  • deleteAt: deletes the element at the specified index
  • changeAt: changes the element at the specified index to the specified value
  • add: concatenates two arrays, preserving element order
  • mul: repeats an array n times
  • difference: removes all instances of specified elements from an array
  • count: returns the number of elements in an array
  • indexesOf: returns positions of elements that match the specified value in an array
  • isEmpty: checks whether an array or table is empty
  • setInsert: adds an element to a set
  • setUnion: returns the union of two sets
  • setDifference: returns the difference of two sets
  • setIntersection: finds the intersection of two sets

These new commands should make working with arrays in RethinkDB delightful.
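To illustrate how the set-style operations differ from the plain array ones, here is a rough plain-JavaScript model of a few commands from the list. The function names mirror the ReQL commands, but these are hypothetical stand-ins rather than driver code. The sketch assumes arrays of primitive values, and the ordering of the results is an assumption of the sketch, not something the list above promises.

```javascript
// Hypothetical plain-JavaScript models of a few operations from the list.
// Assumes arrays of primitives; result ordering is an assumption.

// Adds `value` only if it is not already present; the result has no duplicates.
function setInsert(arr, value) {
  const unique = [...new Set(arr)];
  return unique.includes(value) ? unique : [...unique, value];
}

// Distinct elements that appear in either array.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// Distinct elements that appear in both arrays.
function setIntersection(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter(x => inB.has(x));
}

// Distinct elements of `a` that do not appear in `b`.
function setDifference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter(x => !inB.has(x));
}

// Removes every instance of the listed elements, keeping other duplicates.
function difference(arr, toRemove) {
  const drop = new Set(toRemove);
  return arr.filter(x => !drop.has(x));
}

console.log(setInsert(['a', 'b'], 'b'));           // [ 'a', 'b' ]
console.log(setUnion([1, 2, 3], [3, 4]));          // [ 1, 2, 3, 4 ]
console.log(setIntersection([1, 2, 3], [2, 3, 4])); // [ 2, 3 ]
console.log(setDifference([1, 2, 2, 3], [3]));     // [ 1, 2 ]
console.log(difference([1, 2, 2, 3], [3]));        // [ 1, 2, 2 ]
```

The last two lines show the distinction: the set variant treats its input as a set and drops duplicates, while `difference` only removes the specified elements.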

Random sampling

Many people use RethinkDB for analytics and data crunching, and have been asking us to add support for random sampling. We’ve now added a new r.sample command that returns a random sample of a table:

// Get a random sample of five users
r.table('users').sample(5)

You can also use the sample command on arrays:

// Get three random elements from the array
r.expr([1, 2, 3, 4, 5]).sample(3)
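For readers curious about what such a sample looks like in practice, here is a small plain-JavaScript sketch of one common way to draw it: a partial Fisher-Yates shuffle. This is not a description of how the server implements the command. The helper `sample` is hypothetical, and the sketch assumes elements are drawn uniformly and without replacement, and that asking for more elements than exist simply returns all of them in random order.

```javascript
// Hypothetical helper: uniform random sample without replacement,
// implemented as a partial Fisher-Yates shuffle on a copy of the input.
function sample(items, n) {
  const pool = items.slice();              // leave the caller's array untouched
  const k = Math.min(n, pool.length);      // cannot pick more than we have
  for (let i = 0; i < k; i++) {
    // Pick a random index from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// Three distinct elements of the array; the selection varies between runs.
console.log(sample([1, 2, 3, 4, 5], 3));
```

Because only the first `n` positions are shuffled, the cost is proportional to the sample size plus the cost of copying the input.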

Error handling improvements

When we initially designed the error handling system in RethinkDB, we made a design decision: errors should be as strict as possible. Prior to the 1.6 release, filtering documents by non-existent attributes would throw an error.

The following command:

r.expr([{name: 'Slava', age: 30}, {name: 'Mickey'}]).filter(r.row('age').gt(20))

would return this error:

RqlRuntimeError: No attribute `age` in object:
{ "Mickey" } in:
r([{name: "Slava", age: 30}, {name: "Mickey"}]).filter(function(var_5) { return r.row("age").gt(20); })
                                                                                ^^^^^^^^^^^^

To avoid this issue, users had to explicitly check for the existence of the attributes as part of the query. We found that users almost universally disliked this behavior.

In RethinkDB 1.6, rows that have missing attributes will automatically be skipped by operations like pluck and filter:

r.expr([{name: 'Slava', age: 30}, {name: 'Mickey'}]).filter(r.row('age').gt(20))

// now returns [{name: 'Slava', age: 30}]

If you need strict error handling in filter, you can enable the old behavior:

r.expr([{name: 'Slava', age: 30}, {name: 'Mickey'}]).filter(r.row('age').gt(20),
                                                            {default: r.error()})
// throws the `No attribute age` error.
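The difference between the lenient and strict modes can be modeled in plain JavaScript. The sketch below is only an illustration of the behavior described in this section, not driver or server code: `attr`, `filterRows`, `MissingAttributeError`, and the `onMissing` parameter (standing in for the `default` option) are all hypothetical names.

```javascript
// Hypothetical model of the filter behavior described above; not driver code.
class MissingAttributeError extends Error {}

// Strict attribute access: throws when the attribute is absent.
function attr(row, name) {
  if (!Object.prototype.hasOwnProperty.call(row, name)) {
    throw new MissingAttributeError('No attribute `' + name + '` in object');
  }
  return row[name];
}

// `onMissing` stands in for the `default` option: a boolean to use when the
// predicate hits a missing attribute, or 'error' to rethrow (the old behavior).
function filterRows(rows, predicate, onMissing = false) {
  return rows.filter(row => {
    try {
      return predicate(row);
    } catch (err) {
      if (!(err instanceof MissingAttributeError) || onMissing === 'error') {
        throw err;
      }
      return onMissing;
    }
  });
}

const users = [{name: 'Slava', age: 30}, {name: 'Mickey'}];
const olderThan20 = row => attr(row, 'age') > 20;

console.log(filterRows(users, olderThan20));
// [ { name: 'Slava', age: 30 } ]

try {
  filterRows(users, olderThan20, 'error');
} catch (err) {
  console.log(err.message);
  // No attribute `age` in object
}
```

The design point is that a missing attribute is treated as a failed test for that row rather than as a failure of the whole query, unless you explicitly ask for the error.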

Basic authentication support

This version also introduces basic authentication via shared keys. Many of our users run RethinkDB in public clouds and environments like EC2; while it’s easy enough to secure RethinkDB with a reverse proxy and Amazon security groups, we had a lot of demand for basic shared-key authentication.

Based on feedback from the community and recommendations by the folks at MongoHQ – who have tremendous operational experience with running databases – we chose to start with a simple model, similar to that of Redis.

You can now set a simple authentication key, and the cluster will reject any client driver that doesn’t provide the key. Set the key as follows:

$ rethinkdb admin -j host:port
> set auth foobar

Once the authentication key is set, you have to provide it to the client drivers in order to connect to the server:

r.connect({host: host, port: port, authKey: 'foobar'}, ...)

You can unset the key as follows:

$ rethinkdb admin -j host:port
> unset auth

Looking forward to 1.7

Our team is now working on the 1.7 release. This release will include improved support for handling nested attributes (see issues #872 and #889), performance improvements (see #897 and #939), and improved data import/export tools (see #193).

We’d love to hear what you think about the roadmap – let us know!