RethinkDB 1.12: simplified map/reduce, ARM port, new caching infrastructure
Today, we’re happy to announce RethinkDB 1.12 (The Wizard of Oz). Download it now!
With over 200 enhancements, the 1.12 release is one of the biggest releases to date. This release includes:
- Dramatically simplified map/reduce and aggregation commands.
- Big improvements to caching that do away with long-standing stability and performance limitations.
- A port to the ARM architecture.
- Four new ReQL commands for object and string manipulation.
- Dozens of bug fixes, stability enhancements, and performance improvements.
Upgrading to RethinkDB 1.12? Make sure to migrate your data before you upgrade.
Please note a breaking change: the 1.12 release replaces the commands group_by and grouped_map_reduce with a single new command, group. You will have to adapt your applications to this change when you upgrade. See the 1.12 migration guide for details.
Simplified map/reduce and aggregation
Let’s say you have a table plays where you keep track of gameplay outcomes for users of your game:
[{ play_id: 1, player: 'coffeemug', score: 100 },
{ play_id: 2, player: 'mlucy', score: 1000 },
{ play_id: 3, player: 'mlucy', score: 1200 },
{ play_id: 4, player: 'coffeemug', score: 200 }]
In RethinkDB, you could always count the number of games in the table by running a count command:
> r.table('plays').count().run(conn)
4
The built-in count command is a shortcut for a map/reduce query:
> r.table('plays').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
4
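To make the equivalence concrete without a running server, here is a minimal plain-Python sketch of the same map/reduce over the sample documents. It uses Python's own map and functools.reduce as a stand-in, so it only models what the query computes, not how RethinkDB executes it:

```python
from functools import reduce

# The sample documents from the plays table, as plain Python dicts.
plays = [
    {"play_id": 1, "player": "coffeemug", "score": 100},
    {"play_id": 2, "player": "mlucy", "score": 1000},
    {"play_id": 3, "player": "mlucy", "score": 1200},
    {"play_id": 4, "player": "coffeemug", "score": 200},
]

# map step: turn every document into the number 1
ones = map(lambda doc: 1, plays)

# reduce step: add the ones together pairwise
total = reduce(lambda x, y: x + y, ones)

print(total)  # 4
```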
The new release removes the old group_by and grouped_map_reduce commands, and replaces them with a single, much more powerful new command called group. This command breaks up a sequence of documents into groups. Any commands chained after group are called on each group individually, rather than on all the documents in the sequence.
Let’s say we want to count the number of games for each player:
> r.table('plays').group('player').count().run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }
Of course instead of using the shortcut, you could write out the full map/reduce query with the group command:
> r.table('plays').group('player').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }
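The "each group individually" behavior can be modeled in plain Python. The sketch below is only an illustration of the semantics: group_docs and map_reduce_per_group are hypothetical helper names, not part of the RethinkDB driver, and the server's actual execution strategy is not shown here.

```python
from collections import defaultdict
from functools import reduce

plays = [
    {"play_id": 1, "player": "coffeemug", "score": 100},
    {"play_id": 2, "player": "mlucy", "score": 1000},
    {"play_id": 3, "player": "mlucy", "score": 1200},
    {"play_id": 4, "player": "coffeemug", "score": 200},
]

def group_docs(docs, field):
    """Hypothetical helper: split docs into lists keyed by the value of `field`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    return dict(groups)

def map_reduce_per_group(groups, map_fn, reduce_fn):
    """Hypothetical helper: run map, then reduce, separately inside every group."""
    return {
        key: reduce(reduce_fn, map(map_fn, docs))
        for key, docs in groups.items()
    }

counts = map_reduce_per_group(
    group_docs(plays, "player"),
    lambda x: 1,          # map: one per document
    lambda x, y: x + y,   # reduce: sum within the group
)
print(counts)  # {'coffeemug': 2, 'mlucy': 2}
```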
In addition to the already available aggregators like count, sum, and avg, the 1.12 release adds new aggregators min and max. You can now run all five aggregators on any sequence of documents or on groups, resulting in a unified, powerful API for data aggregation.
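As a rough illustration of what the five aggregators compute per group, here is a plain-Python model applied to the score field of the sample data. It works on extracted numbers only and is an assumption-laden sketch: the exact return shapes of the ReQL commands may differ, so check the API documentation before relying on them.

```python
from itertools import groupby
from operator import itemgetter

plays = [
    {"play_id": 1, "player": "coffeemug", "score": 100},
    {"play_id": 2, "player": "mlucy", "score": 1000},
    {"play_id": 3, "player": "mlucy", "score": 1200},
    {"play_id": 4, "player": "coffeemug", "score": 200},
]

# groupby only merges adjacent items, so sort by the grouping key first.
by_player = sorted(plays, key=itemgetter("player"))

summary = {}
for player, docs in groupby(by_player, key=itemgetter("player")):
    scores = [doc["score"] for doc in docs]
    summary[player] = {
        "count": len(scores),
        "sum": sum(scores),
        "avg": sum(scores) / len(scores),
        "min": min(scores),
        "max": max(scores),
    }

for player, stats in summary.items():
    print(player, stats)
```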
Chaining after group isn’t limited to built-in aggregators. We can chain any command, or series of commands, after the group command. For example, let’s try to get a random sample of two games from each player:
> r.table('plays').group('player').sample(2).run(conn)
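The result is random, so no fixed output is shown. To see the shape of what a per-group sample produces, here is a plain-Python model; sample_per_group is a hypothetical helper, and capping the sample at the group size is an assumption of this sketch rather than a statement about the server's behavior.

```python
import random
from collections import defaultdict

plays = [
    {"play_id": 1, "player": "coffeemug", "score": 100},
    {"play_id": 2, "player": "mlucy", "score": 1000},
    {"play_id": 3, "player": "mlucy", "score": 1200},
    {"play_id": 4, "player": "coffeemug", "score": 200},
]

def sample_per_group(docs, field, n):
    """Hypothetical helper: pick up to n random documents from each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    # Sample without replacement; never ask for more than the group holds.
    return {
        key: random.sample(members, min(n, len(members)))
        for key, members in groups.items()
    }

samples = sample_per_group(plays, "player", 2)
for player, games in samples.items():
    print(player, [game["play_id"] for game in games])
```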
These examples only scratch the surface of what’s possible with group. Read more about the group command and the new map/reduce infrastructure.
Big improvements to caching
1.12 includes a lot of improvements to the caching infrastructure. The biggest user-facing change is that you no longer have to manually specify cache sizes for tables to prevent running over memory and into swap. Instead RethinkDB will adjust cache sizes for you on the fly, based on usage statistics for different tables and the amount of memory available on your system.
We’ve also made a lot of changes under the hood to help with various stability problems users have been reporting. Strenuous workloads and exotic cluster configurations are much less likely to cause stability problems in 1.12.
A port to ARM
Four months ago David Thomas (@davidthomas426 on GitHub) contributed a pull request with the changes necessary to compile and run RethinkDB on ARM. After months of testing and various additional fixes, the ARM port has been merged into RethinkDB mainline.
You shouldn’t have to do anything special. Just run ./configure and make as you normally would:
$ ./configure --allow-fetch
$ make
Note that ARM support is experimental, and there are still some issues (such as #239) to work out.
Special thanks to David for the port, and to the many folks who did the testing that made the merge possible!
Object and string manipulation commands
The 1.12 release includes new commands for string manipulation and object creation.
Firstly, ReQL now includes commands for changing the case of strings:
> r.expr('Hello World').downcase().run(conn)
'hello world'
> r.expr('Hello World').upcase().run(conn)
'HELLO WORLD'
We also added a split command for breaking up strings, which behaves similarly to the native Python split:
> r.expr('Hello World').split().run(conn)
['Hello', 'World']
> r.expr('Hello, World').split(',').run(conn)
['Hello', ' World']
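For comparison, this is the native Python behavior the two ReQL calls above mirror. Note that splitting on an explicit separator keeps surrounding whitespace, which is why the second result contains a leading space:

```python
# No argument: split on runs of whitespace.
words = "Hello World".split()

# Explicit separator: split only on that exact string, whitespace is kept.
parts = "Hello, World".split(",")

print(words)  # ['Hello', 'World']
print(parts)  # ['Hello', ' World']
```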
Finally, the 1.12 release includes an object command that allows programmatically creating JSON objects from key-value pairs:
> r.object('a', 1, 'b', 2).run(conn)
{ 'a': 1, 'b': 2 }
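The flat key, value, key, value argument list can be modeled in plain Python as follows. make_object is a hypothetical stand-in written for illustration; rejecting an odd number of arguments is this sketch's own choice, not a documented behavior of the ReQL command.

```python
def make_object(*args):
    """Hypothetical helper: build a dict from alternating keys and values."""
    if len(args) % 2 != 0:
        raise ValueError("expected alternating key, value arguments")
    keys = args[0::2]    # items at even positions are keys
    values = args[1::2]  # items at odd positions are values
    return dict(zip(keys, values))

print(make_object("a", 1, "b", 2))  # {'a': 1, 'b': 2}
```

This form is handy when the keys are computed at run time and cannot be written as a literal.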
You can learn more about the commands in the API documentation.
Performance and stability improvements
In addition to stability work by almost everyone on the RethinkDB team, for the past four months @danielmewes dedicated his time almost entirely to stability and performance improvements. He uncovered and fixed dozens of latency and memory problems, stability issues with long running clusters, and slowdowns during highly concurrent workloads.
Here is a very small sample of the stability and performance work that ships with the 1.12 release:
- The RethinkDB web server now supports compression, added by @Tryneus to improve the admin UI experience on slow connections.
- @neumino added automated performance regression tests that act as an additional harness.
- @danielmewes made a number of improvements to parallel data processing code.
See the full list of enhancements, and take the new release for a spin!
Help work on the 1.13 release: RethinkDB is hiring.