ReQL is the RethinkDB query language. It offers a very powerful and convenient way to manipulate JSON documents. This document is a gentle introduction to ReQL concepts. You don’t have to read it to be productive with RethinkDB, but it helps to understand some basics.
Want to write useful queries right away? Check out the ten-minute guide.
ReQL is different from other NoSQL query languages. It’s built on three key principles:
.
operator.run
command and pass it an active database connection.Let’s look at these concepts in more detail.
Note: the following examples use the Python driver, but most of them also apply to RethinkDB drivers for other languages.
You start using ReQL in your program similarly to how you’d use other databases:
from rethinkdb import RethinkDB # import the RethinkDB package
r = RethinkDB() # create a RethinkDB object
conn = r.connect() # connect to the server on localhost and default port
But this is where the similarity ends. Instead of constructing strings
and passing them to the database server, you access ReQL by using
methods from the rethinkdb
package:
r.table_create('users').run(conn) # create a table `users`
r.table('users').run(conn) # get an iterable cursor to the `users` table
Every ReQL query, from filters, to updates, to table joins is done by calling appropriate methods.
This design has the following advantages:
In ReQL, you can chain commands at the end of other commands using the
.
operator:
# Get an iterable cursor to the `users` table (we've seen this above)
r.table('users').run(conn)
# Return only the `last_name` field of the documents
r.table('users').pluck('last_name').run(conn)
# Get all the distinct last names (remove duplicates)
r.table('users').pluck('last_name').distinct().run(conn)
# Count the number of distinct last names
r.table('users').pluck('last_name').distinct().count().run(conn)
Almost all ReQL operations are chainable. You can think of the .
operator similarly to how you’d think of a Unix pipe. You select the
data from the table and pipe it into a command that transforms it. You
can continue chaining transformers until your query is done. In ReQL,
data flows from left to right.
Even if you have a cluster of RethinkDB nodes, you can send your queries to any node and the cluster will create and execute distributed programs that get the data from relevant nodes, perform the necessary computations, and present you with final results without you ever worrying about it.
This design has the following advantages:
While queries are built up on the client, they’re only sent to the
server once you call the run
command. All processing happens on the
server—the queries don’t run on the client, and don’t require
intermediary network round trips between the client and the
server. For example, you can store queries in variables, and send them
to the server later:
# Create the query to get distinct last names
distinct_lastnames_query = r.table('users').pluck('last_name').distinct()
# Send it to the server and execute
distinct_lastnames_query.run(conn)
Read about how this technology is implemented for more details.
ReQL queries are executed lazily:
# Get up to five user documents that have the `age` field defined
r.table('users').has_fields('age').limit(5).run(conn)
For this query RethinkDB will perform enough work to get the five documents, and stop when the query is satisfied. Even if you don’t have a limit on the number of queries but use a cursor, RethinkDB will do just enough work to allow you to read the data you request. This allows queries to execute quickly without wasting CPU cycles, network bandwidth, and disk IO.
Like most database systems, ReQL supports primary and secondary indexes to allow efficient data access. You can also create compound indexes and indexes based on arbitrary ReQL expressions to speed up complex queries.
Learn how to use primary and secondary indexes in RethinkDB.
All ReQL queries are automatically parallelized on the RethinkDB server as much as possible. Whenever possible, query execution is split across CPU cores, servers in the cluster, and even multiple datacenters. If you have large, complicated queries that require multiple stages of processing, RethinkDB will automatically break them up into stages, execute each stage in parallel, and combine data to return a complete result.
While RethinkDB doesn’t currently have a fully-featured query optimizer, ReQL is designed with one in mind. For example, the server has enough information to reorder the chain for efficiency, or to use alternative implementation plans to improve performance. This feature will be introduced into future versions of RethinkDB.
So far we’ve seen only simple queries without conditions. ReQL supports a familiar syntax for building more advanced queries:
# Get all users older than 30
r.table('users').filter(lambda user: user['age'] > 30).run(conn)
# If you'd like to avoid writing lambdas, RethinkDB supports an
# alternative syntax:
r.table('users').filter(r.row['age'] > 30).run(conn)
This query looks just like any other Python code you would normally write. Note that RethinkDB will execute this query on the server, and it doesn’t execute native Python code.
The client drivers do a lot of work to inspect the code and convert it to an efficient ReQL query that will be executed on the server:
user['age'] > 30
.lambda
expression is executed only once on the
client. Internally, the driver passes a special object to the
lambda
function which allows constructing a representation of the
query. This representation is then sent to the server over the
network and evaluated on the cluster.Read about how this technology is implemented for more details.
This technology has limitations. While most operations allow you to
write familiar code, you can’t use native language’s operations that
have side effects (such as print
) or control blocks (such as if
and for
). Instead, you have to use alternative ReQL commands:
# WRONG: Get all users older than 30 using the `if` statement
r.table('users').filter(lambda user:
True if user['age'] > 30 else False).run(conn)
# RIGHT: Get all users older than 30 using the `r.branch` command
r.table('users').filter(lambda user:
r.branch(user['age'] > 30, True, False)).run(conn)
This design has the following advantages:
This technology has the following limitation:
lambda
. Learn more about how this
design is implemented for details.You can combine multiple ReQL queries to build more complex ones.
Let’s start with a simple example. RethinkDB supports server-side JavaScript evaluation using the embedded V8 engine (sandboxed within outside processes, of course):
# Evaluate a JavaScript expression on the server and get the result
r.js('1 + 1').run(conn)
Because ReQL is composable you can combine the r.js
command with any
other query. For example, let’s use it as an alternative to get all
users older than 30:
# Get all users older than 30 (we've seen this above)
r.table('users').filter(lambda user: user['age'] > 30).run(conn)
# Get all users older than 30 using server-side JavaScript
r.table('users').filter(r.js('(function (user) { return user.age > 30; })')).run(conn)
RethinkDB will seamlessly evaluate the js
command by calling into
the V8 engine during the evaluation of the filter
query. You can
combine most queries this way into progressively more complex ones.
Let’s say we have another table authors
, and we’d like to get a list
of authors whose last names are also in the users
table we’ve seen
before. We can do it by combining two queries:
# Find all authors whose last names are also in the `users` table
r.table('authors').filter(lambda author:
r.table('users').pluck('last_name').contains(author.pluck('last_name'))).
run(conn)
Here, we use the r.table('users').pluck('last_name')
query as the
inner query in filter
, combining the two queries to build a more
sophisticated one. Even if you have a cluster of servers and both the
authors
table and the users
table are sharded, RethinkDB will do
the right thing and evaluate relevant parts of the query above on the
appropriate shards, combine bits of data as necessary, and return the
complete result.
A few things to note about this query:
run
only
once. Remember to call run
only once on the complex query when
you’re ready for it to be executed.Composing queries isn’t limited to simple commands and inner queries. You can also use expressions to perform complex operations. For example, suppose we’d like to find all users whose salary and bonus don’t exceed $90,000, and increase their salary by 10%:
r.table('users').filter(lambda user: user['salary'] + user['bonus'] < 90000)
.update(lambda user: {'salary': user['salary'] + user['salary'] * 0.1})
In addition to commands described here, ReQL supports a number of sophisticated commands that are composable similarly to the commands described here. See the following documentation for more details:
This design has the following advantages:
Just in case you needed another calculator, ReQL can do that too!
# Add two plus two
(r.expr(2) + r.expr(2)).run(conn)
# You only need to specify `r.expr` once for the driver to work
(r.expr(2) + 2).run(conn)
# More algebra
(r.expr(2) + 2 / 2).run(conn)
# Logic
(r.expr(2) > 3).run(conn)
# Branches
r.branch(r.expr(2) > 3,
1, # if True, return 1
2 # otherwise, return 2
).run(conn)
# Compute the Fibonacci sequence
r.table_create('fib').run(conn)
r.table('fib').insert([{'id': 0, 'value': 0}, {'id': 1, 'value': 1}]).run(conn)
r.expr([2, 3, 4, 5, 6, 7, 8, 9, 10, 11]).for_each(lambda x:
r.table('fib').insert({'id': x,
'value': (r.table('fib').order_by('id').nth(x - 1)['value'] +
r.table('fib').order_by('id').nth(x - 2)['value'])
})).run(conn)
r.table('fib').order_by('id')['value'].run(conn)
Browse the following resources to learn more about ReQL: