Cookbook for Python

Don’t see the recipe you’re looking for? Request or add a recipe by opening an issue on GitHub.

Basic commands

Creating a database

You can use the db_create method as follows:

r.db_create("blog").run()

Another way to create a database is through the web UI. You can reach the web UI at http://HOST:8080. Click on the Tables tab at the top and then use the Add Database button.

Renaming a database

The easiest way to rename a database is to use the config command to access the db_config system table, and then simply use the update command.

r.db("old_db_name").config().update({"name": "new_db_name"}).run()

Creating a table

You can select the database where you’d like to create the table with the db command and use the table_create command as follows:

r.db("blog").table_create("posts").run()

Note that you can omit the db command if you’re creating a table in the default database on your connection (set to test unless specified in connect).

Another way to create a new table is to use the web UI. You can reach the web UI at http://HOST:8080. Click on the Tables tab at the top of the page and then use the Add Table button.

Inserting documents

You can insert documents by calling the insert command on the appropriate table:

r.table("user").insert({
    "name": "Michel",
    "age": 26
}).run()

You can insert multiple documents at once by passing an array of documents to insert as follows:

r.table("user").insert([
    {
        "name": "Michel",
        "age": 26
    },
    {
        "name": "Slava",
        "age": 30
    }
]).run()

Deleting documents

To delete documents, select the documents you’d like to delete and use the delete command. For example, let’s delete all posts with the author “Michel”:

r.table("posts").filter(r.row["author"] == "Michel").delete().run()

Or, let’s try to delete a single user:

r.table("posts").get("7644aaf2-9928-4231-aa68-4e65e31bf219").delete().run()

Here is how we’d delete all documents in a table:

r.table("posts").delete().run()

Retrieving documents

To get all documents in a table, simply use the table command:

r.table("posts").run()

The table command returns a cursor; use the next or for command to iterate through the result set, or list to retrieve the set as an array.

To get a specific document by ID, use get:

r.table("posts").get(1).run()

To retrieve documents by the value of a specific field, use filter:

r.table("posts").filter({"author": "Michel"}).run()

To retrieve documents by the value of a specific index, use get_all:

r.table("posts").get_all("review", index="category").run()

(For more complex filtering recipes, read on.)

Filtering

Filtering based on multiple fields

Suppose you’d like to select all posts where the author’s name is “Michel” and the category is “Geek”. You can do it as follows:

r.table("posts").filter({
    "author": "Michel",
    "category": "Geek",
}).run()

Alternatively, you can use the overloaded & operator to build a predicate and pass it to filter:

r.table("posts").filter(
        (r.row["author"] == "Michel") & (r.row["category"] == "Geek")
    ).run()

Note: RethinkDB overloads & because Python doesn’t allow overloading the proper and operator. Since & has high precedence, make sure to wrap the conditions around it in parentheses.

You can also use the r.and_ command, if you prefer not using overloaded &:

r.table("posts").filter(r.and_(r.row["author"] == "Michel",
                             r.row["category"] == "Geek")).run()

Similarly, you can use the overloaded | operator or the equivalent r.or_ command to filter based on one of many conditions.

Filtering based on the presence of a value in an array

Suppose we have a table users with documents of the following form:

{
    name: "William Adama"
    emails: ["bill@bsg.com", "william@bsg.com"]
    ship: "Galactica"
}

If we want to retrieve all users that have the email address user@email.com, we can write:

r.table("user").filter(r.row["emails"].contains("user@email.com")).run()

If we want to retrieve all users on the Galactica and Pegasus, we can write:

r.table("user").filter(
    lambda user: r.expr(["Galactica", "Pegasus"]).contains(user["ship"])
).run()

Filtering based on nested fields

In Python you can use the operator [] to get the value of a field. This operator can be chained to retrieve values from nested fields.

Suppose we have a table users with documents of the following form:

{
    name: "William Adama"
    contact: {
        phone: "555-5555"
        email: "bill@bsg.com"
    }
}

Let’s filter based on the nested field email:

r.table("user").filter(
    r.row["contact"]["email"] == "user@email.com"
).run()

For many ReQL commands, you can also use a JSON-style nested syntax that allows considerably more flexibility. Read “Accessing nested fields” for more information.

Efficiently retrieving multiple documents by primary key

If you want to retrieve all the posts with the primary keys 1, 2, or 3 you can use the get_all command:

r.table("posts").get_all(1, 2, 3).run()

Efficiently retrieving multiple documents by secondary index

Suppose we have a table posts that links posts to authors via an author_id field. If we’ve created a secondary index on author_id and want to retrieve all the posts where author_id is 1, 2, or 3, we can use the get_all command to do it as follows:

r.table("posts").get_all(1, 2, 3, index='author_id').run()

Retrieving all the objects in a stream (cursor) as an array

If you’re using a command that returns a stream and want to retrieve all of its results at once in an array rather than iterating through them with the cursor object, you can coerce it to an array using list.

posts = list(r.table('posts').run(conn))

See the data type documentation for more detail about streams.

Returning specific fields of a document

If you need to retrieve only a few specific fields from your documents, you can use the pluck command. For example, here is how you’d return only the fields name and age from each row in table users:

r.table("users").pluck("name", "age").run()

This is equivalent to calling SELECT name, age FROM users in SQL.

The pluck command also supports selecting nested fields in a document. For example, suppose we’d like to select the fields phone and email from the following document:

{
    name: "William Adama"
    contact: {
        phone: "555-5555"
        email: "bill@bsg.com"
    }
}

We can use the following syntax:

r.table("users").pluck({"contact": {"phone": True, "email": True}}).run()

Filtering based on a date range

Suppose you want to retrieve all the posts whose date field is between January 1st, 2012 (included) and January 1st, 2013 (excluded), you could do:

r.table("posts").filter( lambda post:
    post.during(r.time(2012, 1, 1, 'Z'), r.time(2013, 1, 1, 'Z'))
).run(conn)

You can also manually compare dates:

r.table("posts").filter( lambda post:
    (post["date"] >= r.time(2012, 1, 1, 'Z')) &
    (post["date"] < r.time(2013, 1, 1, 'Z'))
).run(conn)

Filtering with Regex

If you want to retrieve all users whose last name starts with “Ma”, you can use r.match this way:

# Will return Martin, Martinez, Marshall etc.
r.table("users").filter( lambda user:
    user["lastName"].match("^Ma")
).run(conn)

If you want to retrieve all users whose last name ends with an “s”, you can use r.match this way:

# Will return Williams, Jones, Davis etc.
r.table("users").filter( lambda user:
    user["lastName"].match("s$")
).run(conn)

If you want to retrieve all users whose last name contains “ll”, you can use r.match this way:

# Will return Williams, Miller, Allen etc.
r.table("users").filter( lambda user:
    user["lastName"].match("ll")
).run(conn)

Case insensitive filter

Retrieve all users whose name is “William” (case insensitive).

# Will return william, William, WILLIAM, wiLLiam etc.
r.table("users").filter( lambda user:
    user["lastName"].match("(?i)^william$")
).run(conn)

Performing multiple aggregations simultaneously

If you want to perform a query that returns aggregations on different fields together, this is a canonical use case for map-reduce.

Suppose a data set that lists top movies, ranked by user vote. You’d like to get the total votes and the average age of the top 25 movies: the avg() of the year column and the sum() of the votes column, ordered by the rank column to get the range 1–25.

To perform this, map the first 25 movies into a new result set, adding a count column, then reduce each row of the mapped result set into a total for each field (votes, year and column). Then use do to return a result set with the total votes and the average year, computed by dividing the sum of the years by their count.

r.table('movies').order_by('rank').limit(25).map(lambda doc:
    { 'total_votes': doc['votes'], 'total_year': doc['year'], 'count': 1 }
).reduce(lambda left, right: {
    'total_votes': (left['total_votes'] + right['total_votes']),
    'total_year': (left['total_year'] + right['total_year']),
    'count': (left['count'] + right['count'])
}).do(lambda res: {
    'total_votes': res['total_votes'],
    'average_year': (res['total_year'] / res['count'])
}).run(conn)

We’re working on an easier syntax for performing multiple aggregations after group commands. Follow issue 1725 to track progress on this.

Manipulating documents

Adding/overwriting a field in a document

To add or overwrite a field, you can use the update command. For instance, if you would like to add the field author with the value “Michel” for all of the documents in the table posts, you would use:

r.table("posts").update({ "author": "Michel" }).run()

Removing a field from a document

The update command lets you to overwrite fields, but not delete them. If you want to delete a field, use the replace command. The replace command replaces your entire document with the new document you pass as an argument. For example, if you want to delete the field author of the blog post with the id 1, you would use:

r.table("posts").get("1").replace(r.row.without('author')).run()

Atomically updating a document based on a condition

All modifications made via the update and replace commands are always atomic with respect to a single document. For example, let’s say we’d like to atomically update a view count for a page if the field countable is set to true, and get back the old and new results in a single query. We can perform this operation as follows:

r.table("pages").update(lambda page:
    r.branch(page["countable"] == True,          # if the page is countable
             { "views": page["views"] + 1 },     # increment the view count
             {}                                  # else do nothing
    )), {"return_changes": True}).run()

Performing a conditional insert or replace

Using a similar technique to the last recipe, we can use branch and replace to maintain a document’s updated_at and created_at fields by either inserting a new document or updating one depending on whether a document with a specified ID exists.

def update_with_date(id, user_object):
    r.table('users').get(id).replace(
        lambda doc: r.branch(
            (doc == None),
            r.expr(user_object).merge({'id': id, 'created_at': r.now()}),
            doc.merge(user_object).merge({'updated_at': r.now()}))).run()

Storing timestamps and JSON date strings as Time data types

You can use the epoch_time and iso8601 commands to convert Unix timestamps (in seconds) and JSON date-time strings (which are in ISO 8601 format) to the ReQL time type. The ReQL driver will also convert Python datetime objects into ReQL time.

import time
from datetime import datetime
timezone = time.strftime("%z")
reql_tz = r.make_timezone(timezone[:3] + ":" + timezone[3:])
the_date = datetime.now(reql_tz)
timestamp = time.mktime(the_date.timetuple())
json_date = the_date.isoformat()
r.table("dates").insert({
    'from_object': the_date,
    'from_epoch': r.epoch_time(timestamp),
    'from_iso': r.iso8601(json_date)
}).run(conn)

Use the commands toEpochTime and toISO8601 to convert back.

Incrementing a field value

It’s possible to increment a field value in a document—for example, a counter—in one step on the server.

r.table('aggregated').get(id).update(
    { 'count': (r.row['count'].default(0)+1) }
).run(conn)

Use default to ensure that if the count field doesn’t already exist in the document, it’s added correctly, rather than letting add throw an error.

Pagination

Limiting the number of returned documents

You can limit the number of documents returned by your queries with the limit command. Let’s retrieve just the first 10 blog posts:

r.table("posts").order_by("date").limit(10).run()

Implementing pagination

There are multiple ways to paginate results in RethinkDB. The most straightforward way is using skip and limit (similar to the way SQL’s OFFSET and LIMIT work), but that’s also the least efficient. It’s more efficient to use slice, and even more efficient to use between with a secondary index.

The slice command returns a range from a given start value through but not including a given end value. This makes it easy to use as a skip/limit replacement: the start value is the first item to retrieve, and the end value is the first item plus the limit. To retrieve posts 11-20 from the database using slice:

r.table("posts").order_by("date").slice(11,21).run(conn)

If you have a secondary index, you can use the between command in conjunction with order_by and limit. This is the most efficient way to paginate, but requires looking up values in the secondary index field to find the first record of each page.

Suppose you wanted to paginate through a set of users, 25 at a time. You could get the first 25 records efficiently just with limit.

r.table("users").order_by(index="name").limit(25).run(conn)

For each successive page, start with the last name in the previous page.

r.table("users").between(last_name, r.maxval, left_bound="open",
    index="name").order_by(index="name").limit(25).run(conn)

We pass the last_name saved from the previous set to between as the start index. For the end index, we pass None to return documents from the start index to the table’s end. The left_bound parameter tells between not to include the first record, since it was already returned as part of the previous page.

Transformations

Counting the number of documents in a table

You can count the number of documents with a count command:

r.table("posts").count().run()

Computing the average value of a field

You can compute the average value of a field with the avg command.

r.table("posts").avg("num_comments").run()

Using subqueries to return additional fields

Suppose we’d like to to retrieve all the posts in the table post and also return an additional field, comments, which is an array of all the comments for the relevant post retrieved from the comments table. We could do this using a subquery:

r.table("posts").merge(lambda post:
    {
        "comments": r.table("comments").filter(lambda comment:
            comment["id_post"] == post["id"]).coerce_to("ARRAY")
    }
).run()

Performing a pivot operation

Suppose the table marks stores the marks of every students per course:

[
    {
        "name": "William Adama",
        "mark": 90,
        "id": 1,
        "course": "English"
    },
    {
        "name": "William Adama",
        "mark": 70,
        "id": 2,
        "course": "Mathematics"
    },
    {
        "name": "Laura Roslin",
        "mark": 80,
        "id": 3,
        "course": "English"
    },
    {
        "name": "Laura Roslin",
        "mark": 80,
        "id": 4,
        "course": "Mathematics"
    }
]

You may be interested in retrieving the results in this format:

[
    {
        "name": "Laura Roslin",
        "Mathematics": 80,
        "English": 80
    },
    {
        "name": "William Adama",
        "Mathematics": 70,
        "English": 90
    }
]

In this case, you can do a pivot operation with the group and coerce_to commands:

r.db('test').table('marks').group('name').map(
    lambda row: [row['course'], row['mark']]
).ungroup().map(
    lambda res: r.expr({'name': res['group']}).merge(
        res['reduction'].coerce_to('object'))
).run(conn)

Note: A nicer syntax will eventually be added. See the Github issue 838 to track progress.

Performing an unpivot operation

Doing an unpivot operation to “cancel” a pivot one can be done with the concat_map, map and keys commands:

r.table("pivoted_marks").concat_map(lambda doc:
    doc.without("id", "name").keys().map(lambda course:
        {
            "name": doc["name"],
            "course": course,
            "mark": doc[course]
        }
    )
)

Note: A nicer syntax will eventually be added. See the Github issue 838 to track progress.

Renaming a field when retrieving documents

Suppose we want to rename the field id to id_user when retrieving documents from the table users. We could do:

r.table("users").map(
    # Add the field id_user that is equal to the id one
    r.row.merge(
        lambda doc: { "id_user": doc["id"] }
    ).without("id") # Remove the field id
)

Grouping query results by date/time periods

ReQL has commands for extracting parts of dates and times, including year, month, day, dayOfWeek and more. You can use these with group to group by various intervals. Suppose you had a table of invoices and wanted to retrieve them in groups ordered by year and month:

r.table('invoices').group(
    [r.row['date'].year(), r.row['date'].month()]
).ungroup().merge(
    {'invoices': r.row['reduction'], 'month': r.row['group']}
).without('reduction', 'group').order_by('month').run(conn)

(We also use the technique for renaming a field, described above, to give the names “reduction” and “group” more useful names of “invoices” and “month.”) You could use any combination of the ReQL date/time interval commands in the group, or work with the date/time as a native object.

Currently, ReQL has a default limit of 100,000 elements in an array, and the implementation of group requires the total number of documents grouped to fit within that boundary, so you are limited to 100,000 invoices. This can be changed, however, by passing the array_limit option to run. (Also note that ungroup always returns an array, although this may change in a future version. Follow issue #2719 for progress on this.)

You can also use this approach with a compound index on the intervals you want to group:

r.table('invoices').index_create(
    'by_day', [r.row['date'].year(), r.row['date'].month(), r.row['date'].day()]
).run(conn)

Then you can use that index in the group function. This query would return the highest-value invoice for each day.

r.table('invoices').group(index='by_day').max('price').run(conn)

Miscellaneous

Generating monotonically increasing primary key values

Efficiently generating monotonically increasing IDs in a distributed system is a surprisingly difficult problem. If an inserted document is missing a primary key, RethinkDB currently generates a random UUID. We will be supporting additional autogeneration schemes in the future (see https://github.com/rethinkdb/rethinkdb/issues/117), but in the meantime, you can use one of the available open-source libraries for distributed id generation (e.g. twitter snowflake).

Parsing RethinkDB’s response to a write query

When you issue a write query (insert, delete, update, or replace), RethinkDB returns a summary object that looks like this:

{"deleted":0, "replaced":0, "unchanged":0, "errors":0, "skipped":0, "inserted":1}

The most important field of this object is errors. Generally speaking, if no exceptions are thrown and errors is 0 then the write did what it was supposed to. (RethinkDB throws an exception when it isn’t even able to access the table; it sets the errors field if it can access the table but an error occurs during the write. This convention exists so that batched writes don’t abort halfway through when they encounter an error.)

The following fields are always present in this object:

  • inserted – Number of new documents added to the database.
  • deleted – Number of documents deleted from the database.
  • replaced – Number of documents that were modified.
  • unchanged – Number of documents that would have been modified, except that the new value was the same as the old value.
  • skipped – Number of documents that were unmodified in a write operation, because the document is not available to be deleted or updated. The document might have been deleted by a different operation happening concurrently, or in the case of a get operation the key might not exist.
  • errors – Number of documents that were left unmodified due to an error.

In addition, the following two fields are set as circumstances dictate:

  • generated_keys – If you issue an insert query where some or all of the rows lack primary keys, the server will generate primary keys for you and return an array of those keys in this field. (The order of this array will match the order of the rows in your insert query.)
  • first_error – If errors is positive, the text of the first error message encountered will be in this field. This is a very useful debugging aid. (We don’t return all of the errors because a single typo can result in millions of errors when operating on a large database.)

Using dynamic keys in ReQL commands

Sometimes you may want to write a ReQL document with a dynamic key—the field name is stored in a variable. You can do this with the object command, which takes a list of keys and values ((key, value, key, value ...)) and returns an object from them.

r.table('users').get(1).update(r.object(property_name, value)).run(conn)

The field name can be determined entirely on the server, too. For instance, to update a field whose name is drawn from the value of another field:

r.table('users').for_each(
    lambda doc: r.table('users').get(doc['id']).update(
        r.object(doc['field'], new_value))
).run(conn)

For a practical example, imagine a data set like the one from the pivot example, where each document represents a student’s course record.

[
    {
        "name": "John",
        "mark": 70,
        "id": 1,
        "course": "Mathematics"
    },
    {
        "name": "John",
        "mark": 90,
        "id": 2,
        "course": "English"
    }
]

But you’d like to get a document more like a “report card”:

{
    "Mathematics": 70,
    "English": 90
}

You can accomplish this with object and a pivot.

r.table('marks').filter({'student': 'John'}).map(
    lambda mark: r.object(mark['course'], mark['mark'])
).reduce(
    lambda left, right: left.merge(right)
)

Returning a ReQL query as a string

For testing or logging purposes, you might want to capture a created ReQL query as a string. (You can see an example of this in ReQL error messages.) While there is no ReQL command to do this, you can simply pass the query chain (without run()) as an argument to str():

str(r.table('users').filter(lambda user: user['groups'].contains('operators')))

Building ReQL queries on multiple lines

It’s a common pattern in some query interfaces to “build” queries programmatically by instantiating a query object, calling it several times in succession to add query commands, then calling the execution command. This lets you dynamically change the query based on conditions at runtime. You might expect to do this in ReQL like so:

query = r.table('posts')
if request.filter:
    query.filter(request.filter)
query.order_by('date')
query.run(conn)

However, that won’t work! The reason is that the query object doesn’t store state. Each of the commands after the first one is simply running on the original value of query (in this case, the posts table). You can solve this by explicitly assigning the output of each new command to the query variable:

query = r.table('posts')
if request.filter:
    query = query.filter(request.filter)
query = query.order_by('date')
query = query.run(conn)

Joining multiple changefeeds into one

You might want to produce a “union” changefeed to watch multiple tables or queries on just one feed. Since the union command works with changes, ReQL makes this fairly straightforward. To monitor two tables at once:

r.table('table1').union(r.table('table2')).changes().run(conn)

You might want to “tag” the tables to make it clear which changes belong to which table.

r.table('table1').merge({"table": "table1"}).union(
    r.table('table2').merge({"table": 'table2'}).changes().run(conn)

Also, you can use changes with each query rather than after the whole.

r.table("table1").filter({"flag": "blue"}).changes().union(
    r.table("table2").filter({"flag": "red"}).changes()
).run(conn)