In this tutorial we’ll introduce using RethinkDB in Python by playing with a superhero dataset. We’ll show how to insert and retrieve documents, query the database for specific data, and update documents with new information.
Before following this tutorial you must have RethinkDB installed and running. You’ll also need to install the RethinkDB Python library.
Let’s now confirm the setup is correct and we can connect to the RethinkDB
instance. We will assume that RethinkDB is running on the same server and on
the default port (you can change the parameters passed to connect
):
> from rethinkdb import RethinkDB
> r = RethinkDB()
> # connect and make the connection available to subsequent commands
> r.connect('localhost', 28015).repl()
> print r.db_list().run()
[u'test']
The result lists the default database available in RethinkDB.
Tips: The r.connect(...).repl()
function is useful when using
the RethinkDB Python driver from interactive shells making available
the connection for all subsequent run
calls.
A single RethinkDB instance can host multiple database, but most of the time you’ll work with a single database at a time. RethinkDB comes with a default database so you can quickly experiment with it:
> test = r.db('test')
For your application you’ll probably want to use a new database and define new tables:
// creating a new database
> r.db_create('python_tutorial').run()
// creating a new table
> r.db('python_tutorial').table_create('heroes').run()
You can see the new database and table through the browser-based administrative UI: http://localhost:8080/#tables.
Because we will continue to use the heroes
table, let’s save it as a reference for the next operations:
> heroes = r.db('python_tutorial').table('heroes')
RethinkDB stores data in JSON, so passing dict
s from Python requires no additional conversions:
> heroes.insert({
"hero": "Wolverine",
"name": "James 'Logan' Howlett",
"magazine_titles": ["Amazing Spider-Man vs. Wolverine", "Avengers",
"X-MEN Unlimited", "Magneto War", "Prime"],
"appearances_count": 98
}).run()
{u'errors': 0,
u'generated_keys': [u'c6677d9f-1740-4499-bf17-92f10cab30cf'],
u'inserted': 1}
Tips: As you can notice in the result, RethinkDB generates a unique ID for the documents that do not provide one.
You can also insert multiple documents at a time by passing insert
an array of dicts:
> heroes.insert([
{
"hero": "Magneto",
"name": "Max Eisenhardt",
"aka": ["Magnus", "Erik Lehnsherr", "Lehnsherr"],
"magazine_titles": ["Alpha Flight", "Avengers", "Avengers West Coast"],
"appearances_count": 42
},
{
"hero": "Professor Xavier",
"name": "Charles Francis Xavier",
"magazine_titles": ["Alpha Flight", "Avengers", "Bishop", "Defenders"],
"appearances_count": 72
},
{
"hero": "Storm",
"name": "Ororo Monroe",
"magazine_titles": ["Amazing Spider-Man vs. Wolverine", "Excalibur",
"Fantastic Four", "Iron Fist"],
"appearances_count": 72
}
]).run()
{u'errors': 0,
u'generated_keys': [u'd7d5e949-3f71-4e21-b5b7-42b6e7048ea3',
u'747c057e-8810-4479-a6b2-3c28d8057b48',
u'372fa6fe-17ec-494b-a926-0d99ba8ced43'],
u'inserted': 3}
Even if we only inserted 4 documents, you can double check that by running
heroes.count().run()
. Let’s take a quick look at them:
> heroes.run()
<rethinkdb.net.Cursor object at 0x15d4710>
Tips: If the table contains a large number of documents a query will not return all of them at once, which would saturate the network and/or require a lot of memory on the client. Instead the query will return the results in batches and fetch more data as needed.
Let’s now retrieve a document by its ID:
> heroes.get('d7d5e949-3f71-4e21-b5b7-42b6e7048ea3').run()
{u'aka': [u'Magnus', u'Erik Lehnsherr', u'Lehnsherr'],
u'appearances_count': 42,
u'hero': u'Magneto',
u'name': u'Max Eisenhardt',
u'id': u'e4bbd5e0-de9c-15ac-672c-d00b9f23a1f5',
u'magazine_titles': [...]}
RethinkDB supports a wide range of filters, so let’s try a couple of different ones. Firstly let’s retrieve Professor Xavier by his character name:
> heroes.filter({'name': 'Charles Francis Xavier'}).run()
<rethinkdb.net.Cursor object at 0x15dc910>
Next thing we can do is to order the characters based on the number of magazines they’ve appeared in:
> heroes.order_by(r.desc('appearances_count')).pluck('hero',
'appearances_count').run()
<rethinkdb.net.Cursor object at 0x15dc450>
Tips: Ordering result ascending or descending can be done using asc
and
desc
respectively.
The server-side operation pluck
allows fetching only the specified
attributes of the result documents.
As you see only 1 of the characters has appeared in more than 90 magazines. This is something we could also verify with the query:
> heroes.filter(r.row['appearances_count'] >= 90).pluck('hero',
'name', 'appearances_count').run()
For the last query example let’s retrieve the characters that appeared in a
specific magazine. This demonstrates using filter
on a nested list in a
document:
> heroes.filter(
r.row['magazine_titles'].filter(
lambda mag: mag == 'Amazing Spider-Man vs. Wolverine'
).count() > 0
).pluck('hero').run()
We’ll finish this tutorial by appending a new magazine to each of the characters and also updating their number of appearances:
heroes.update({
'appearances_count': r.row['appearances_count'] + 1,
'magazine_titles': r.row['magazine_titles'].append(
'The Fantastic RethinkDB')
}).run()
{u'errors': 0, u'skipped': 0, u'updated': 4}
Tips: RethinkDB supports atomic updates at the document level. You can read more about about the RethinkDB atomicity model.