Wondering how to model your data? Read about data modeling in RethinkDB.
Like many traditional database systems, RethinkDB supports JOIN
commands to combine data from multiple tables. In RethinkDB joins are
automatically distributed—a join command is automatically sent
to the appropriate nodes across the cluster, the relevant data is
combined, and the final result is presented to the user.
Let’s see how we can use joins in RethinkDB to query data based on one to many, and many to many relations.
Let’s suppose we’ve created two tables: employees
and
companies
. We’ll use these tables to model the notion of people
working for organizations (each organization has multiple people
working for it, but any given person works at a single
organization). Here’s an example document in the employees
table:
{
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Jean-Luc Picard",
"company_id": "064058b6-cea9-4117-b92d-c911027a725a",
"rank": "captain"
}
And here’s an example document in the companies
table:
{
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"company": "Starfleet",
"type": "paramilitary"
}
We can join the two tables as follows:
r.table("employees").eq_join("company_id", r.table("companies")).run()
This query joins the company_id
of the employee table with the
primary key of the company table. It returns a sequence of documents
where each document contains two fields—the employee
information and the company information:
{
"left": {
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Jean-Luc Picard",
"company_id": "064058b6-cea9-4117-b92d-c911027a725a",
"rank": "captain"
},
"right": {
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"company": "Starfleet",
"type": "paramilitary"
}
}
left
contains the information from the left table in the
query (in this case, the employee)right
contains the information from the right table in
the query (in this case, the company)We can chain the zip
command at the end of the query to merge the
two fields into a single document. For example, the following query:
r.table("employees").eq_join("company_id", r.table("companies")).zip().run()
Returns the following result:
{
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"name": "Jean-Luc Picard",
"company_id": "064058b6-cea9-4117-b92d-c911027a725a",
"rank": "captain",
"company": "Starfleet",
"type": "paramilitary"
}
A common data access task is retrieving one document with associated “child” documents. (This would often be in a one-to-many relationship as shown here, but could be a many-to-many or one-to-one relationship.) In our example data set, we might want to retrieve information about a company and all its employees. We can do this in one ReQL command using merge
and a subquery in its lambda function.
id = "064058b6-cea9-4117-b92d-c911027a725a"
r.table("companies").get(id).merge(lambda company:
{ 'employees': r.table('employees').get_all(company['id'],
index='company_id').coerce_to('array') }
).run()
This will return a result similar to:
{
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"company": "Starfleet",
"type": "paramilitary",
"employees": [
{
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Jean-Luc Picard",
"company_id": "064058b6-cea9-4117-b92d-c911027a725a",
"rank": "captain"
},
...
]
}
Where eq_join
produces a table-like result (the rough equivalent of SQL’s SELECT * FROM companies, employees WHERE companies.id = employees.company_id
), using a subquery produces a nested document, where the employee objects are returned in a list in the employees
field.
Suppose that our data model for the employees stores a company name instead of a company id:
{
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Jean-Luc Picard",
"company_name": "Starfleet",
"rank": "captain"
}
We can create a secondary index on the company
field of the
companies
table, and perform our query by taking advantage of the
secondary index:
r.table("companies").index_create("company").run()
The query would look like this:
r.table("employees").eq_join("company_name",
r.table("companies"), index="company").run()
Want to learn more about indexes?: Read about using secondary indexes in RethinkDB.
Note: you can also join tables on arbitrary fields without creating an index using the inner_join command. However, arbitrary inner joins are less efficient then equijoins.
You can also use RethinkDB to query many to many relations. Let’s suppose we have a collaborative blogging platform where authors collaborate to create posts (multiple authors can work on any given post, and publish multiple posts).
In order to model this data we’d create three tables—authors
, posts
and authors_posts
, similarly to how we’d do it in a
relational system. Here is example data for the authors
table:
{
"id": "7644aaf2-9928-4231-aa68-4e65e31bf219",
"name": "William Adama",
"tv_show": "Battlestar Galactica"
}
{
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"name": "Laura Roslin",
"tv_show": "Battlestar Galactica"
}
Here is example data for the posts
table:
{
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"title": "Decommissioning speech",
"content": "The Cylon War is long over..."
}
And here is example data for the authors_posts
table:
{
"author_id": "7644aaf2-9928-4231-aa68-4e65e31bf219",
"post_id": "543ad9c8-1744-4001-bb5e-450b2565d02c"
}
{
"author_id": "064058b6-cea9-4117-b92d-c911027a725a",
"post_id": "543ad9c8-1744-4001-bb5e-450b2565d02c"
}
In a many to many relation, we can use multiple eq_join
commands to join
the data from all three tables:
r.table("authors_posts").eq_join("author_id", r.table("authors")).zip().
eq_join("post_id", r.table("posts")).zip().run()
The result of this query is a stream of documents that includes every post written by every author in our database:
{
"tv_show": "Battlestar Galactica",
"title": "Decommissioning speech",
"post_id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "William Adama",
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"content": "The Cylon War is long over...",
"author_id": "7644aaf2-9928-4231-aa68-4e65e31bf219"
}
{
"tv_show": "Battlestar Galactica",
"title": "Decommissioning speech",
"post_id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Laura Roslin",
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"content": "The Cylon War is long over...",
"author_id": "064058b6-cea9-4117-b92d-c911027a725a"
}
If you use the zip
command after join
, the document from the right
table will be merged into the left one.
Consider the following query:
r.table("employees").eq_join("company_id", r.table("companies"))
Suppose its output is as follows:
{
# Employee
"left": {
"id": "543ad9c8-1744-4001-bb5e-450b2565d02c",
"name": "Jean-Luc Picard",
"company_id": "064058b6-cea9-4117-b92d-c911027a725a",
"rank": "captain"
},
# Company
"right": {
"id": "064058b6-cea9-4117-b92d-c911027a725a",
"company": "Starfleet",
"type": "paramilitary"
}
}
The conflicting field is id
. If you directly use the zip
command,
the id
field of the result will be the one from the company. There
are three ways to resolve potential field conflicts.
Suppose that you want to keep the id
field of the employee, but not
the one of the company. You can do it by removing the field
right.id
, then calling the zip
command.
r.table("employees").eq_join("company_id", r.table("companies"))
.without({"right": {"id": True}}) # Remove the field right.id
.zip()
.run()
If you need to keep both fields, you can rename them with map
and
without
before using the zip
command.
r.table("employees").eq_join("company_id", r.table("companies"))
# Copy the field right.id into right.c_id
.map( r.row.merge({
"right": {
"c_id": r.row["right"]["id"]
}
}))
# Remove the field right.id
.without({"right": {"id": True}})
.zip()
.run()
You can manually merge the left
and right
fields without using the
zip
command. Suppose you want to keep the name of the employee and
the name of his company. You can do:
r.table("employees").eq_join("company_id", r.table("companies"))
.map({
"name": r.row["left"]["name"],
"company": r.row["right"]["company"]
}).run()
To learn more, read about data modeling in RethinkDB. For detailed information, take a look at the API documentation for the join commands: