sequence.group(field | function..., [{index: <indexname>, multi: false}]) → grouped_stream
r.group(sequence, field | function..., [{index: <indexname>, multi: false}]) → grouped_stream
Takes a stream and partitions it into multiple groups based on the fields or functions provided.
With the multi
flag single documents can be assigned to multiple groups, similar to the behavior of multi-indexes. When multi
is true
and the grouping value is an array, documents will be placed in each group that corresponds to the elements of the array. If the array is empty the row will be ignored.
Suppose that the table games
has the following data:
[
{id: 2, player: "Bob", points: 15, type: "ranked"},
{id: 5, player: "Alice", points: 7, type: "free"},
{id: 11, player: "Bob", points: 10, type: "free"},
{id: 12, player: "Alice", points: 2, type: "free"}
]
Example: Group games by player.
> r.table('games').group('player').run(conn, callback)
// Result passed to callback
[
{
group: "Alice",
reduction: [
{id: 5, player: "Alice", points: 7, type: "free"},
{id: 12, player: "Alice", points: 2, type: "free"}
]
},
{
group: "Bob",
reduction: [
{id: 2, player: "Bob", points: 15, type: "ranked"},
{id: 11, player: "Bob", points: 10, type: "free"}
]
}
]
Commands chained after group
will be called on each of these grouped
sub-streams, producing grouped data.
Example: What is each player’s best game?
> r.table('games').group('player').max('points').run(conn, callback)
// Result passed to callback
[
{
group: "Alice",
reduction: {id: 5, player: "Alice", points: 7, type: "free"}
},
{
group: "Bob",
reduction: {id: 2, player: "Bob", points: 15, type: "ranked"}
}
]
Commands chained onto grouped data will operate on each grouped datum, producing more grouped data.
Example: What is the maximum number of points scored by each player?
> r.table('games').group('player').max('points')('points').run(conn, callback)
// Result passed to callback
[
{
group: "Alice",
reduction: 7
},
{
group: "Bob",
reduction: 15
}
]
You can also group by more than one field.
Example: What is the maximum number of points scored by each player for each game type?
> r.table('games').group('player', 'type').max('points')('points').run(conn, callback)
// Result passed to callback
[
{
group: ["Alice", "free"],
reduction: 7
}
{
group: ["Bob", "free"],
reduction: 10,
},
{
group: ["Bob", "ranked"],
reduction: 15,
}
]
You can also group by a function.
Example: What is the maximum number of points scored by each player for each game type?
> r.table('games')
.group(function(game) {
return game.pluck('player', 'type')
}).max('points')('points').run(conn, callback)
// Result passed to callback
[
{
group: {"player": "Alice", "type": "free"},
reduction: 7
},
{
group: {"player": "Bob", "type": "free"},
reduction: 10
},
{
group: {"player": "Bob", "type": "ranked"},
reduction: 15
}
]
Using a function, you can also group by date on a ReQL date field.
Example: How many matches have been played this year by month?
> r.table('matches').group(
[r.row('date').year(), r.row('date').month()]
).count().run(conn, callback)
// Result passed to callback
[
{
group: [2014, 2],
reduction: 2
},
{
group: [2014, 3],
reduction: 2
},
{
group: [2014, 4],
reduction: 1
},
{
group: [2014, 5],
reduction: 3
}
]
You can also group on an index (primary key or secondary).
Example: What is the maximum number of points scored by game type?
> r.table('games').group({index:'type'}).max('points')('points').run(conn, callback)
// Result passed to callback
[
{
group: "free",
reduction: 10
},
{
group: "ranked",
reduction: 15
}
]
Suppose that the table games2
has the following data:
[
{ id: 1, matches: {'a': [1, 2, 3], 'b': [4, 5, 6]} },
{ id: 2, matches: {'b': [100], 'c': [7, 8, 9]} },
{ id: 3, matches: {'a': [10, 20], 'c': [70, 80]} }
]
Using the multi
option we can group data by match A, B or C.
r.table('games2').group(r.row('matches').keys(), {multi: true}).run(conn, callback);
// Result passed to callback
[
{
group: "a",
reduction: [ <id 1>, <id 3> ]
},
{
group: "b",
reduction: [ <id 1>, <id 2> ]
},
{
group: "c",
reduction: [ <id 2>, <id 3> ]
}
]
(The full result set is abbreviated in the figure; <id 1>, <id 2>
and <id 3>
would be the entire documents matching those keys.)
Example: Use map and sum to get the total points scored for each match.
r.table('games2').group(r.row('matches').keys(), {multi: true}).ungroup().map(
function (doc) {
return { match: doc('group'), total: doc('reduction').sum(
function (set) {
return set('matches')(doc('group')).sum();
}
)};
}
).run(conn, callback);
// Result passed to callback
[
{ match: "a", total: 36 },
{ match: "b", total: 115 },
{ match: "c", total: 174 }
]
The inner sum
adds the scores by match within each document; the outer sum
adds those results together for a total across all the documents.
If you want to operate on all the groups rather than operating on each group (e.g. if you want to order the groups by their reduction), you can use ungroup to turn a grouped stream or grouped data into an array of objects representing the groups.
Example: Ungrouping grouped data.
> r.table('games').group('player').max('points')('points').ungroup().run(conn, callback)
// Result passed to callback
[
{
group: "Alice",
reduction: 7
},
{
group: "Bob",
reduction: 15
}
]
Ungrouping is useful e.g. for ordering grouped data, or for inserting grouped data into a table.
Example: What is the maximum number of points scored by each player, with the highest scorers first?
> r.table('games')
.group('player').max('points')('points')
.ungroup().orderBy(r.desc('reduction')).run(conn, callback)
// Result passed to callback
[
{
group: "Bob",
reduction: 15
},
{
group: "Alice",
reduction: 7
}
]
When grouped data are returned to the client, they are transformed
into a client-specific native type. (Something similar is done with
times.) In JavaScript, grouped data are
transformed into an Array
. If you instead want to receive the raw
pseudotype from the server, you can specify groupFormat: 'raw'
as an optional
argument to run
:
Example: Get back the raw GROUPED_DATA
pseudotype.
> r.table('games').group('player').avg('points').run(conn, {groupFormat:'raw'}, callback)
// Result passed to callback
{
$reql_type$: "GROUPED_DATA",
data: [
["Alice", 4.5],
["Bob", 12.5]
]
}
Not passing the group_format
flag would return:
[
{
group: "Alice":
reduction: 4.5
},
{
group: "Bob"
reduction: 12.5
}
]
You might also want to use the ungroup command (see above), which will turn the grouped data into an array of objects on the server.
If you run a query that returns a grouped stream, it will be
automatically converted to grouped data before being sent back to you
(there is currently no efficient way to stream groups from RethinkDB).
This grouped data is subject to the array size limit, by default 100,000 elements (see run for details on how to use the arrayLimit
option to change this).
In general, operations on grouped streams will be efficiently
distributed, and operations on grouped data won’t be. You can figure
out what you’re working with by putting typeOf
on the end of your
query. Below are efficient and inefficient examples.
Example: Efficient operation.
// r.table('games').group('player').typeOf().run(conn, callback)
// Returns "GROUPED_STREAM"
r.table('games').group('player').min('points').run(conn, callback) // EFFICIENT
Example: Inefficient operation.
// r.table('games').group('player').orderBy('score').typeOf().run(conn, callback)
// Returns "GROUPED_DATA"
r.table('games').group('player').orderBy('score').nth(0).run(conn, callback) // INEFFICIENT
What does it mean to be inefficient here? When operating on grouped
data rather than a grouped stream, all of the data has to be
available on the node processing the query. This means that the
operation will only use one server’s resources, and will require
memory proportional to the size of the grouped data it’s operating
on. (In the case of the orderBy in the inefficient example, that
means memory proportional to the size of the table.) The array
limit is also enforced for grouped data, so the orderBy
example
would fail for tables with more than 100,000 rows without changing the arrayLimit
option to run
.
Example: What is the maximum number of points scored by each player in free games?
> r.table('games').filter( r.row('type').eq('free'))
.group('player').max('points')('points')
.run(conn, callback)
// Result passed to callback
[
{
group: "Alice":
reduction: 7
},
{
group: "Bob",
reduction: 10
}
]
Example: What is each player’s highest even and odd score?
r.table('games')
.group('name', function(game) {
return game('points').mod(2)
}).max('points')('points').run(conn, callback)
// Result passed to callback
[
{
group: ["Alice", 1]
reduction: 7,
},
{
group: ["Bob", 0],
reduction: 10
},
{
group: ["Bob", 1],
reduction: 15
}
]
Couldn't find what you were looking for?
Contribute: edit this page or open an issue