Use cases
What is RethinkDB good for?
RethinkDB is a persistent key-value store with full support for the Memcached interface. It can be used as a replacement for Memcached in a caching infrastructure, or for Memcached-compatible key-value stores such as Membase, MemcacheDB, Tokyo Tyrant, and Schooner Membrain. RethinkDB is designed for exceptionally high performance and reliability on a wide variety of workloads and edge cases. It is especially useful for the following reasons:
- Handles workloads and scenarios that competing solutions cannot (e.g. fast startup, high performance when data no longer fits in RAM, unusual bursts of activity, highly unbalanced or variable read/write ratios, and large databases).
- Provides sophisticated durability tuning and guarantees, giving administrators full control over performance and durability tradeoffs.
- Significantly increases performance of your caching and key-value store infrastructure to allow your existing hardware to scale to much higher load.
- Reduces the number of machines in your infrastructure by allowing existing hardware to operate at higher capacity.
What are example use cases for RethinkDB?
Below are some use cases for which our existing customers have deployed RethinkDB.
- Many infrastructures have large databases with only a small subset of the data being used at any given time. RethinkDB supports databases much larger than the size of available RAM and uses caching algorithms to automatically bring active data into RAM. Many of our customers have used RethinkDB to drastically reduce their operational costs by replacing products that require the full database to fit into RAM. As an example, if you have 64GB of data with only 2GB being actively used at any given time (a very common scenario), RethinkDB allows you to switch to significantly cheaper low-memory machines. This applies to cloud services such as EC2 as well as more traditional data center infrastructures.
- Traditional caching solutions either lose the cache in the event of node failure or take a long time to start up. This imposes additional load on the main database, often causing cascading failures in many infrastructures. In contrast, RethinkDB retains all data and starts up instantaneously. Our customers have used this property of RethinkDB to significantly reduce failures in their infrastructures.
- For large databases where even the active dataset is too expensive to fit into RAM, RethinkDB can significantly reduce I/O load on the underlying storage subsystem, and significantly increase performance on solid-state as well as rotational drives. Our customers are using RethinkDB to improve the performance of their storage infrastructures.
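The first use case above (a large database with a small active subset) can be illustrated with a toy model. The sketch below is not RethinkDB's implementation: it assumes a least-recently-used (LRU) eviction policy, which the text above does not specify, and the names `LRUCache` and `load_from_disk` are hypothetical. It shows that once the active keys are in memory, further reads of them never touch the slower backing store, even though the cache holds only a small fraction of the data.

```python
from collections import OrderedDict


class LRUCache:
    """Toy in-memory cache in front of a slower backing store (assumed LRU policy)."""

    def __init__(self, capacity, load_from_disk):
        self._capacity = capacity          # max number of entries kept in RAM
        self._load = load_from_disk        # hypothetical callable: key -> value
        self._entries = OrderedDict()      # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Cache hit: mark the key as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: fetch from the backing store and keep it in RAM.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._entries.popitem(last=False)
        return value


# 64 records "on disk", room for only 2 in memory, and an active set of 2 keys.
disk = {key: key * key for key in range(64)}
cache = LRUCache(capacity=2, load_from_disk=disk.__getitem__)
for _ in range(50):
    cache.get(0)
    cache.get(1)
```

In this run only the first read of each active key goes to the backing store; the remaining 98 reads are served from memory.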
What is RethinkDB not designed for?
RethinkDB does not currently have native support for structured or relational data, or for horizontal scaling.
How does RethinkDB compare to Memcached?
Unlike Memcached, RethinkDB is fully persistent: after a power failure or node restart, the data that was placed into RethinkDB is immediately accessible to the application. RethinkDB can be used as a high-performance persistent replacement for Memcached, or as a full data store for more traditional storage applications.
How does RethinkDB compare to other NoSQL solutions?
Currently the main focus of RethinkDB is on exceptionally high performance and reliability on a wide variety of workloads and hardware configurations. As a result, RethinkDB is usually significantly faster, has more predictable performance, and ships with more sophisticated durability control and administration options than competing solutions. However, RethinkDB typically has a more limited feature set, so if additional features are more important than performance and durability considerations, other NoSQL solutions are likely a better choice.
Performance
Does RethinkDB require solid-state drives?
RethinkDB does not require solid-state drives. It ships with optimizations for a wide range of storage hardware such as solid-state drives, rotational drives, RAID arrays, and network-attached storage. RethinkDB can be used with custom storage systems as well as virtualized storage systems such as Amazon EBS.
What workloads is RethinkDB good for?
Please see the performance report for detailed measurements on a wide variety of workloads.
What hardware is recommended for optimal performance?
On many workloads RethinkDB achieves peak performance with the following hardware components:
- Four or more CPU cores
- 32GB of RAM or more
- An array of four or more solid-state drives
- Two or more 1Gb Ethernet cards
However, most workloads do not require this configuration. RethinkDB can exhibit great performance on many workloads with only a dual-core CPU, a few GB of RAM, and a single rotational drive.
What do I do if RethinkDB doesn't perform well on my workload?
We consider this to be a bug. Please submit a support ticket and it will be addressed by our performance team.
Features
When are you going to add (feature x)?
We prioritize features based on customer demand and our internal roadmap. If there is a feature you'd like to see in RethinkDB, please submit a feature request to support@rethinkdb.com and we will work with you to provide an ETA.
Is RethinkDB durable?
RethinkDB ships with sophisticated durability controls that allow the administrator to tune performance and durability tradeoffs. RethinkDB can be configured to run in a fully durable mode where every transaction is committed to disk before it is acknowledged by the database server, or in a soft-durability mode where the administrator sets acceptable margins (such as the amount of data permitted to be cached in memory before being flushed to disk). Please see the durability section in the manual for more details.
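The difference between the two modes can be sketched with a small append-only log. This is an illustration only, not RethinkDB's storage engine or its configuration options: the class `AppendLog`, the parameter `max_unflushed_bytes`, and the helper `count_syncs` are hypothetical names. With a margin of zero, every write is forced to disk with `os.fsync` before `write` returns (full durability); with a larger margin, data may sit in memory until the margin is exceeded (soft durability).

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log with a tunable durability margin (illustrative only)."""

    def __init__(self, path, max_unflushed_bytes=0):
        # max_unflushed_bytes == 0 models the fully durable mode.
        self._file = open(path, "ab")
        self._max_unflushed = max_unflushed_bytes
        self._unflushed = 0
        self.sync_count = 0

    def write(self, record: bytes) -> None:
        """Append a record; returning from this call is the 'acknowledgement'."""
        self._file.write(record)
        self._unflushed += len(record)
        if self._unflushed > self._max_unflushed:
            self._sync()

    def _sync(self) -> None:
        # Push Python's buffer to the OS, then ask the OS to commit to disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._unflushed = 0
        self.sync_count += 1

    def close(self) -> None:
        # Commit anything still pending before closing.
        if self._unflushed:
            self._sync()
        self._file.close()


def count_syncs(records, max_unflushed_bytes):
    """Write all records to a temporary log and report how many syncs occurred."""
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendLog(os.path.join(tmp, "data.log"), max_unflushed_bytes)
        for record in records:
            log.write(record)
        log.close()
        return log.sync_count
```

Writing ten 100-byte records costs ten disk syncs with a margin of zero, but only two with a margin of 450 bytes. The price of the larger margin is that up to that many bytes of acknowledged data could be lost in a crash.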
What happens during power failure?
RethinkDB's file format is designed for instantaneous recovery after power failure. If a node fails, RethinkDB can be started as usual without any additional steps. The internal recovery process is very short (typically less than a second) and is performed automatically on startup.
What protocols does RethinkDB support?
Currently RethinkDB only supports the Memcached protocol. More protocols will be added in future versions of RethinkDB.
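Because the interface is the Memcached protocol, any existing Memcached client should be able to talk to the server. For readers unfamiliar with the wire format, the sketch below frames `set` and `get` commands of the Memcached text protocol and parses a `get` response. The function names are hypothetical, and the code does no networking: the resulting bytes would be sent over a TCP socket to whatever host and port the server listens on.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a text-protocol `set`: a header line, then the raw data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Frame a text-protocol `get` for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\n ... END\\r\\n' into a dict."""
    result = {}
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        if line == b"END":
            return result
        # The first four fields are: the word VALUE, key, flags, and data length.
        _, key, _flags, length = line.split(b" ")[:4]
        start = eol + 2
        size = int(length)
        # Read exactly `size` bytes, so values may themselves contain "\r\n".
        result[key.decode("ascii")] = data[start:start + size]
        pos = start + size + 2  # skip the data block and its trailing "\r\n"
```

A server answers a successful `set` with `STORED\r\n`, and answers a `get` for a missing key with just `END\r\n`, which this parser returns as an empty dict.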
How do I move my data out of RethinkDB?
RethinkDB ships with data migration tools that convert the database to a series of Memcached commands. Please see the data migration section in the manual for more details.
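To give a sense of what "a series of Memcached commands" looks like, here is a minimal sketch that turns a set of key-value pairs into text-protocol `set` commands and reads them back. It is not the shipped migration tool and makes no claim about that tool's exact output: the function names are hypothetical, and the flags and expiration fields are assumed to be 0.

```python
from typing import Dict, Iterator


def dump_as_set_commands(items: Dict[str, bytes]) -> Iterator[bytes]:
    """Yield one Memcached text-protocol `set` command per key, in sorted key order."""
    for key in sorted(items):
        value = items[key]
        # Flags and exptime are assumed to be 0 in this sketch.
        header = b"set %s 0 0 %d\r\n" % (key.encode("ascii"), len(value))
        yield header + value + b"\r\n"


def load_set_commands(stream: bytes) -> Dict[str, bytes]:
    """Replay a dump produced above into a plain dict (stands in for a target store)."""
    store = {}
    pos = 0
    while pos < len(stream):
        eol = stream.index(b"\r\n", pos)
        verb, key, _flags, _exptime, length = stream[pos:eol].split(b" ")
        if verb != b"set":
            raise ValueError("unexpected command: %r" % verb)
        start = eol + 2
        end = start + int(length)
        store[key.decode("ascii")] = stream[start:end]
        pos = end + 2  # skip the data block's trailing "\r\n"
    return store


data = {"b": b"2\r\n2", "a": b"one"}
dump = b"".join(dump_as_set_commands(data))
restored = load_set_commands(dump)
```

Because each command carries an explicit byte count, values containing line breaks survive the round trip, and the dump can be replayed against any Memcached-compatible server.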
How does RethinkDB address data corruption?
RethinkDB is designed around a number of processes and tools for minimizing the probability of data corruption and mitigating potential issues should they occur:
- Every version of RethinkDB is run through a comprehensive automated test suite that stresses various parts of the system by simulating a huge variety of workloads. The database file is scanned for corruption after every test, and publicly available versions of RethinkDB ship only after being certified by this rigorous process.
- Before a version of RethinkDB is made publicly available, it undergoes a private beta-testing process during which our customers run it through their internal quality assurance testing. New versions of RethinkDB do not ship until potential issues uncovered in this process are fixed by our development team.
- RethinkDB ships with sophisticated consistency checks and extraction tools. In the event of data corruption, the administrator can run the extraction tools to retrieve recoverable data. Please see the data recovery section of the manual for more details.
- For customers who have enterprise support subscriptions, if publicly available recovery tools are not sufficient, our support team can facilitate data recovery by using more sophisticated internal diagnostic and recovery tools.
Architecture
What are the design principles behind RethinkDB?
RethinkDB's design is based on the following requirements that take priority over other considerations:
- Reliability - we do not introduce a feature until we can build automated tests that ensure it operates reliably in a wide variety of situations, operational scenarios, and hardware configurations. All features are subject to extensive reliability testing, as well as integration into data recovery and performance testing processes.
- High performance - RethinkDB is designed to take full advantage of multicore CPUs and modern storage systems. We have extensive automated performance testing for a wide range of workloads. Every time our customers encounter a workload where performance can be significantly improved, we consider it to be a bug and our performance team addresses it as soon as possible.
- Ease of administration - we do not introduce a feature until we can provide a seamless administration experience and build associated tools to simplify daily operations of the software.
- Drop-in deployment - we always prefer to expose features via industry standard protocols, as opposed to introducing new custom protocols.
Community and licensing
How is RethinkDB licensed?
RethinkDB binaries can be downloaded and used for commercial applications for free. A paid subscription that includes support and access to all future updates is also available: see the pricing page for more details. In the future, paid subscriptions will include additional features for advanced customers that will not be available as part of the free edition.
I'm concerned RethinkDB isn't open source. Why should I choose RethinkDB over other open-source solutions?
In many cases open-source alternatives that implement the same protocols are more appropriate than RethinkDB. Typically, RethinkDB is a good choice in the following scenarios:
- Highly demanding infrastructures that require solutions with top-of-the-line performance characteristics
- A requirement for robust behavior in a multitude of edge cases and advanced scenarios where other systems behave poorly
- Infrastructures with high operational costs, where an increase in performance or a reduction in resource usage results in significant operational savings
- Infrastructures with expensive and time-consuming administration processes where sophisticated administration tools can improve efficiency
- Mission-critical environments that require high quality vendor support
Almost everyone at RethinkDB is a heavy open-source supporter, but currently the best way for us to provide superior quality of software and service to our customers is for the source to remain closed. If your environment isn't in any of the above categories, RethinkDB can still be an excellent choice, but the licensing trade-off might make open-source solutions more appropriate for your needs.
