Improving a large C++ project with coroutines

At the core of RethinkDB is a highly parallel B-tree implementation. Given our performance requirements, creating a native thread for each request is too expensive. Instead, we create one thread per CPU on the server (logical CPU in the case of hyperthreading) and use cooperative concurrency within a thread.

A single thread hosts multiple logically concurrent units of control, which take turns running: when one unit needs to block, another gets to run. Ultimately, blocking happens either for I/O (waiting for data from the network or disk, or waiting to be notified that a write has completed) or for coordination with other threads. On top of this, we implemented higher-level abstractions which also block.
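
To make the shape of this concrete, here is a minimal sketch of cooperative concurrency in Python, with generators standing in for coroutines. The names worker and run, and the pretend “blocking” points, are invented for illustration and are not RethinkDB’s actual (C++) implementation; the point is just that several units of control share one thread and take turns whenever one of them yields.

import collections

def worker(name, steps):
    # One "unit of control": runs until it would block, then yields the CPU.
    for i in range(steps):
        print("%s: step %d" % (name, i))
        yield  # pretend an I/O request was issued; wait for our next turn

def run(units):
    # One thread, many units of control: resume each unit in round-robin order.
    ready = collections.deque(units)
    while ready:
        unit = ready.popleft()
        try:
            next(unit)          # run until the unit "blocks" (yields)
            ready.append(unit)  # it gets another turn later
        except StopIteration:
            pass                # this unit of control has finished

run([worker("request-A", 3), worker("request-B", 2)])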

Read the full post

Distributed software testing

About me

A word about me first: my name is Daniel Mewes, and I just came over to California to work at RethinkDB as an intern for the coming months. After two years as an undergraduate student of computer science at Saarland University, Germany, I am excited to now work on an influential real-world project at RethinkDB. Why RethinkDB? Not only does RethinkDB develop an exciting and novel piece of database technology, it also provides that great “startup kind” of work experience.

Software testing

In complex software systems like database management systems, many components have to work together. These components can interact in complex ways, yielding a virtually infinite number of possible states that the overall system can reach. This has consequences for software testing: because bugs in the code might only show up in a small fraction of the possible states, comprehensive testing of the system is essential. Encapsulation of code and data into objects can reduce the number of states that must be considered for any single piece of code. However, an extremely large number of states can still remain, especially in parallel systems. Reliability requirements for database management systems, on the other hand, are stringent: losing or corrupting data because of bugs in the program cannot be tolerated.

Read the full post

Multi slicing

The basic data structure that powers databases is called a B-tree. This is where you actually store the user’s data. B-trees are great because you can put huge amounts of data in them and access remains fast. In fact, the naive strategy of putting a whole data set into one B-tree doesn’t break down because of access time. It does, however, break down when you try to support a multiaccess paradigm.

Six years ago, multiaccess was nice. Now that processors have multiple cores, it’s crucial. Four cores fighting over one B-tree means a lot of wasted processor time. In a multiaccess scheme, different cores can concurrently access data. This gets tricky. You can go looking for a piece of data only to find that someone has moved it since you started; that’s trouble: for all you know it was deleted. You could start the search over, but with no guarantee of success: maybe you’ll get unlucky and it will be plucked out from under you again. How do you know when to give up? Your database is now blazingly fast, but also broken. We handle this with a locking scheme.
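
The details of that scheme are in the full post. As a rough illustration of the slicing idea only, here is a Python sketch in which keys are hashed across several independent slices, each guarded by its own lock, so that work on different slices never contends. The SlicedStore class, and the plain dict standing in for a per-slice B-tree, are invented for the example and are not RethinkDB code.

import threading
from zlib import crc32

class SlicedStore:
    # Toy illustration only: a dict stands in for each per-slice B-tree.
    def __init__(self, num_slices=4):
        self.slices = [dict() for _ in range(num_slices)]
        self.locks = [threading.Lock() for _ in range(num_slices)]

    def _slice_for(self, key):
        # Hash the key to choose a slice; keys in different slices never contend.
        return crc32(key.encode()) % len(self.slices)

    def set(self, key, value):
        i = self._slice_for(key)
        with self.locks[i]:
            self.slices[i][key] = value

    def get(self, key):
        i = self._slice_for(key)
        with self.locks[i]:
            return self.slices[i].get(key)

store = SlicedStore()
store.set("user:42", "Ada")
print(store.get("user:42"))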

Read the full post

Make debugging easier with custom pretty-printers

What’s good about pretty-printers

One of the best features in GDB 7.0+ is the ability to write pretty-printers in Python. Instead of printing a vector and seeing this:

$1 = {
  <std::_Vector_base<int,std::allocator<int> >> = {
    _M_impl = {
      <std::allocator<int>> = {
        <__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>}, 
      members of std::_Vector_base<int,std::allocator<int> >::_Vector_impl: 
      _M_start = 0x0, 
      _M_finish = 0x0, 
      _M_end_of_storage = 0x0
    }
  }, <No data fields>}

I can now see this:

$1 = std::vector of length 0, capacity 0
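
To give a flavor of what writing one involves, here is a minimal sketch of such a pretty-printer. The point_t struct, its x and y fields, and the PointPrinter name are hypothetical examples rather than code from the post; a printer is a Python class with a to_string method, plus a lookup function registered with GDB.

import gdb

class PointPrinter:
    # Pretty-printer for a hypothetical point_t struct with fields x and y.
    def __init__(self, val):
        self.val = val  # a gdb.Value wrapping the value being printed

    def to_string(self):
        # Subscripting a gdb.Value reads a struct field.
        return "point_t(x=%s, y=%s)" % (self.val["x"], self.val["y"])

def lookup_point(val):
    # GDB calls this for every value it is about to print.
    if str(val.type.strip_typedefs()) == "point_t":
        return PointPrinter(val)
    return None

# Register the lookup function so that 'print my_point' uses PointPrinter.
gdb.pretty_printers.append(lookup_point)

Loading a file like this with GDB’s source command makes print show the compact form for every matching value.
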
Read the full post

The benchmark you're reading is probably wrong

Mikeal Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response model, and makes the following statement:

MongoDB, by default, doesn’t actually have a response for writes.

In response, an employee of 10gen (the company behind MongoDB) made the following comment on Hacker News:

We did this to make MongoDB look good in stupid benchmarks.

Read the full post