Handling stack overflow on custom stacks

On my computer, the callstack of a new process is around 10MB [1]. Modern operating system automatically reserve some amount of virtual memory and install protections on the page below the stack to create a segmentation fault on stack overflow. This ensures that a stack overflow won’t go corrupting random parts of memory.

We want to have a lot of coroutines, so they should have smaller stacks, maybe 16 or 64KB. This makes stack overflow an even greater possibility, but at the same time, coroutines implemented in user space don’t get this checking for free–we have to build it ourselves. In the process, we can even do better: we can give some information about the coroutine which crashed.Here’s the idea:

  • Memory-protect the page on the bottom of the stack (remember, x86 callstacks grow down) to disallow reads and writes.
  • Install a signal handler for SIGSEGV.
  • In the handler, examine which address caused the page fault:
    • If the address is within the protected page of a coroutine, report a callstack overflow for the coroutine, with a stack backtrace and some information about the coroutine that crashed.
    • Otherwise, report a generic memory fault at the address, with a stack backtrace.

I’m treating stack overflow as a fatal error here, but a more nuanced approach is possible: rather than killing the whole database, it could just kill that coroutine and return to the scheduler. But this particular kind of error tolerance would require broad modifications to RethinkDB which we’re not ready to do. I could also make stack overflow resize the stack to be larger, but this is difficult in C++ because there might be pointers into the stack.

Nothing here is complicated to implement, but it involves the interaction of a few different system calls, which I’ll explain in this article.

Manipulating memory protection

We want to allocate the stack and make the bottom page unreadable and unwritable. The mprotect system call manipulates memory protection, and getpagesize tells us how big a page is (it might not be 4KB). valloc makes a page-aligned memory allocation.

void *stack = valloc(stack_size);
mprotect(stack, getpagesize(), PROT_NONE);

When deallocating the stack, be sure to reset the protection to what it was before.

mprotect(stack, getpagesize(), PROT_READ|PROT_WRITE);
free(stack);

Installing a signal handler

In order to catch the segfault, we have to install a signal handler. The signal system call won’t cut it–it just doesn’t give us enough information about what happened. Instead, we have to use sigaction, which takes a whole struct of parameters, not just a function pointer, for how to handle the signal. One struct member is sa_flags. We have to turn on the SA_ONSTACK flag in order to use a user-provided stack (see below) and the SA_SIGINFO flag, in order to call a function with more information. If SA_SIGINFO is set, then we can set the sa_sigaction member to a function which takes a siginfo_t struct as an argument. The si_addr member of that struct contains the address of the location which caused the fault. All together, the code for establishing the page handler is as follows:

struct sigaction action;
bzero(&action, sizeof(action));
action.sa_flags = SA_SIGINFO|SA_STACK;
action.sa_sigaction = &sigsegv_handler;
sigaction(SIGSEGV, &action, NULL);

The signal handler itself will print out the CPU where the coroutine was initialized, but it would be easy to extend to support printing other metadata contained in the coroutine. int coro_t::in_coro_from_cpu(int) reports which CPU a coroutine was initialized on, or -1 if the address was not from the protected page of a coroutine stack. crash will cause the program to terminate with the given error message, together with a stack trace.

void sigsegv_handler(int signum, siginfo_t *info, void *data) {
    void *addr = info->si_addr;
    int info = coro_t::in_coro_from_cpu(addr);
    if (cpu == -1) {
        crash("Segmentation fault from reading the address %p.", addr);
    } else {
        crash("Callstack overflow from a coroutine initialized on CPU %d at address %p.", cpu, addr);
    }
}

Installing a special stack for the signal handler

By default, when a signal is delivered, its handler is called on the same stack where the program was running. But if the signal is due to stack overflow, then attempting to execute the handler will cause a second segfault. Linux is smart enough not to send this segfault back to the same signal handler, which would prevent an infinite cascade of segfaults. Instead, in effect, the signal handler does not work.

To make it work, we have to provide an alternate stack to execute the signal handler on. The system call to install this stack is called sigaltstack. As a parameter, it takes a stack_t, which consists of a pointer to the base of the stack, the size of the stack, and some flags that aren’t relevant for our purposes.

stack_t segv_stack;
segv_stack.ss_sp = valloc(SEGV_STACK_SIZE);
segv_stack.ss_flags = 0;
segv_stack.ss_size = SEGV_STACK_SIZE;
sigaltstack(&segv_stack, NULL);

SEGV_STACK_SIZE doesn’t have to be so big, but it has to be big enough to call printf from. The MINSIGSTKSZ constant indicates how big a stack has to be to execute any signal handler at all. To be on the safe side, used that constant plus 4096 for SEGV_STACK_SIZE. sigaltstack should be called before the associated call to sigaction which is intended to register a signal handler with that stack.

Operating in a multithreaded environment

If a process calls sigaction and then spawns pthreads within it, then those pthreads will inherit the signal handlers that were already installed. Apparently, this is not the case for sigaltstack: If a signal handler is installed with sigaction using a sigaltstack, and a thread spawned from that process is killed with the right signal, then the installed stack will not be found! The signal handler must instead be installed on each pthread individually. I’m not sure whether this is a bug in Linux or just a quirk of POSIX; in any case, I couldn’t find it documented anywhere.

Pulling it all together

A few simple system calls can allow stack overflows in user-space coroutines to be handled nicely, providing detailed error messages without runtime overhead in cases where the stack does not overflow. Indeed, the benchmarks which I previously reported are unaffected by this change. Other programming language runtimes, like that of Io , checks the height of the callstack on every function call in order to catch overflow. This technique is more efficient.

Io supports callstacks that will resize on overflow, to a point. Such a feature is more difficult to implement in C++ because there might be pointers into the callstack, and weak typing makes it impossible to trace the stacks and heap to update these even if we wanted to. However, virtual memory may be usable to implement this resizing. First, it may just work to allocate very large stacks and not touch them, hoping that the virtual memory system will ensure that no physical memory or swap space is used for the stacks. But this might put stress on the kernel’s data structures, and it may not work well if overcommitting is turned off, as it is in some server environments. Alternatively, mremap may be used to expand the stack from the page fault handler. But I’m not sure how I could reserve a chunk of virtual memory to expand into, without depending on overcommitting in some cases. None of these techniques would work out well on 32-bit systems because there isn’t enough address space. This is still something to look into in the future, though.

I implemented stack overflow checking after getting stuck on a particular bug in the database, and when returning to that bug, I found that this was in fact the cause! Stack overflow isn’t an obscure, uncommon thing, especially when stacks are small and there might be recursion. Working together with the operating system allows us to broaden the applicability of these software engineering benefits to performance-critical code.


[1] You can find the value on your computer by running the following program (without optimizations!) and examining the difference between the first and last number, when it terminates with a segfault.

#include <stdio.h>
int main() {
        char stuff[2048];
        printf("I'm at %p\n", &stuff);
        main();
}