Hardening C/C++ Programs Part I – Stack Protector

When C, C++ and the Internet were conceived, they were mostly used by academics. Attacks on computer systems were rare because there was little incentive for them, so security and robustness did not have to be a focus. Nowadays these designs haunt us: The past two decades have uncovered a lot of security vulnerabilities in C/C++ programs, and with many of these programs processing user input from the internet, they can often be exploited easily. A large part of the vulnerabilities stems from memory access errors, which happen easily in C/C++ due to the direct, unchecked memory access that the languages allow.

Hardening improves the resiliency of a program against attacks trying to exploit such memory access errors, and other kinds of errors.

The basics of an attack

Exploiting a memory access error works as follows:

Find a piece of code that can write beyond the bounds of a data structure, preferably on the stack.
Provide input to the program which makes the buggy code overwrite critical parts of memory so that program execution is redirected. In most cases the return address on the stack is overwritten to point to code the attacker wants to run.
Possibly inject some code as part of the input (as machine code) to which execution can be redirected. This is not strictly necessary since program execution can just be redirected to some code already loaded into memory (“return-oriented programming”).

Hardening as a safety net

Now, you might argue that the best way to prevent attacks is to fix the buggy code to make the exploit impossible. But most of the time you will not be aware of the bug: It might just slip through code review, or might only be triggered under very complex circumstances which no reviewer considered. Furthermore there is a large body of legacy code, C code, system code and external libraries that sometimes cannot be audited for correctness. In all these cases hardening provides a safety net to avoid the worst, even when a bug slips into the code.

Below we will discuss several useful hardening techniques. You may be surprised to learn that in some toolchains these are not enabled by default. This may be partly due to the C/C++ mantra of “You pay only for what you need”, partly due to compatibility fears and inertia. But in today’s world, with many programs exposed to the internet in some form, and exploits doing more and more damage due to more data being handled, and more devices running your software, it is essential to harden your binaries!

These techniques are not only meant as a protection against the exploitation of memory errors, but they are most useful for that class of errors. We will discuss Linux ELF only (including x86-64 and PowerPC). Stack protector will be discussed in this post, other techniques will follow: Executable-space protection, ASLR, RELRO/BIND_NOW, Fortify, RPATH/RUNPATH.

Stack protector

Perhaps the most dangerous class of memory access errors are stack smashing bugs.

Let’s recap: The stack is a memory region that stores data. When calling a function, the stack stores the address where execution must resume once the function returns. The stack also stores all variables local to that function. This is very much transparent to the programmer. But by writing to memory beyond those variables it is possible to change the return address, and thus alter the control flow of the program. The write can happen on purpose by using pointer arithmetic, or through coding errors. One typical example of the latter is the handling of user data without proper checks of the limits of data structures:

#include <cstring>

void exploitable(char* bar)
{
   char buffer[16];

   strcpy(buffer, bar); // String  'bar' is directly copied into the buffer.
                        // This will overwrite adjacent data on the stack when
                        // 'bar' is longer than 16 bytes (including null terminator).
}

int main(int argc, char** argv)
{
   if (argc < 2)
      return 1;           // No argument given, nothing to copy.

   exploitable(argv[1]);  // User can pass a string of arbitrary length here.

   return 0;
}

On most platforms the stack grows downwards, towards zero¹. When entering a function, the return address is stored on the stack before any local variables, and therefore has a higher memory address than any of them. This makes it vulnerable to an overflow that writes past the end of any of the local variables of the function, like buffer in the example above. An attacker can thus change the return address to point to another function that might open a shell to execute arbitrary commands with the privileges of the program.
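
For contrast, here is a minimal sketch of how the copy in exploitable() could be bounded so that the overflow cannot happen in the first place. The names copyBounded and notExploitable are hypothetical and not part of the original example. Hardening remains the safety net for all the places where such a check was forgotten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most capacity - 1 characters and always
// null-terminates 'dest'. Returns false when the input had to be truncated.
bool copyBounded(char* dest, std::size_t capacity, const char* src)
{
   assert(dest != nullptr && src != nullptr);

   if (capacity == 0)
      return false;

   const std::size_t length = std::strlen(src);
   const std::size_t count = length < capacity ? length : capacity - 1;

   std::memcpy(dest, src, count);
   dest[count] = '\0';

   return length < capacity;
}

// Bounds-checked counterpart to exploitable(): the write never leaves 'buffer'.
bool notExploitable(const char* bar)
{
   char buffer[16];

   return copyBounded(buffer, sizeof buffer, bar);
}
```

With this variant, the 25-character string used later in this post would be truncated to 15 characters instead of overwriting the stack.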

To prevent exploitation of such stack overwrites, the stack protector as implemented in gcc and clang adds a guard variable, often called a canary, to the stack area of each instrumented function. This variable sits on the stack between the return address and the first variable of the function, so it has a higher address than any local variables. It is initialized with a special value on function entry, and checked for that value again on exit. When the value has changed, the program aborts. The assumption is that an attacker trying to change the return address of the program will also overwrite this variable as a side effect, and thus be caught. Aborting the program will then prevent further damage. The special value is chosen randomly at startup and stored globally. A theoretical attack against this defense is guessing the number, and overwriting the guard variable with the same value, but each incorrect try will crash the program, and there are 2^32 or 2^64 possible values, depending on the word size of the architecture.
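
The mechanism can be illustrated with a small model. This is only a sketch of the idea, not the code that gcc or clang generate: the "stack frame" is an ordinary struct, the copy is clamped to that struct so the demo itself stays well-defined, and all names (ModelFrame, modelCall, g_guardValue) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Model of a protected frame: the guard sits directly above the buffer,
// so a linear overflow of 'buffer' reaches it before anything else.
struct ModelFrame
{
   char buffer[16];
   std::uint64_t guard;
};

static_assert(offsetof(ModelFrame, guard) == sizeof(ModelFrame::buffer),
              "the model assumes no padding between buffer and guard");

// Draws the guard value randomly, as the real protector does at startup.
static std::uint64_t makeGuardValue()
{
   std::random_device device;
   const std::uint64_t high = device();
   const std::uint64_t low = device();
   return (high << 32) | low;
}

static const std::uint64_t g_guardValue = makeGuardValue();

// Returns true when the guard is intact on "function exit". A real stack
// protector would abort the program instead of returning false.
bool modelCall(const char* input)
{
   assert(input != nullptr);

   ModelFrame frame;
   frame.guard = g_guardValue;            // set on "function entry"

   // Unchecked copy as in exploitable(), clamped to the model frame.
   std::size_t count = std::strlen(input) + 1;
   if (count > sizeof frame)
      count = sizeof frame;
   std::memcpy(&frame, input, count);

   return frame.guard == g_guardValue;    // checked on "function exit"
}
```

A short input leaves the guard untouched, while an input longer than the buffer spills into it and is detected, unless the attacker happens to write exactly the guard value.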

Adding these checks will lead to a little runtime overhead: More stack space is needed, but that is negligible except for really constrained systems. Storing and checking the value costs a little bit of performance, which can add up and become noticeable. That’s why there are several flavors of the stack protector:

-fstack-protector will instrument functions that call alloca() and functions with char arrays of at least 8 bytes (the gcc documentation says “larger than 8 bytes”, but looking at the disassembly, arrays of size 8 already trigger the instrumentation). --param=ssp-buffer-size=n can be used to control this threshold (default n=8).
-fstack-protector-all will instrument all functions, even empty ones (higher overhead!).
-fstack-protector-strong works like -fstack-protector, but will protect functions with any kind of arrays or references to local frame addresses.
-fstack-protector-explicit will only protect functions explicitly marked with the stack_protect attribute.

The last option is only mentioned for the sake of completeness and should not be used. Hardening is about protecting as many functions as possible, and erring on the side of caution. By only hardening a couple of functions, the risk is high that newly added code will be forgotten. The option is also not implemented by clang. In fact it would be more helpful to have the inverse option to disable hardening for certain functions, for example when the performance impact is large, but it is clear that the function cannot be exploited.

Caveat: -fstack-protector is really a bit limited in the sense that it only applies to char buffers. Arrays of other data types, like int32_t or even wchar_t, are not considered. You need to use one of the other options for that. The gcc documentation should be clearer in this regard.
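
To make the differences between the flavors concrete, here are three illustrative functions annotated with the options that should instrument them according to the descriptions above. The function names are hypothetical, and the annotations assume an unoptimized build. With optimization the compiler may remove or transform the arrays, so check the disassembly of your own build rather than relying on these comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// char array of at least 8 bytes:
// expected to be instrumented by -fstack-protector, -strong and -all.
std::size_t withCharBuffer(const char* text)
{
   assert(text != nullptr);

   char buffer[32];
   std::strncpy(buffer, text, sizeof buffer - 1);
   buffer[sizeof buffer - 1] = '\0';

   return std::strlen(buffer);
}

// int array, no char buffer:
// skipped by -fstack-protector, expected to be covered by -strong and -all.
int withIntArray(int seed)
{
   int values[4] = {seed, seed + 1, seed + 2, seed + 3};

   int sum = 0;
   for (int value : values)
      sum += value;

   return sum;
}

// No arrays and no address of a local taken:
// expected to be covered by -fstack-protector-all only.
int plainArithmetic(int a, int b)
{
   return a + b;
}
```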

So which option to use? It largely depends on your project. Do you aim for maximum security at the cost of performance? -fstack-protector-all is for you. This might be a good choice for anything exposed to the internet that handles user data, or codebases that integrate lots of legacy or external code where the risk of memory access errors is high, or just unknown. Otherwise I suggest trying -fstack-protector-strong first, and measuring relevant workloads for performance regressions. If there are any, try -fstack-protector with the default settings, and if there are no regressions, try to lower --param=ssp-buffer-size to instrument more functions. In all those cases you could still use -fstack-protector-all for an internal build that runs your test suite to uncover problems before you ship the final build to your customers.

The option you settle on should be passed both during compilation and linking. Here is an end-to-end example for the program above:

$ g++ -fstack-protector example.cpp -o example
$ ./example 0123456789  # Ok, string does not exceed buffer
$ echo $?
0
$ ./example 0123456789012345678901234  # String too long for buffer!
*** stack smashing detected ***: ./example terminated
======= Backtrace: =========
/lib64/libc.so.6(+0x7275f)[0x2b381cbd375f]
/lib64/libc.so.6(__fortify_fail+0x37)[0x2b381cc56467]
/lib64/libc.so.6(__fortify_fail+0x0)[0x2b381cc56430]
./example[0x4006a9]
./example[0x4006cd]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b381cb82b05]
./example[0x400599]
======= Memory map: ========
...
Aborted

It works – your program is now protected!

Considering the importance of this feature you should also cover it with tests:

Test that a simple out-of-bounds write, like in the example above, is detected and terminates your program.
Test that all binaries you ship are correctly instrumented.

The first part is rather easy. A technical detail to consider when crafting your overwrite is that only overwrites of sufficient size will hit the canary, due to the alignment and padding applied by the compiler. For example, on x86-64, the canary and the data structures are 16-byte aligned by default. The canary is only 8 bytes wide, so there are 8 bytes of padding between the buffer and the canary. You need to write at least 9 bytes past the end of the buffer to trigger the protector.
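
One possible shape for the first test is to run the overflow in a child process and check that the child was terminated by SIGABRT, which is how the abort shown in the example output above surfaces to the parent process. The following is a sketch under assumptions: diesWithSignal and smashStack are hypothetical names, the overflow size of 40 bytes is chosen to comfortably exceed the padding described above, and the noinline attribute is a gcc/clang extension used to keep the check at the end of smashStack. Without the stack protector the behavior of smashStack is undefined, so the test is only meaningful for instrumented builds.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <csignal>
#include <cstdlib>

// Runs 'fn' in a forked child process and reports whether the child
// was terminated by 'expectedSignal'.
bool diesWithSignal(void (*fn)(), int expectedSignal)
{
   assert(fn != nullptr);

   const pid_t pid = fork();
   if (pid < 0)
      return false;              // fork failed

   if (pid == 0)
   {
      fn();
      _exit(0);                  // 'fn' returned normally
   }

   int status = 0;
   if (waitpid(pid, &status, 0) != pid)
      return false;

   return WIFSIGNALED(status) && WTERMSIG(status) == expectedSignal;
}

// Deliberately writes past the end of 'buffer'. Must not be inlined,
// otherwise the guard check would move into the caller.
__attribute__((noinline)) void smashStack()
{
   char buffer[16];
   volatile char* p = buffer;

   for (int i = 0; i < 40; ++i)
      p[i] = 'A';
}
```

A test for an instrumented build would then assert diesWithSignal(smashStack, SIGABRT). The helper itself can be sanity-checked with a function that calls std::abort() and one that simply returns.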

The second test can be a bit tricky. One way is examining the build log. The other way is examining the binaries: In the disassembly you can see that the stack protector adds a reference to a function which is called upon failure: __stack_chk_fail@plt. This function is provided by your C standard library². When your binaries reference that function, they have been built with instrumentation. You need to check all executables and shared objects. This approach will not work when statically linking the standard library. Also you will get false alarms when there are no functions that need to be instrumented in the binary, for example when you are using -fstack-protector and there are no sufficiently large buffers of type char. To work around this, you can link a dummy file into your binaries that triggers instrumentation and therefore will always reference __stack_chk_fail@plt:

void alwaysInstrumented()
{
  volatile char buffer[1024];  // 'volatile' and
  (void)buffer[0];             // dummy read to prevent removal by compiler
}
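
The check of the shipped binaries could be automated along the following lines. This is a deliberately crude sketch: it only searches the raw file contents for the symbol name instead of parsing the ELF dynamic symbol table, and the function name referencesStackChkFail is made up for illustration. A more robust check would inspect the symbol table with tools such as nm or readelf.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Crude heuristic: reports whether the file at 'path' contains the name of
// the failure handler that instrumented code references.
bool referencesStackChkFail(const std::string& path)
{
   assert(!path.empty());

   std::ifstream file(path, std::ios::binary);
   if (!file)
      return false;              // unreadable files count as "not instrumented"

   const std::string contents((std::istreambuf_iterator<char>(file)),
                              std::istreambuf_iterator<char>());

   return contents.find("__stack_chk_fail") != std::string::npos;
}
```

Run over every executable and shared object you ship, a false result then flags a binary that needs a closer look.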

Side note: Stack smashing bugs are often also the hardest bugs to find when debugging a crash. First, you may not get a sane stack trace at all at the crash site. Second, the corruption may have visible effects on the program only much later, and at that point there is no obvious correlation to the function that caused the corruption. Enabling the stack protector can help you by finding the offending function much earlier. The extra checks running at the end of the function may pinpoint the problem directly when exiting the buggy function. You should use the strongest level, -fstack-protector-all, in that case. When you are debugging the crash during development, the performance impact should not matter, and even in production the impact is usually tolerable, considering that it is critical to find such a bug quickly.

Notes
¹ Note that stack smashing is still possible when the stack grows upwards; only a few adjustments are needed.

² You can also implement it yourself, but you should really know what you are doing.

References

GCC manual section on stack protector
Good technical overview and some more details to consider
LWN article on -fstack-protector-strong
clang documentation for this feature apparently does not exist, only the command line help shows the options and archived discussions on the mailing lists suggest that the features work in the same way.

4 Replies to “Hardening C/C++ Programs Part I – Stack Protector”

Anon says:

October 23, 2017 at 9:19 am

“Stash smashing” should be corrected to “stack smashing” in note #1.

1. Martin says:
  
  October 23, 2017 at 10:49 am
  
  Thanks, fixed!
  
Kugan Vivekanandarajah says:

December 27, 2017 at 9:53 am

You can write the canary byte at a time (256 possibilities) so it is not too hard to guess if you can try with repeatedly. That is, you guess byte at a time and a 64 bit canary will be found with 8 * 256 attempts.

1. Martin says:
  
  December 27, 2017 at 10:40 am
  
  I think this will not work, since the canary is chosen randomly at runtime. So as soon as one guess is wrong, the process crashes, and you start over with a new canary value. However you can probably reduce the probability of failure a bit.