New things in clang land (5.0.0)

LLVM 5.0.0 was already released back in September, but I would still like to mention a couple of interesting things I encountered while using clang 5. This post does not cover everything that is new; please check the release notes of the respective LLVM components for that.

More aggressive optimizations

I could not find a mention of this in the release notes, but clang will now eliminate checks for null pointers in more cases. In the example below, the program outputs “i is not nullptr”, although a nullptr is passed in.
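A minimal sketch of such a program, assuming hypothetical function bodies that only borrow the names f() and f_unused_parameter() from the discussion. It returns the message instead of printing it so the result is easy to check, and it is laid out so that the dereference falls on line 10:

```cpp
#include <string>

// Takes a reference but never reads or writes it.
void f_unused_parameter(int&)
{
}

std::string f(int* i)
{
    f_unused_parameter(*i);   // dereference: undefined behavior if i is null
    if (i != nullptr)         // an optimizer may treat this check as always true
        return "i is not nullptr";
    return "i is nullptr";
}
```

Calling f(nullptr) is undefined behavior, so no particular result can be relied on. With a valid pointer, such as the address of a local int, f() returns “i is not nullptr” as expected.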

 

Here is what happens when clang generates the code for f(): int* i is dereferenced in line 10. By definition, a nullptr may not be dereferenced (undefined behavior!), so clang infers that i != nullptr. Consequently, the check in the next line can be removed, and only the code for the “true” branch needs to be generated. Irrespective of the actual argument passed, the same message will be printed.

You may wonder why the dereferencing of i for the function call does not lead to a crash at runtime. The reason is that this does not generate any code that could crash; only accessing int& i in f_unused_parameter() would do that. On the other hand, UndefinedBehaviorSanitizer (-fsanitize=undefined) does complain at runtime about the reference being bound to a null pointer.

The optimization that removes the check happens at -O1 and above. It did not happen with clang 4 even at -O3.

AddressSanitizer: stack-use-after-scope

Variables located on the stack have a defined lifetime, or scope. When declared in a function body, the scope ends at the end of the function. The same is true in a scope manually defined by braces. Using a variable past the end of the lifetime, for example by handing out a pointer to it, is undefined behavior.
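A small sketch of a scope defined by braces, using a hypothetical helper function. The access inside the braces is fine; the access described in the trailing comment is the kind of error the check is meant to catch:

```cpp
// Hypothetical helper showing where a braces-delimited scope ends.
int read_inside_scope()
{
    int* p = nullptr;
    int result = 0;
    {
        int local = 42;
        p = &local;
        result = *p;   // fine: local is still alive here
    }                  // the lifetime of local ends at this brace
    // Reading *p at this point would be undefined behavior
    // (a stack-use-after-scope error).
    return result;
}
```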

AddressSanitizer now checks for this coding error by default. The feature has been there for quite some time, but now it seems ready for prime time. Memory usage seems to be lower than when I last tested it about half a year ago. The check finds a few interesting things; here are two examples:

Can you spot the error? AddressSanitizer surely can:

Apparently a stack variable is accessed after its scope has already ended. The variable is called ref.tmp. We do not have a variable by that name in the program, so it must have been generated by the compiler. AddressSanitizer can narrow this down further when we are building with -g1 or above. This will generate debug info, and the second part of the report will then look like this:

Ok, so we know we only have to check line 30:

There are only two objects here, the TraceGuard object with a scope until the end of main(), and the temporary Data object which goes out of scope at the end of the statement, but is still referenced by tg! So we have found the problem, and here is a possible fix:

data is now guaranteed to outlive tg.
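The pattern can be sketched as follows. The definitions of Data and TraceGuard are assumptions made for illustration (a guard that stores a const reference to the Data it is given); only the names data and tg come from the description above:

```cpp
#include <string>

// Hypothetical payload type.
struct Data {
    std::string name;
};

// Hypothetical guard that keeps a reference to the Data it was given.
class TraceGuard {
public:
    explicit TraceGuard(const Data& d) : data_(d) {}
    const std::string& name() const { return data_.name; }
private:
    const Data& data_;   // dangles if bound to a temporary
};

std::string fixed_usage()
{
    // Problematic variant: TraceGuard tg(Data{"main"});
    // The temporary Data dies at the end of that statement,
    // while tg keeps referring to it.

    Data data{"main"};   // named object, declared before tg
    TraceGuard tg(data); // tg is destroyed first, so data outlives it
    return tg.name();
}
```

Because local objects are destroyed in reverse order of declaration, declaring data before tg is what makes the reference safe for the whole lifetime of tg.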

Here is a less obvious version of the same problem:

This will trigger the same report as above. Even the static_cast produces a temporary here, and AddressSanitizer keenly tracks its lifetime and reports our error.
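That a static_cast to a non-reference type yields a temporary can be checked without invoking undefined behavior. The sketch below uses hypothetical helper functions and compares addresses while the temporary is still alive:

```cpp
// Returns true when both references are bound to the very same object.
bool same_object(const int& a, const int& b)
{
    return &a == &b;
}

bool cast_binds_to_original()
{
    int value = 7;
    // static_cast<int>(value) is a prvalue, so the second parameter binds
    // to a fresh temporary that lives only until the end of this statement.
    return same_object(value, static_cast<int>(value));
}

bool plain_binds_to_original()
{
    int value = 7;
    return same_object(value, value);
}
```

Here cast_binds_to_original() returns false and plain_binds_to_original() returns true. Storing the reference from the cast variant beyond the statement would give the same dangling reference as before.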

These coding errors are relatively benign, and may not even lead to problems. But they are undefined behavior, and the compiler is free to re-use the stack space where the temporary resides, which would lead to subtle and hard-to-find bugs. There is a gcc option to control reuse of stack variables:

-fstack-reuse=reuse-level

This option controls stack space reuse for user declared local/auto variables and compiler generated temporaries. reuse-level can be ‘all’, ‘named_vars’, or ‘none’.

When you suspect such problems in a large codebase that you cannot immediately fix, building with -fstack-reuse=none may be a helpful short-term workaround.
