Introduction
I recently discovered a test-case that crashed cdecl
:
c++decl> explain int operator new(size_t)
Assertion failed: (ast != NULL), function c_ast_unpointer_qual, file c_ast_util.c, line 404.
zsh: abort (core dumped) c++decl
FYI, it’s a negative test: operator
new
must returnvoid*
, notint
.Briefly,
cdecl
is a program that helps you compose and decipher C (or C++) declarations or casts. Its parser parses C (or C++) declarations and constructs an abstract syntax tree (AST) that is then traversed to check for semantic errors.
The crash is strange for a couple of reasons that I knew initially. The test:
- Crashes only when
cdecl
is compiled using clang 14.0.5. - Does not crash when compiled with various versions of
gcc
on any platform.
(I’d come to discover that it’s even stranger later.)
The Code
Here’s the code of the function containing line 404:
static c_ast_t const* c_ast_unpointer_qual( c_ast_t const *ast,
c_tid_t *qual_stids ) {
ast = c_ast_untypedef( ast );
if ( ast->kind != K_POINTER )
return NULL;
ast = ast->as.ptr_ref.to_ast;
assert( ast != NULL ); // <-- Line 404: THIS FAILS
assert( qual_stids != NULL );
*qual_stids = ast->type.stids & TS_ANY_QUALIFIER;
// ...
return c_ast_untypedef( ast );
}
This function takes pointer to an AST node and, if it represents a pointer in a C declaration, “un-pointers” it to obtain the pointed-to AST node. It also gets the qualifiers, e.g.,
const
, of the pointed-to AST node.
Specifically, the first assert()
fails which means ast
is NULL
which means as.ptr_ref.to_ast
is NULL
. That means to_ast
wasn’t set when the AST was being constructed in the parser — or so I thought.
Using a Debugger
The first thing I did was try to use a debugger and set a breakpoint a few lines up. Unfortunately, the line I wanted to set the breakpoint on was eliminated by the compiler. So I set a breakpoint nearby. Unfortunately again, the value of ast
had been “optimized out” by the compiler.
So I recompiled with -O0
— but then the program didn’t crash. A Heisenbug! (This isn’t a good sign.) So I set a breakpoint in the function that calls c_ast_unpointer_qual()
(which I learned from the backtrace from the core dump):
c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
c_tid_t tids ) {
c_tid_t qual_stids;
ast = c_ast_unpointer_qual( ast, &qual_stids );
return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}
This function takes a pointer to an AST node and returns the pointed-to AST node, but only if the first represents a pointer and it’s a pointer to any one of a set of types, e.g.,
void*
.
This time, the debugger stopped at the breakpoint and I discovered that the AST node represents the built-in type (K_BUILTIN
) of int
(the declared return type of operator new
) and in particular it’s not K_POINTER
. That means this if
:
if ( ast->kind != K_POINTER )
return NULL;
should have evaluated to true
, the function should have returned NULL
, and, most importantly, should not have even executed line 404. It’s as if the if
disappeared. (This isn’t a good sign either.)
Memory Checkers & Sanitizers
My next step was to try Valgrind — nothing useful. I also tried address and undefined behavior sanitizers — also nothing useful. Now what?
Older clang Versions
This test-case didn’t used to crash using clang
. (The problem is that, while I do run my full suite of unit tests using gcc
every time I change the source code, I don’t also do it every time using clang
.) So I started downloading older and older versions of clang
and trying them out. I finally found out that the crash didn’t happen using clang
13.0.1 and started happening using clang 14.0.0. Although highly unlikely in a program as mature as clang
, it is possible that I found an optimizer bug.
Viewing the Object Code
So then I thought to look at the generated assembly code using objdump
(line split for clarity):
$ objdump -gSMintel --no-show-raw-insn --symbolize-operands \
c_ast_util.o > c_ast_util-13.0.1.s
Looking at the output for c_ast_is_ptr_to_tid_any()
, I discovered that the compiler inlined the call to c_ast_unpointer_qual()
, but that the if
was there:
; if ( ast->kind != K_POINTER )
ec6: cmp eax, 256
ecb: jne <L3>
I then recompiled using clang
14.0.0 and dumped c_ast_util.o
again: this time, that if
was not there! The likelihood that I found an optimizer bug increased.
Un-Inlining
The next thing I tried was annotating the function with __attribute__((noinline))
to force clang
not to inline the function. That worked! The likelihood that I found an optimizer bug increased again.
Static Analysis
I had actually started to file a bug against clang
when I stumbled across scan-build
and tried it. It flagged undefined behavior in c_ast_is_ptr_to_tid_any()
(code repeated here):
c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
c_tid_t tids ) {
c_tid_t qual_stids;
ast = c_ast_unpointer_qual( ast, &qual_stids );
return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}
Specifically, qual_stids
can be passed uninitialized as an argument to c_ast_is_tid_any_qual_impl()
since c_ast_unpointer_qual()
doesn’t always set it. However, I didn’t think it mattered because:
- The only way
qual_stids
remains uninitialized is ifc_ast_unpointer_qual()
returnsNULL
; and: - The code for
c_ast_is_tid_any_qual_impl()
(below) does nothing ifast
isNULL
— specifically, it does not use (the uninitialized)qual_stids
.
static c_ast_t const* c_ast_is_tid_any_qual_impl( c_ast_t const *ast,
c_tid_t tids,
c_tid_t qual_stids ) {
if ( ast != NULL ) {
c_tid_t ast_tids = c_type_get_tid( &ast->type, tids );
ast_tids = c_tid_normalize( ast_tids );
if ( c_tid_tpid( tids ) == C_TPID_STORE )
ast_tids |= qual_stids;
if ( c_tid_is_any( ast_tids, tids ) )
return ast;
}
return NULL;
}
So what harm can there possibly be that qual_stids
is uninitialized? In fact, since it’s not used in the NULL
case, how is that even undefined behavior? The problem is merely passing qual_stids
to a function means it has to be read and, according to the C11 standard §6.3.2.1¶2, is undefined behavior:
If the lvalue designates an object of automatic storage duration ... and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
And with undefined behavior, anything is possible. But does it actually matter in this case?
The Fix
Assuming the undefined behavior is responsible for the if
disappearing, the fix is trivial: initialize qual_stids
:
c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
c_tid_t tids ) {
c_tid_t qual_stids = TS_NONE; // <-- INITIALIZED NOW
ast = c_ast_unpointer_qual( ast, &qual_stids );
return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}
Compiling with clang
14.0.0 and running resulted in cdecl
not crashing and the test passing!
Conclusion
Bugs as a result of undefined behavior can be bizarre and make seemingly impossible things happen. Try every debugging tool at your disposal:
- Valgrind — with the caveat that it works only on generated machine code, not optimized-out code.
- clang’s address and undefined behavior sanitizers.
- clang’s
scan-build
.
If something apparently impossible is happening, look at the generated assembly code. For example, when single-stepping in a debugger, if the next line of code you step to isn’t where you think it should be, it’s a hint that the compiler did something unexpected and the assembly code isn’t what you think it is.
Epilogue
It’s still not clear to me how merely reading but not otherwise using an uninitialized variable resulted in clang
eliminating the relevant if
. But I can’t file a bug against clang
if my code has undefined behavior since it then becomes permissible to do anything.
Why didn’t the undefined behavior sanitizer flag the bug? My guess is that the undefined behavior is static in nature (thus altered the generated code) whereas the sanitizer checks a running program for dynamic undefined behavior, e.g., integer overflow.
Top comments (0)