Paul J. Lucas

Posted on Jun 20, 2022 • Edited on Jun 29, 2023

The Curious Case of the Disappearing “if”

#c #cpp #debugging

Introduction

I recently discovered a test-case that crashed cdecl:

c++decl> explain int operator new(size_t)
Assertion failed: (ast != NULL), function c_ast_unpointer_qual, file c_ast_util.c, line 404.
zsh: abort (core dumped)  c++decl

FYI, it’s a negative test: operator new must return void*, not int.

Briefly, cdecl is a program that helps you compose and decipher C (or C++) declarations or casts. Its parser parses C (or C++) declarations and constructs an abstract syntax tree (AST) that is then traversed to check for semantic errors.

The crash is strange for a couple of reasons that I knew initially. The test:

Crashes only when cdecl is compiled using clang 14.0.5.
Does not crash when compiled with various versions of gcc on any platform.

(I’d come to discover that it’s even stranger later.)

The Code

Here’s the code of the function containing line 404:

static c_ast_t const* c_ast_unpointer_qual( c_ast_t const *ast,
                                            c_tid_t *qual_stids ) {
  ast = c_ast_untypedef( ast );
  if ( ast->kind != K_POINTER )
    return NULL;

  ast = ast->as.ptr_ref.to_ast;
  assert( ast != NULL );         // <-- Line 404: THIS FAILS
  assert( qual_stids != NULL );
  *qual_stids = ast->type.stids & TS_ANY_QUALIFIER;
  // ...
  return c_ast_untypedef( ast );
}

This function takes pointer to an AST node and, if it represents a pointer in a C declaration, “un-pointers” it to obtain the pointed-to AST node. It also gets the qualifiers, e.g., const, of the pointed-to AST node.

Specifically, the first assert() fails which means ast is NULL which means as.ptr_ref.to_ast is NULL. That means to_ast wasn’t set when the AST was being constructed in the parser — or so I thought.

Using a Debugger

The first thing I did was try to use a debugger and set a breakpoint a few lines up. Unfortunately, the line I wanted to set the breakpoint on was eliminated by the compiler. So I set a breakpoint nearby. Unfortunately again, the value of ast had been “optimized out” by the compiler.

So I recompiled with -O0 — but then the program didn’t crash. A Heisenbug! (This isn’t a good sign.) So I set a breakpoint in the function that calls c_ast_unpointer_qual() (which I learned from the backtrace from the core dump):

c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
                                        c_tid_t tids ) {
  c_tid_t qual_stids;
  ast = c_ast_unpointer_qual( ast, &qual_stids );
  return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}

This function takes a pointer to an AST node and returns the pointed-to AST node, but only if the first represents a pointer and it’s a pointer to any one of a set of types, e.g., void*.

This time, the debugger stopped at the breakpoint and I discovered that the AST node represents the built-in type (K_BUILTIN) of int (the declared return type of operator new) and in particular it’s not K_POINTER. That means this if:

  if ( ast->kind != K_POINTER )
    return NULL;

should have evaluated to true, the function should have returned NULL, and, most importantly, should not have even executed line 404. It’s as if the if disappeared. (This isn’t a good sign either.)

Memory Checkers & Sanitizers

My next step was to try Valgrind — nothing useful. I also tried address and undefined behavior sanitizers — also nothing useful. Now what?

Older clang Versions

This test-case didn’t used to crash using clang. (The problem is that, while I do run my full suite of unit tests using gcc every time I change the source code, I don’t also do it every time using clang.) So I started downloading older and older versions of clang and trying them out. I finally found out that the crash didn’t happen using clang 13.0.1 and started happening using clang 14.0.0. Although highly unlikely in a program as mature as clang, it is possible that I found an optimizer bug.

Viewing the Object Code

So then I thought to look at the generated assembly code using objdump (line split for clarity):

$ objdump -gSMintel --no-show-raw-insn --symbolize-operands \
  c_ast_util.o > c_ast_util-13.0.1.s

Looking at the output for c_ast_is_ptr_to_tid_any(), I discovered that the compiler inlined the call to c_ast_unpointer_qual(), but that the if was there:

;     if ( ast->kind != K_POINTER )
     ec6:       cmp     eax, 256
     ecb:       jne      <L3>

I then recompiled using clang 14.0.0 and dumped c_ast_util.o again: this time, that if was not there! The likelihood that I found an optimizer bug increased.

Un-Inlining

The next thing I tried was annotating the function with __attribute__((noinline)) to force clang not to inline the function. That worked! The likelihood that I found an optimizer bug increased again.

Static Analysis

I had actually started to file a bug against clang when I stumbled across scan-build and tried it. It flagged undefined behavior in c_ast_is_ptr_to_tid_any() (code repeated here):

c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
                                        c_tid_t tids ) {
  c_tid_t qual_stids;
  ast = c_ast_unpointer_qual( ast, &qual_stids );
  return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}

Specifically, qual_stids can be passed uninitialized as an argument to c_ast_is_tid_any_qual_impl() since c_ast_unpointer_qual() doesn’t always set it. However, I didn’t think it mattered because:

The only way qual_stids remains uninitialized is if c_ast_unpointer_qual() returns NULL; and:
The code for c_ast_is_tid_any_qual_impl() (below) does nothing if ast is NULL — specifically, it does not use (the uninitialized) qual_stids.

static c_ast_t const* c_ast_is_tid_any_qual_impl( c_ast_t const *ast,
                                                  c_tid_t tids,
                                                  c_tid_t qual_stids ) {
  if ( ast != NULL ) {
    c_tid_t ast_tids = c_type_get_tid( &ast->type, tids );
    ast_tids = c_tid_normalize( ast_tids );
    if ( c_tid_tpid( tids ) == C_TPID_STORE )
      ast_tids |= qual_stids;
    if ( c_tid_is_any( ast_tids, tids ) )
      return ast;
  }
  return NULL;
}

So what harm can there possibly be that qual_stids is uninitialized? In fact, since it’s not used in the NULL case, how is that even undefined behavior? The problem is merely passing qual_stids to a function means it has to be read and, according to the C11 standard §6.3.2.1¶2, is undefined behavior:

If the lvalue designates an object of automatic storage duration ... and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And with undefined behavior, anything is possible. But does it actually matter in this case?

The Fix

Assuming the undefined behavior is responsible for the if disappearing, the fix is trivial: initialize qual_stids:

c_ast_t const* c_ast_is_ptr_to_tid_any( c_ast_t const *ast,
                                        c_tid_t tids ) {
  c_tid_t qual_stids = TS_NONE;  // <-- INITIALIZED NOW
  ast = c_ast_unpointer_qual( ast, &qual_stids );
  return c_ast_is_tid_any_qual_impl( ast, tids, qual_stids );
}

Compiling with clang 14.0.0 and running resulted in cdecl not crashing and the test passing!

Conclusion

Bugs as a result of undefined behavior can be bizarre and make seemingly impossible things happen. Try every debugging tool at your disposal:

Valgrind — with the caveat that it works only on generated machine code, not optimized-out code.
clang’s address and undefined behavior sanitizers.
clang’s scan-build.

If something apparently impossible is happening, look at the generated assembly code. For example, when single-stepping in a debugger, if the next line of code you step to isn’t where you think it should be, it’s a hint that the compiler did something unexpected and the assembly code isn’t what you think it is.

Epilogue

It’s still not clear to me how merely reading but not otherwise using an uninitialized variable resulted in clang eliminating the relevant if. But I can’t file a bug against clang if my code has undefined behavior since it then becomes permissible to do anything.

Why didn’t the undefined behavior sanitizer flag the bug? My guess is that the undefined behavior is static in nature (thus altered the generated code) whereas the sanitizer checks a running program for dynamic undefined behavior, e.g., integer overflow.

DEV Community