DEV Community

Paul J. Lucas


Determining the Maximum Decimal Digits at Compile-Time

Introduction

Suppose you want to convert an integer value to its decimal string representation, e.g., 42 to "42". In C, you have to know how big to make the string buffer. Specifically, given some integral type T, you need to know how many decimal digits comprise max(T) (and min(T) for signed types). Using sizeof alone doesn’t help since that gives you the number of bytes, not the number of decimal digits.

In general, the number of decimal digits d required to represent an integer of b bits is:

d = ceil(b * log10(2))
  = ceil(b * .3010299)
  = (unsigned)(b * .3010299 + 1)
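
As a quick sanity check of the formula, the following sketch compares it against simply counting the digits of the largest b-bit value. The function names `digits_by_formula` and `digits_by_counting` are made up for illustration, and the sketch assumes b is at most 64 so that the value fits in a `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

// Digits predicted by the formula for a b-bit unsigned integer.
static unsigned digits_by_formula(unsigned b) {
  return (unsigned)(b * .3010299 + 1);
}

// Digits found by counting: repeatedly divide the largest b-bit value by 10.
static unsigned digits_by_counting(unsigned b) {
  assert(b >= 1 && b <= 64);  // assumption: the value must fit in uint64_t
  uint64_t max = b == 64 ? UINT64_MAX : (UINT64_C(1) << b) - 1;
  unsigned digits = 0;
  do {
    ++digits;
    max /= 10;
  } while (max > 0);
  return digits;
}
```

For the common widths of 8, 16, 32, and 64 bits, both functions should agree.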

However, even if you implement a macro like:

#include <limits.h>  // for CHAR_BIT

#define MAX_DEC_INT_DIGITS(TYPE) \
  ((unsigned)(sizeof(TYPE) * CHAR_BIT * .3010299 + 1))

you can’t use it where C requires a compile-time constant, such as when declaring an array, since its value involves floating-point math:

char buf[ MAX_DEC_INT_DIGITS(int) ];  // error

Actually, in C, this would work inside a function if your compiler supports variable length arrays (VLAs); but, in general, you don’t want to use VLAs, and standard C++ doesn’t support them at all.

You might ask:

If sizeof is a compile-time operator, why can’t the value be calculated at compile-time?

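Because, in C, an expression that involves floating-point arithmetic is not an integer constant expression, which is what places like array sizes require. (A floating-point constant is allowed only as the immediate operand of a cast.) So the question is: can multiplying by .3010299 be approximated using integer math? It turns out, yes.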

The Trick

The trick is to realize that 1233 / 4096 = .30102539 which is a close approximation of .3010299. Integer division by 4096 is the same as right-shifting by 12. Therefore, the macro can become:

#define MAX_DEC_INT_DIGITS(TYPE) \
  (((sizeof(TYPE) * CHAR_BIT * 1233) >> 12) + 1)

It’s easy to check since TYPE will only ever be one of the integer types and the number of bits will typically only be one of 8, 16, 32, or 64. If you do the math, that works out to 3, 5, 10, and 20 — which is correct. Almost.
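
The arithmetic for the unsigned case can be checked by the compiler itself. In this sketch, `DIGITS_FOR_BITS` is a hypothetical helper that applies the same approximation to a raw bit count rather than to a type, and `_Static_assert` assumes a C11 (or later) compiler.

```c
// Hypothetical helper: the same arithmetic, but taking a bit count directly.
#define DIGITS_FOR_BITS(BITS)  ((((BITS) * 1233) >> 12) + 1)

// Each message is the largest unsigned value of that width.
_Static_assert(DIGITS_FOR_BITS( 8) ==  3, "255");
_Static_assert(DIGITS_FOR_BITS(16) ==  5, "65535");
_Static_assert(DIGITS_FOR_BITS(32) == 10, "4294967295");
_Static_assert(DIGITS_FOR_BITS(64) == 20, "18446744073709551615");
```

Note the parentheses around the shift: `+` binds more tightly than `>>`, so without them the expression would shift by 13 rather than add 1 to the shifted result.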

For signed integer types, there needs to be +1 to account for the minus sign — so you need to add 1 only if TYPE is signed. We can implement an IS_SIGNED_TYPE macro like:

#define IS_SIGNED_TYPE(TYPE)    (!IS_UNSIGNED_TYPE(TYPE))
#define IS_UNSIGNED_TYPE(TYPE)  ((TYPE)-1 > 0)

That is, if -1 cast to TYPE is > 0, it means TYPE is unsigned; and the ! of that means TYPE is signed. Now the macro can be:

#define MAX_DEC_INT_DIGITS(TYPE)                  \
  ((((sizeof(TYPE) * CHAR_BIT * 1233) >> 12) + 1) \
    + IS_SIGNED_TYPE(TYPE))

and, because this is an integer constant expression, the compiler can evaluate it at compile-time.
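
Putting the pieces together, here is a minimal sketch of how the macros might be used to size a buffer. The names `INT_BUF_SIZE` and `int_to_dec` are hypothetical, and the `+ 1` for the terminating null character is an addition here, since the macro counts only the digits and the sign. The sketch assumes a C11 compiler for `_Static_assert` and a platform where `CHAR_BIT` is 8.

```c
#include <assert.h>
#include <limits.h>  // CHAR_BIT
#include <stdio.h>   // snprintf

#define IS_UNSIGNED_TYPE(TYPE)  ((TYPE)-1 > 0)
#define IS_SIGNED_TYPE(TYPE)    (!IS_UNSIGNED_TYPE(TYPE))

#define MAX_DEC_INT_DIGITS(TYPE)                  \
  ((((sizeof(TYPE) * CHAR_BIT * 1233) >> 12) + 1) \
    + IS_SIGNED_TYPE(TYPE))

// The macros are integer constant expressions, so they can be used here.
_Static_assert(IS_SIGNED_TYPE(int), "int is signed");
_Static_assert(IS_UNSIGNED_TYPE(unsigned), "unsigned is unsigned");
_Static_assert(MAX_DEC_INT_DIGITS(unsigned char) == 3, "255");   // assumes CHAR_BIT == 8
_Static_assert(MAX_DEC_INT_DIGITS(signed char) == 4, "-128");    // assumes CHAR_BIT == 8

// Hypothetical: buffer size for any int, plus one byte for the '\0'.
#define INT_BUF_SIZE  (MAX_DEC_INT_DIGITS(int) + 1)

// Hypothetical helper: formats n into buf, which must hold INT_BUF_SIZE bytes.
// Returns the length of the resulting string.
static int int_to_dec(int n, char *buf) {
  int len = snprintf(buf, INT_BUF_SIZE, "%d", n);
  assert(len > 0 && (size_t)len < INT_BUF_SIZE);  // the output was not truncated
  return len;
}
```

With a 32-bit `int`, `MAX_DEC_INT_DIGITS(int)` is 11, which is exactly the length of "-2147483648", so a `char buf[INT_BUF_SIZE]` array has no slack at all.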

Conclusion

Using an integer approximation for a floating-point calculation along with some clever macros can allow you to generate constant integer expressions that can be evaluated at compile-time.

Epilogue

Alternatively, you could always just declare a big buffer, say 80 bytes, which is large enough to hold all the digits for a 256-bit integer. But:

  1. Where’s the fun in that?
  2. If you’re like me, you like some things to be just right.
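
For the curious, the 80-byte figure can be checked with the same integer arithmetic. This sketch applies the approximation to a hypothetical 256-bit signed integer; `DIGITS_FOR_BITS`, `DIGITS_256`, and `BUF_256` are made-up names, and `_Static_assert` assumes a C11 compiler.

```c
// Hypothetical helper: the approximation applied to a raw bit count.
#define DIGITS_FOR_BITS(BITS)  ((((BITS) * 1233) >> 12) + 1)

enum {
  DIGITS_256 = DIGITS_FOR_BITS(256),  // digits in 2^256 - 1
  BUF_256    = DIGITS_256 + 1 + 1     // + minus sign + terminating '\0'
};

_Static_assert(DIGITS_256 == 78, "2^256 - 1 has 78 decimal digits");
_Static_assert(BUF_256 == 80, "80 bytes holds digits, sign, and '\\0'");
```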

Top comments (2)

LolPopGames

It’s important to understand that this is a special case, one where we can calculate the maximum amount of memory we might use.

For tasks more complex than converting numbers to strings, it becomes much harder to compute the maximum memory usage, since the required memory might be very large.

And there are cases where we simply cannot determine this maximum length at compile time, even theoretically.

For example, imagine we’re writing a code translator that converts code from one programming language to another.

We receive a string representing source code in language A as input and produce another string — the translated code in language B — as output.

We can’t predict the maximum output length in advance, so we have to calculate it at runtime.

To avoid duplicating code, we might end up writing something like this:

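for (uint8_t outputing = 0; outputing < 2; outputing++)
{
    // Pass 0 counts the bytes needed; pass 1 writes into the allocated buffer.
    for (source_p = source, result_p = result; *source_p; source_p++)
    {
        if (outputing)
        {
            *result_p = '-';
            result_p++;
        }
        else
        {
            result_size++;
        }
        // ....
    }

    if (!outputing)
    {
        // Allocate only once, after the counting pass.
        result = malloc(result_size);
        if (!result)
        {
            return NULL;
        }
    }
}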

Here, the loop runs twice:

  1. First iteration, to calculate the required length
  2. And second iteration, to actually generate the output

In this case, we can’t even theoretically determine the length at compile time.
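
A self-contained sketch of that two-pass idea might look like the following. The function `dash_prefix` and its "translation" (writing a '-' before every input character) are stand-ins invented for illustration; a real translator would do far more work per character.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical "translator": emits '-' followed by each source character.
// Pass 0 only counts the output size; pass 1 writes into the buffer.
static char *dash_prefix(const char *source) {
  assert(source != NULL);
  char *result = NULL;
  size_t result_size = 1;  // room for the terminating '\0'

  for (int outputting = 0; outputting < 2; ++outputting) {
    char *result_p = result;
    for (const char *source_p = source; *source_p; ++source_p) {
      if (outputting) {
        *result_p++ = '-';
        *result_p++ = *source_p;
      } else {
        result_size += 2;
      }
    }
    if (outputting) {
      *result_p = '\0';
    } else {
      result = malloc(result_size);  // the size is known only at run-time
      if (!result)
        return NULL;
    }
  }
  return result;  // the caller must free() this
}
```

Calling `dash_prefix("ab")` should yield the string "-a-b", which the caller then frees.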

Paul J. Lucas • Edited

True, but irrelevant since my post is only about converting integers to their decimal digit string representations.