Introduction
Suppose you want to convert an integer value to its decimal string representation, e.g., 42 to "42". In C, you have to know how big to make the string buffer. Specifically, given some integral type T, you need to know how many decimal digits comprise max(T) (and min(T) for signed types). Using sizeof alone doesn’t help since that gives you the number of bytes, not the number of decimal digits.
In general, the number of decimal digits d required to represent an integer of b bits is:
d = ceil(b * log10(2))
= ceil(b * .3010299)
= (unsigned)(b * .3010299 + 1)
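For example, for b = 32, 32 × .3010299 ≈ 9.633, so d = 10, which matches the 10 digits of the largest 32-bit unsigned value, 4294967295.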
However, even if you implement a macro like:
#define MAX_DEC_INT_DIGITS(TYPE) \
((unsigned)(sizeof(TYPE) * CHAR_BIT * .3010299 + 1))
you can’t use an expression whose value is calculated at run-time in a context that requires a compile-time constant, such as when declaring an array:
char buf[ MAX_DEC_INT_DIGITS(int) ]; // error
Actually, in C, this would work if your compiler supports variable length arrays (VLAs); but, in general, you don’t want to use VLAs. In C++, this would always be an error since C++ doesn’t support VLAs.
You might ask:
If sizeof is a compile-time operator, why can’t the value be calculated at compile-time?
Because an array bound must be an integer constant expression, and floating-point math isn’t allowed in one. So the question is: can multiplying by .3010299 be approximated using integer math? It turns out, yes.
The Trick
The trick is to realize that 1233 / 4096 = .30102539 which is a close approximation of .3010299. Integer division by 4096 is the same as right-shifting by 12. Therefore, the macro can become:
#define MAX_DEC_INT_DIGITS(TYPE) \
(((sizeof(TYPE) * CHAR_BIT * 1233) >> 12) + 1)
It’s easy to check since TYPE will only ever be one of the integer types and the number of bits will typically only be one of 8, 16, 32, or 64. If you do the math, that works out to 3, 5, 10, and 20 — which is correct. Almost.
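For the unsigned case, you can even have the compiler check the math for you, assuming a C11 compiler (for _Static_assert) and the exact-width types from <stdint.h>:
#include <limits.h>
#include <stdint.h>

// Uses MAX_DEC_INT_DIGITS exactly as defined above.
_Static_assert( MAX_DEC_INT_DIGITS(uint8_t)  ==  3, "8 bits  -> 3 digits" );
_Static_assert( MAX_DEC_INT_DIGITS(uint16_t) ==  5, "16 bits -> 5 digits" );
_Static_assert( MAX_DEC_INT_DIGITS(uint32_t) == 10, "32 bits -> 10 digits" );
_Static_assert( MAX_DEC_INT_DIGITS(uint64_t) == 20, "64 bits -> 20 digits" );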
For signed integer types, there needs to be +1 to account for the minus sign — so you need to add 1 only if TYPE is signed. We can implement an IS_SIGNED_TYPE macro like:
#define IS_SIGNED_TYPE(TYPE) !IS_UNSIGNED_TYPE(TYPE)
#define IS_UNSIGNED_TYPE(TYPE) ((TYPE)-1 > 0)
That is, if -1 cast to TYPE compares greater than 0, it means TYPE is unsigned; and the ! of that means TYPE is signed. Now the macro can be:
#define MAX_DEC_INT_DIGITS(TYPE) \
((((sizeof(TYPE) * CHAR_BIT * 1233) >> 12) + 1) \
+ IS_SIGNED_TYPE(TYPE))
and the compiler can calculate this at compile-time since the whole thing is now an integer constant expression.
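For example, the declaration that was an error earlier now compiles. Here’s a minimal sketch, assuming the macros above and a 32-bit int (the extra + 1 is for the terminating NUL character, which the digit count doesn’t include):
#include <limits.h>
#include <stdio.h>

// Assumes IS_UNSIGNED_TYPE, IS_SIGNED_TYPE, and MAX_DEC_INT_DIGITS as defined above.
int main( void ) {
char buf[ MAX_DEC_INT_DIGITS(int) + 1 ];  // no longer an error; +1 for the NUL
snprintf( buf, sizeof buf, "%d", INT_MIN );
printf( "%s\n", buf );                    // prints -2147483648 for a 32-bit int
return 0;
}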
Conclusion
Using an integer approximation for a floating-point calculation, along with some clever macros, lets you generate integer constant expressions that the compiler can evaluate at compile-time.
Epilogue
Alternatively, you could always just declare a big buffer, say 80 bytes, which is large enough to hold all the digits for a 256-bit integer. But:
- Where’s the fun in that?
- If you’re like me, you like some things to be just right.
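(As a check: ((256 × 1233) >> 12) + 1 = 78 digits, plus 1 for a possible minus sign and 1 for the terminating NUL, comes to exactly 80.)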
Comments
It’s important to understand that this is a special case, where we’re calculating the maximum amount of memory we can use. If we take on tasks more complex than just converting numbers to strings, it becomes much harder to compute the maximum memory usage, since the required memory might be very large. And there are cases where we simply cannot determine this maximum length at compile time, even theoretically. For example, imagine we’re writing a code translator that converts code from one programming language to another.
We receive a string representing source code in language A as input and produce another string — the translated code in language B — as output.
We can’t predict the maximum output length in advance, so we have to calculate it at runtime.
To avoid duplicating code, we might end up writing something like this:
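(A minimal sketch of the idea, with a hypothetical translate() that only counts the bytes it would write when the output buffer is NULL, and actually writes them otherwise:)
#include <stddef.h>
#include <stdlib.h>

// Hypothetical: returns the number of bytes the translation of src needs;
// writes them into out only when out is non-NULL.
size_t translate( const char *src, char *out );

char* translate_alloc( const char *src ) {
char *out = NULL;
for ( int pass = 0; pass < 2; ++pass ) {
size_t len = translate( src, out );   // pass 0: just count; pass 1: write
if ( pass == 0 ) {
out = malloc( len + 1 );            // now we know how big the buffer must be
if ( out == NULL )
return NULL;
out[ len ] = '\0';
}
}
return out;
}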
Here, the loop runs twice: the first pass only computes the required length, and the second pass actually writes the translated output.
In this case, we can’t even theoretically determine the length at compile time.
True, but irrelevant since my post is only about converting integers to their decimal digit string representations.