DEV Community

Cover image for Optimizing Chained strcmp Calls for Speed and Clarity
Yair Lenga
Yair Lenga

Posted on • Originally published at Medium

Optimizing Chained strcmp Calls for Speed and Clarity

From memcmp and bloom filters to 4CC encoding for small fixed-length string comparisons

I've been working on an article to describe a small performance issues with a pattern I've seen multiple times - long chain of if statements based on strcmp. This is the equivalent of switch/case on string value (which is not supported in C).

The code implemented critical business logic, based on currency codes (ISO 3-letter code), and an as-of date. Similar to:

bool model_ccy_lookup(const char *s, int asof, struct model_param *param)
{
    // Major Currencies
    if ( strcmp(s, "USD") == 0 || strcmp(s, "EUR") == 0 || ...) {
        ...
    // Asia-Core
    } else if ( strcmp(s, "CNY") == 0 || strcmp(s, "HKD") == 0 || ... ) {
        ...
    } else if ( ... ) {
        ...
    } else {
        ...
    }
} 
Enter fullscreen mode Exit fullscreen mode

The code couldn’t be refactored into a different structure (for non-technical reasons), so I had to explore few approaches to keep the existing structure - without rewrite/reshape of the logic. I tried few tings - like memcmp, small filters, and eventually packing the strings into 32-bit values (“4CC”-style) and letting the compiler work with integer compares.

I was able to revise the code into cleaner form, which outperform the original code by 10X. This was achieved with a series of experiments, some of them helped, some of them did not yield improvements, but hinted at other directions.


Full Article: Medium (No Paywall):

Optimizing Chained strcmpCalls for Speed and Clarity


The final implementation looks like:

bool model_ccy_lookup(const char *s, int asof, struct model_param *param)
{
    // Major Currencies
    if ( CCY_IN(s, "USD", "EUR", ...) ) {
        ...
    // Asia-Core
    } else if ( CCY_IN(s, "CNY", "HKD", ...) ) {
        ...
    } else if ( ... ) {
        ...
    } else {
        ...
    }
} 
Enter fullscreen mode Exit fullscreen mode

And the CCY_IN was implemented as a series of integer compare, using the FourCC encoding = replacing each fixed-size strcmp with a call to CCY_EQ macro:

#define CCY_EQ(x, ccy) (*(int *)x == *(int*) ccy )
Enter fullscreen mode Exit fullscreen mode

I’m also trying a slightly different writing style than usual - a bit more narrative, focusing on the path (including the dead ends), not just the final result.

If you have a few minutes, I’d really appreciate feedback (comment here or on Medium) on two things:

  • Does the technical content hold up?
  • Is the presentation clear, or does it feel too long / indirect?

Interested to hear on other ideas/approach for this problem as well.

Top comments (0)