Today I needed to format a list of words in C, and ended up writing my own list formatter, which was rather satisfying (I like fidgety string manipulation). The goal was this: given a list of words like {"apple", "banana", "mango"}, create a string like "apple, banana, and mango". At first glance, this is simple: we step through the list, append each word to the result, follow every word except the last with a comma, and insert "and" before the final word.
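    #include <stdio.h>
    #include <string.h>

    /* Joins argc words into output; the caller must provide a large enough buffer. */
    char *format_list(int argc, const char **argv, char *output, const char *and) {
        output[0] = '\0';
        for (int i = 0; i != argc; ++i) {
            strcat(output, argv[i]);
            if (i + 1 < argc) {
                /* separator after every word except the last */
                strcat(output, ", ");
            }
            if (argc == i + 2) {
                /* the next word is the final one: put "and " in front of it */
                strcat(output, and);
                strcat(output, " ");
            }
        }
        return output;
    }

    void print_fruit(void) {
        const char *fruit[] = {"apple", "banana", "mango"};
        char buffer[1024];
        puts(format_list(3, fruit, buffer, "and"));
    }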
WARNING: This function does no bounds checking, so the output buffer needs to have enough space for the resulting string. This is bad function design, and adding the necessary checking is not hard, but would make it harder to get the point across that this code is really trivial.
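For the curious, here is a minimal sketch of what that checking could look like. It is not the code I ended up using; the name `format_list_checked`, the extra `size` parameter, and the `conj` parameter (standing in for `and`) are made up for this illustration, and it assumes that returning NULL is an acceptable way to report "does not fit".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked variant: size is the capacity of output in bytes.
 * Returns output, or NULL if the result (plus its terminating NUL) does not fit. */
char *format_list_checked(int argc, const char **argv, char *output, size_t size,
                          const char *conj) {
    size_t used = 0;
    if (size == 0) return NULL;
    output[0] = '\0';
    for (int i = 0; i != argc; ++i) {
        /* Pieces to append for this word: the word, a separator, maybe "and ". */
        const char *parts[4] = { argv[i], "", "", "" };
        if (i + 1 < argc) parts[1] = ", ";
        if (argc == i + 2) { parts[2] = conj; parts[3] = " "; }
        for (int k = 0; k != 4; ++k) {
            size_t len = strlen(parts[k]);
            if (used + len + 1 > size) return NULL;    /* would overflow */
            memcpy(output + used, parts[k], len + 1);  /* copies the NUL too */
            used += len;
        }
    }
    return output;
}
```

With a 32-byte buffer the three fruits fit and give "apple, banana, and mango"; with a 10-byte buffer the function returns NULL instead of writing past the end. On failure the buffer holds a truncated but still NUL-terminated string.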
However, I needed my solution to work in other languages. We could use gettext to translate the list items and to replace the word "and" with its locale-dependent equivalent, like this:
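    #include <libintl.h>  /* gettext() */

    /* Assumes setlocale(), bindtextdomain() and textdomain() were called at startup. */
    void print_fruit(void) {
        const char *fruit[3];
        char buffer[1024];
        /* look up each word in the message catalog for the current locale */
        fruit[0] = gettext("apple");
        fruit[1] = gettext("banana");
        fruit[2] = gettext("mango");
        puts(format_list(3, fruit, buffer, gettext("and")));
    }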
This way (with the correct .po file and libgettext) we get "Apfel, Banane, und Mango" if we're using a German locale. There's a problem with that: German does not use an Oxford comma, so the correct result would be "Apfel, Banane und Mango". Other languages that I don't speak write lists by different rules altogether. Some put the "and" at the start (think "apple and banana, mango"). What we need is more than just the word "and": something that encapsulates all those rules without writing a separate function for each language.
Cribbing inspiration from the icu4j library, this is the function I came up with:
https://github.com/ennorehling/clibs/blob/master/format.c#L29
I'm not including the entire file, since with the bounds checking, it gets rather long, but it does the trick, and is customizable by a set of patterns:
- list of two words (English: "{0} and {1}")
- start of a list with 3 or more elements (English: "{0}, {1}")
- middle of a list with 3 or more elements (English: "{0}, {1}")
- end of a list with 3 or more elements (English: "{0}, and {1}")
If you don't use an Oxford comma (I won't judge you), then the last of these, the end pattern, should be "{0} and {1}".
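To make the pattern idea concrete, here is a stripped-down sketch of how four such patterns could drive the formatting. This is not the code in the linked file; `struct list_patterns`, `format_list_patterns`, `apply_pattern` and `append` are names invented for this example. It assumes the patterns are composed left to right: the start pattern joins the first two words, then each further word is joined to the result so far with the middle pattern, except the last word, which uses the end pattern.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pattern set, one member per case in the list above. */
struct list_patterns {
    const char *two;    /* e.g. "{0} and {1}" */
    const char *start;  /* e.g. "{0}, {1}" */
    const char *middle; /* e.g. "{0}, {1}" */
    const char *end;    /* e.g. "{0}, and {1}" */
};

/* Appends len bytes of src to dst (capacity size); -1 if they do not fit. */
static int append(char *dst, size_t size, size_t *used, const char *src, size_t len) {
    if (*used + len + 1 > size) return -1;
    memcpy(dst + *used, src, len);
    *used += len;
    dst[*used] = '\0';
    return 0;
}

/* Expands pattern into dst, replacing {0} with a and {1} with b.
 * dst must not overlap a or b. Returns 0 on success, -1 if dst is too small. */
static int apply_pattern(const char *pattern, const char *a, const char *b,
                         char *dst, size_t size) {
    size_t used = 0;
    if (size == 0) return -1;
    dst[0] = '\0';
    while (*pattern) {
        const char *arg = NULL;
        if (strncmp(pattern, "{0}", 3) == 0) arg = a;
        else if (strncmp(pattern, "{1}", 3) == 0) arg = b;
        if (arg) {
            if (append(dst, size, &used, arg, strlen(arg)) != 0) return -1;
            pattern += 3;
        } else {
            /* literal character from the pattern */
            if (append(dst, size, &used, pattern, 1) != 0) return -1;
            ++pattern;
        }
    }
    return 0;
}

/* Returns output, or NULL if the result does not fit or malloc fails. */
char *format_list_patterns(int argc, const char **argv, char *output, size_t size,
                           const struct list_patterns *p) {
    char *scratch;
    int err;
    if (size == 0) return NULL;
    output[0] = '\0';
    if (argc <= 0) return output;
    if (argc == 1) {
        size_t used = 0;
        return append(output, size, &used, argv[0], strlen(argv[0])) == 0 ? output : NULL;
    }
    if (argc == 2) {
        return apply_pattern(p->two, argv[0], argv[1], output, size) == 0 ? output : NULL;
    }
    /* {0} is the result so far, so expand into a scratch buffer and copy back. */
    scratch = malloc(size);
    if (!scratch) return NULL;
    err = apply_pattern(p->start, argv[0], argv[1], output, size);
    for (int i = 2; !err && i < argc; ++i) {
        const char *pattern = (i + 1 == argc) ? p->end : p->middle;
        err = apply_pattern(pattern, output, argv[i], scratch, size);
        if (!err) memcpy(output, scratch, strlen(scratch) + 1);
    }
    free(scratch);
    return err ? NULL : output;
}
```

With the English patterns from the list, {"apple", "banana", "mango"} becomes "apple, banana, and mango". Swap in "{0} und {1}" for the two-word and end patterns and {"Apfel", "Banane", "Mango"} becomes "Apfel, Banane und Mango", with no change to the function itself. The scratch buffer keeps the sketch short at the cost of an allocation and some extra copying.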
Incidentally, this is the first time I've ever used goto in C, and I apologize to Edsger Dijkstra and all the computer science teachers who told me, repeatedly, to never do that, ever. Well, I broke the law, and I regret nothing!
Update: Turns out the goto really wasn't necessary. Also, in practical use, the source and destination memory regions sometimes overlap. memcpy has undefined behavior in that case, so memmove is the right call there.
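A small illustration of the overlap problem, as a sketch rather than an excerpt from the actual code: the helper name `prepend` is made up, and the scenario (inserting text in front of a string that already sits in the same buffer) is just one plausible way such an overlap can come up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts prefix in front of the string already in buf.
 * size is the capacity of buf; prefix must not point into buf.
 * Returns 0 on success, -1 if the result would not fit. */
int prepend(char *buf, size_t size, const char *prefix) {
    size_t plen = strlen(prefix);
    size_t blen = strlen(buf);
    if (plen + blen + 1 > size) return -1;
    /* Source and destination both lie inside buf and overlap, so memmove
     * is required; memcpy would be undefined behavior here. */
    memmove(buf + plen, buf, blen + 1);
    /* prefix is a separate string, so a plain memcpy is fine. */
    memcpy(buf, prefix, plen);
    return 0;
}
```

Calling `prepend(buf, sizeof buf, "apple, ")` on a 32-byte buffer holding "banana" leaves "apple, banana" in the buffer.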