Timothy Choi

Posted on Feb 16, 2022 • Edited on Feb 1

How we sped up a Regex by 50,000x to make a family-friendly MMO safe for all

#performance #regex #python #mmorpg

Hello everyone! I'm Tim, a web / game developer at Toontown: Corporate Clash, a reimagined experience of the now defunct Disney's Toontown Online. I contribute to the game's backend service, the Corporate Clash website, and internal tooling (such as Discord Bots and web services that normal players don't see).

Recently, we spotted and refactored a heavily-used code path, executed over 130k times a day, and resulted in a ~50,000x speedup. In this article I would like to share with you how we discovered the problem with performance profiling, and how we solved the issues with the oldest tricks in the book: Doing things ahead of time, and using the right data structure for the job.

Welcome to Toontown!

If you're not familiar with Toontown Online, here's a quick introduction:

Disney's Toontown Online was a family-friendly MMORPG that launched in 2003, loosely based around the Who Framed Roger Rabbit franchise. In the game, players control virtual avatars (Toons) and work together to fight the evil Cogs, who want to turn Toontown from a silly, wacky, fun-filled utopia into yet another corporate venture. There's one weakness to those corporate robots: They can't take a joke! Toons engage in round-based battles where Gags are used as practical jokes on the Cogs, causing them to short-circuit and explode. However, there are a lot of Cogs out there, so the Toons will need to work together and use Gags strategically to ensure even the biggest, baddest Cogs can be defeated - with laughter.

Outside Cog-fighting, players can enjoy many activities together with friends, including minigames, racing, fishing, golfing, table games, and more. Toons are also given fully customizable houses in their own private Estate if the hustle and bustle of Toontown gets too tiring. They can do a bit of gardening, play with their pet Doodles, or simply chill out and chat with family and friends in a private space. In a sense, Toontown really was the metaverse before it was cool.

Playing off the idea of work vs play, Toontown Online was an instant hit, winning many awards in a span of a few years after release. Over the years, Toontown continued to see new features being added, and official servers in various languages were also opened to cater to local audiences, including German, French, Japanese, Brazilian Portuguese. It's safe to say Toontown Online made a lot of kids' childhoods.

However, as mobile games exploded in popularity and people shifted away from playing games on a PC, and its subscription-based revenue model wasn't generating enough revenue anymore. In 2013, Disney decided to shut down Toontown Online. Since then, many private servers that recreated the game from reverse-engineering the original game client have popped up. The two most popular ones are Toontown Rewritten (which strives to keep the original Toontown Online experience alive), and Toontown: Corporate Clash (which makes significant changes to the Toontown experience to make it more appealing for 2022 gamers). Corporate Clash aims to retain the core spirit of Toontown but added many new twists such as tweaked reward progression, brand new taskline, brand new boss battles (including a hardmode boss which many of our advanced players love) and more.

The Chats Were Piling Up

Around early October 2021, we started noticing some weirdness in our daemons responsible for chat handling. The CPU usages were skyrocketing, to the point where it was starving other services running on the same server of their fair share of CPU, forcing us to restart our game servers every week or else the entire game would crash. We would also receive reports from users that the entire game was being laggy, and chats weren't delivered on time. Our next content update, the v1.2.5 update, was just around the corner, and we couldn't let server sluggishness ruin the biggest update we were about to deliver.

To examine our game's performance, we looked at the performance profiles, basically a snapshot of resource usage within a program, generated by our code. We are very grateful to have support from Datadog, which provides free monitoring services for our game and website. They have a suite of profiler aggregation and analysis features which allow us to inspect our code's performance and see which bits of code are taking up the most resources.

The flamechart above, generated by Datadog, allows us to see which function took up the most time during program execution. From above, we can see the messageHiddenByBlacklist function, which determines if a message should be blocked by our chat filter, was using 2m 19s of CPU Time over 60s¹. This should never happen unless we're running very inefficient code. We need to make this function faster, but why do we need a chat filter in Toontown anyway?

We process over 130k messages per day, with peaks nearing 200k a day during major updates, so it's good to keep the time needed to process each message as quickly as possible - you wouldn't want the chat to be delayed when you're coordinating stunning strategies in an Overclocked C.L.O. Boss Battle!

To Work Together, You Must Communicate

As teamwork and socialization were the core of Toontown Online, Disney had to create a robust communication system to let players strategize on the next move, finding new buddies to play minigames with, or simply asking a friend about how their day was. However, as a family-friendly game, Toontown needed to protect minors from harassment or harm.

In Toontown Online, there were 3 main ways of communication, from the most restrictive to the least:

SpeedChat (menu-based chat)
SpeedChat+ (limited free-form text chat)
True Friends Chat (free-form text chat)

SpeedChat was the main method of communication when Toontown first came out. Players were able to select a pre-approved phrase to say from a menu, which contained common phrases like greetings, emotions and chit-chat. It also contained game-specific phrases such as tasks the player is working on, how to strategize in battles, and buddying up to do things together. As all phrases were pre-approved, it ensures players have no way to say inappropriate things in-game, therefore keeping children safe when playing the game.

However, as the choices of phrases were finite, SpeedChat made communication beyond what Disney allowed, even with good intentions, very difficult. Disney therefore also allows an open chat between players who know each other in real life. To enable this, players had to become True Friends by exchanging True Friend codes, consisting of 6 alphanumeric characters, with each other offline. After becoming True Friends, players would be able to send any chat messages they wanted into the game, which is much quicker than shoehorning your thoughts into what's available on the menu. There is a simple filter in place to stop extremely offensive messages, but anything a US keyboard can type can be in a message. This struck a balance between communication needs and moderation load, as real-life friends are less likely to harass each other than strangers would, therefore needing less human moderator intervention. Disney anticipated that since there was no way for players to exchange the alphanumeric codes within the game, it would be safe from abuse from strangers. They were very, very wrong.

Players soon found ways to exchange True Friend codes without ever having to meet up offline using the Toon Estates housing system by moving furniture around to form the code, or even simply by repeating the code with the initials of specific SpeedChat phrases. (This article goes more in-depth about how this was done in-game, as well as interesting tidbits about how SpeedChat came to be) Granted, the player would have to understand the system and be aware of the other player's intention, so it's not something an oblivious child would be able to decipher, but it highlighted a flaw in Disney's design: players want to talk to each other so badly they would jump through hoops and work around limitations just to be able to say what they truly want to say.

A new chat system was needed.

Putting the Chat in SpeedChat(+)

SpeedChat+, which arrived in Toontown Online in late 2008 (first appearing in another Disney MMO, Pirates of the Caribbean Online), bridged the gap between SpeedChat and True Friends chat, allowing much more freedom while offering protection to younger players from harassment. Players could finally type in messages with their keyboard, but they couldn't say just anything they wanted. SpeedChat+ has a chat filter, which maintains an allowlist (a list of all words you can say in the game) and a blocklist (a list of words/phrases you cannot say in the game, even if the individual words are all in the allowlist). For a message to get through, all words in a chat message must be in the allowlist, and no part of the message can match the blocklist. If any part of the message doesn't satisfy these 2 criteria, that part of the message is replaced with animal noises. Additionally, there is a maximum length of ~110 characters, so that players can't just send really long messages and crowd other people's screens.

To help the players know which words are allowed, words that aren't in the allowlist is shown in red²:

SpeedChat+ is an opt-in system: Parents could choose to enable SpeedChat+ for their children if they feel they are mature enough to handle online interactions without supervision. If SpeedChat+ is disabled for a player and other players type in a message, they would not receive the message at all.

Having a separate blocklist is important for moderation because rule-breaking players can find creative ways to evade the chat filter by combining words in the allowlist to form a sentence that phonetically or semantically carries the same meaning. For example, if we wanted to block the phrase “smelly dog" and only added that one phrase to the blocklist, players could circumvent the filter by saying “small ly dog", “small lie dog", “:s mel ly dog", “:s molly dog", or something completely different but would still get the point across, like “odor full woof". The blocklist allows the game to block the most common filter-evading phrases. It is by no means perfect: It is like playing whack-a-mole on human imagination, and some phrases which have no malicious meaning may occasionally be blocked accidentally, but it hits the sweet spot between allowing freedom of expression, keeping players safe from harm, and minimizing moderation burden.

In Corporate Clash, we continuously update our allowlist and blocklist (you can suggest words to be added to the allowlist with this form here) and to date we have more than 58,000 allowlist entries and over 8,400 blocklist entries. From the mundane of “trolley", “apple" and “cake", to the crazy of “bingus" and “supercalifragilisticexpialidocious", SpeedChat+ allows you to say a lot of things. In addition to automated moderation tools, volunteer moderators investigate in-game reports and monitor conversations where inappropriate topics are being discussed to make sure all Toons are following our in-game rules. Most of our players are rule-abiding, meaning the amount of messages that would be blocked is quite low compared to the amount of messages that pass through. (This is important, as we'll see later on)

Putting the Speed in SpeedChat+

Upon further investigation, the way the SpeedChat+ blocklist is handled is anything but speedy. It turns out we were blindly applying each of the blocklist entries over the message:



import re

BLOCKLIST = [] # ... a list of words or phrases we want to block

def naive_blocklist_scanning(message: str) -> bool:
  lowercase_message = message.lower()
   for word in BLOCKLIST:
       if re.search(r'\b{}\b'.format(word), lowercase_message):
           return True
   return False

A benchmark of scanning ~120 messages typically found in our chat logs with the above function 10 times took an average of 126.7 seconds. For a function designed to process real-time chat, this is way too expensive.

The function is highly inefficient for 3 reasons:

It is compiling all of the blocklist patterns on-the-fly, and it has to do so for all 8,400 blocklist phrases, for every single message.
It has to scan the message with all of the regex rules in the blocklist, which means message processing gets slower and slower as more blocklist entries are added.
Because it has to prove the message has nothing that is in the blocklist, there is no early-exit option available; for a simple message like “hi" which is very much allowed, the scanner would still need to compare the message against all 8,000 entries just in case any part of the message “hi" matched an item on the blocklist. With this model, we are assuming the worst out of our players' messages.

We decided to tackle #1 first.

Attempt #1: Ahead-of-time Prep

Python's regex engine is essentially a mini-virtual machine, similar to how Python first converts source code into bytecode before interpreting (running) it. To match strings on a regex pattern, it must first be compiled. You can either compile the pattern explicitly with re.compile(), or you can use one of the functions in the re module where patterns are implicitly compiled as a shortcut. This isn't a problem if you only intend to use the pattern a few times, but If a pattern is going to be used repeatedly (e.g. in a loop), it's better to explicitly compile the pattern and reuse it as compiling it is not free.

Python's re module comes with a built-in cache which by default caches a limited amount of the most recent regex patterns it encounters (as of the time of this article this limit is 512), which is good for speeding up programs who use a few regexes here and there without requiring them to explicitly compile the regex. When the regex cache hits the limit, it clears the regex cache altogether³, which means we are continually recompiling the blocklist phrases since we have way more than 512 entries in our list (not to mention uses of Regex elsewhere in the codebase). We can change the maximum cache limit directly referenced by the re module, but a cleaner way would be to explicitly compile all the patterns we need. Besides being more explicit, regexes outside of our chat filter can't accidentally trigger a cache flush.

Here, we move the compilation step outside of the loop, so we only compile the patterns once, then reuse it in a for loop.



import re

precompiled_blocklist_regex = [re.compile(r'\b{}\b'.format(pattern)) for pattern in BLOCKLIST]

def precompiled_patterns_blocklist_scanning(message: str) -> bool:
   lowercase_message = message.lower()
   for bl_phrase in precompiled_blocklist_regex:
       if bl_phrase.search(lowercase_message):
           return True
   return False

This brings the average runtime for the benchmark to 1.74 seconds, a 78.2× speed increase. We could stop there, but problems #2 and #3 still exist. As our blocklist grows, so will the run time. Ideally, the algorithm should base its run time on the length of the string, not the length of the blocklist.

For this, we need one of Computer Science's less appreciated data structures: Tries.

Attempt #2: Introducing Tries

A Trie⁴, according to Wikipedia:

is a type of search tree data structure used for locating specific keys from within a set. These keys are most often strings (but can also be other things like digits in a number), with links between nodes defined not by the entire key, but by individual characters.

In simpler terms, a Trie is a type of tree, where each node is an individual character in the string to be stored, linked by the order of the characters, with the empty string at the root. If you traverse a Trie from the roof all the way down to a leaf, noting the character in each node along the way, you get the original string back.

If we insert the four string “apple", “apply", “about", and “abound" into the Trie, we would get something like this:

Thanks to the

trie visualization

website for the images!

To store a new string, we traverse the tree from the root, following the characters in the string we want to store, one by one. If at some point we reach a node where there are no applicable children, we create a child with the missing character, traverse to that child, then repeat the process until the entire string is exhausted. For example, to add the word “able" to the example trie above:

Read the character "A". There is a path from the root node to the character a, so we navigate down to it.
Read character B. Node a has children b and p. We are only interested in b, so we navigate down to it.
Read the character "L". Node b has one child: o. We are only interested in l and there is no path for it, so we create a child l under node b, and navigate down to it.
Read the character "E". Node l has no children, so we create a child e under l, and navigate down to it.
The string has been exhausted; we've successfully added the string.

If the string is exhausted without new nodes being created, it means the string has already been added. We can use this property to check whether a phrase is in a Trie very effectively: We navigate through the tree, and stop when a node has no children matching the rest of the string. For example, if we want to find the word "ago" in the above Trie, we would first navigate through the root, which only has one child: a. None of the children of a contains the letter "g", therefore the lookup fails.

This data structure is especially useful for string searches because the time-complexity of retrieving a single string is O(N), where n is the length of the longest word in the tree, whereas the complexity of the original function was O(mn), where m is the list of words in the block list, and n is the length of the longest word of the tree. In other words, no matter how large the block list is, the running time won't increase given the same string.

Another advantage of Tries is its ability to fail quickly; for example, if we are blocking the string “Cog", an input like “Hello" would fail instantly since there is no match for the character C at the root of the tree. We would know the message is safe and can be allowed. Since we expect most of our players to only send rule-abiding messages, most of the messages should fail very quickly without the need to dig deep into the Trie, speeding up the checks for the most likely scenarios.

A small disadvantage for Tries is that they generally take up more space in storage over other types of search trees, but we are only generating one Trie to be reused for all messages, so this is not a problem for us.

We used the amazing trie-regex Python package to build the Trie from our blocklist, and turn it into a regex we can apply to our messages. With this method, we are able to reduce the 8,000 regex patterns to just a few (to cover edge cases like case sensitivity and punctuations⁵).



from trieregex import TrieRegEx

blocklistTrie = TrieRegEx()
blocklistTrie.add(*BLOCKLIST)

BLOCKLIST_PATTERN = re.compile(rf'\b{blocklistTrie.regex()}\b')


def trie_blocklist_scanning(message: str) -> bool:
   if BLOCKLIST_PATTERN.search(message.lower()) is not None:
       return True

   return False

Results: Speedy SpeedChat

Running the benchmark again with the new function, the average runtime dropped to 0.002 seconds, a ~630× speed increase, and ~50,000× over the original function.

Method	Runtime (10 reps, 10 runs)	Speedup (average)
Naive	Max: 129.5s Min: 124.2s Avg: 126.7s Stdev: 1.6772	N/A
Precompiled Regex List	Max: 1.66s Min: 1.60s Avg: 1.62s Stdev: 0.04505	78.2x
Trie-based Regex	Max: 0.002794s Min: 0.002527s Avg: 0.002591s Stdev: 0.0000820963	625.3x over precompiled regex 48918.9x over naive

After deploying the changes, the performance profiles generated were night and day. Instead of spending almost 3 minutes for every minute of CPU time, we are now only spending 2 seconds for every minute. The CPU usages have been brought under control, and we welcomed over 1,100 concurrent Toons during our v1.2.5 launch weekend without any sort of chat lag! We're very happy we're able to improve our player's chat experience without compromising our efforts to keep Toontown a safe environment for all.

Bonus attempt: One Giant Regex

Since publication, Reddit has reminded me of the golden rule in tech writing: There is always somebody that'll remind you of things you've forgotten to consider. In this case, it's simply putting the list of words together in one giant regex. We can use the | (or) operator here - for example the regex (apple|orange) will match either "apple" or "orange" (but not "appleorange" or "banana"). Since our blocklist is a list of strings, we can join the items with the | character and get a working regex really easily.

Here's what the function looks like:



regex = re.compile(r'\b({})\b'.format('|'.join(BLOCKLIST)))


def precompiled_giant_pattern_re_blocklist_scanning(message: str) -> bool:
    lowercase_message = message.lower()

    return regex.search(lowercase_message) is not None

Additionally, the pyre2 package, which exposes the re2 engine⁶, lets us speed up most regexes at no cost because it uses a different method for generating the regex virtual machine, leading to massive performance improvements for most types of regexes. For completeness, I benchmarked both the One Giant Regex and the Tries-based regex with re2 to see if it can manage to speed things up more.

Here are the results:

Method	Runtime (10 reps, 10 runs)	Speedup (average)	Runtime (10,000 reps, 100 runs)	Speedup (average)
Trie-based Regex	Max: 0.002794s Min: 0.002527s Avg: 0.002591s Stdev: 0.0000820963	625.3x over precompiled regex 48919x over naive	Max: 3.4789s Min: 2.4209s Avg: 2.5387s Stdev: 0.183774	N/A
Giant Regex (re)	Max: 0.3267s Min: 0.2720s Avg: 0.2908s Stdev: 0.0199636214	5.57x over precompiled regex 435.7x over naive	N/A	N/A
Giant Regex (re2)	Max: 0.010380s Min: 0.002232s Avg: 0.003087s Stdev: 0.0025631473	-0.13x over trie-based 524.8x over precompiled regex 41043x over naive	Max: 4.0598s Min: 2.1721s Avg: 2.6569s Stdev: 0.348931	~ -4% over trie-based
Trie-based Regex (re2)	Max: 0.008580s Min: 0.002028s Avg: 0.002722s Stdev: 0.0020588596	-0.05x over trie-based 595.2x over precompiled regex 46547 over naive	Max: 4.0516s Min: 1.9274s Avg: 2.3079s Stdev: 0.307720	~ +10% over trie-based

Trie is still winning in this case, albeit the margins are very small. Combining the two gives even better performance than either alone.

One small drawback is pyre2 requires compiling the underlying re2 C++ library. Because of the distribution compiler we use we would have had to jump through a lot more hoops to get it working on our clients so we prefer pure Python libs. That said, we don't actually ship blocklist detection to the client so we don't have similar restrictions on the server side. The moral here is benchmark solutions that seems slow and obvious - you might find new ways to optimize it down the line.

Conclusion

The lessons in this were all of the obvious but it's really helpful to say them out loud:

Things that can be precomputed ahead of time should be precomputed.
It's good to know about more niche data structures than just the typical few; you never know when you'll need them.
Profiling your code is really good at highlighting weak spots for further investigation.

I've prepared a sample of all of our tests here. While I can't publish our exact blocklist due to security reasons, I hope the examples will help you understand the logic behind the code. All timings in this article are obtained on an M1 MacBook Pro (2020, 16GB) running Python 3.9.10. No particular efforts were made to close other running applications.

Extra Reading

Thanks a lot for reading this to the end! If you want to give our game a try, whether you've played Toontown Online or not, you're welcome to get the game at https://corporateclash.net.

If you enjoy technical challenges like these, our Technical Team has open positions for game developers (and several other teams have openings as well!). All of us are volunteers, but we're all united by the passion to keep Toontown alive, and more importantly, thriving as a game. You can check out our application here: https://corporateclash.net/help/apply

We're also on Twitter if you want to follow us, and check out our Discord to chat to us about the game!

My Mastodon is @sheriffcranky@mastodon.gamedev.place!

If you see a cyan duck named Sheriff Cranky in Corporate Clash, feel free to say hi!

I'm at a loss on how to explain this time-warp - if anybody has an explanation please let me know! Or it could just be a Datadog bug. Either way, the point is this function is very slow. ↩
To prevent players from sharing personal information (such as age and phone numbers - remember when we used to do that?), Disney omitted digits and number words from the allowlist. This stopped people from saying useful things like “hit the 3rd Cog" or “I need to fight 25 more Cogs". In true Toontown fashion, there was an obvious workaround: by saying words that sound similar to the number (e.g. 1 = won, 3 = tree, 5 = hive, 8 = ate). In Corporate Clash, we allowlist numbers that show up most often during a normal player's gameplay (e.g. attack damages for certain Gags), and have special alerting in the backend for messages that look like personal information, for human moderator intervention. ↩
This behavior was changed in Python 3.7, where the oldest compiled regex gets dropped instead of the entire cache being flushed out when the 513th regex pattern comes in. It doesn't help us much here though as we continuously overwrite the cache in the for loop. ↩
The official pronunciation is "tries" (as in the 3rd person present form of “try") but I pronounce it as "trees". ↩
We have separate blocklists for case-sensitive blocks because some players use cases to escape the chat filter: for example if we want to block the word "or" but allowed the word “orange", malicious players could try to say "ORange" or “orANGE" to get the point across). ↩
re2 is based on the Thompson NFA method. Here are 3 articles going into details why regexes built using this method is faster in most cases - it's a fascinating read and you don't need to be a regex guru to understand it. ↩

Top comments (2)

Timothy Choi • Feb 16 '22 • Edited

EDIT: I'm still very confused by pyperf but I have a new set of data:

precompiled: Mean +- std dev: 177 ms +- 2 ms

giant_re: Mean +- std dev: 28.6 ms +- 0.3 ms

giant_re2: Mean +- std dev: 243 us +- 16 us

tries: Mean +- std dev: 266 us +- 2 us

tries_re2: Mean +- std dev: 219 us +- 19 us

Perhaps not surprisingly re2 outperforms Trie-based, but very pleasant to see combining the two makes things even faster!

Jon Lauridsen • Feb 18 '22

A great read!, thanks.

DEV Community