DEV Community

Albert Wu

Posted on • Originally published at albertywu.com on

An even simpler JavaScript tokenizer

What’s the easiest way you know of to tokenize an arithmetic expression in JavaScript? Let’s say you’re building a calculator application, and want this to happen:

console.log(
 tokenize('100-(5.4 + 2/3)*5')
)
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']

Before you reach into your npm module bag-o-tricks, realize that this can be done in one line of JavaScript using a little-known feature of the string split method. Behold:

'100-(5.4+2/3)*5'
  .split(/(-|\+|\/|\*|\(|\))/)
  .map(s => s.trim())
  .filter(s => s !== '')
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']
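The opening example calls tokenize, which the article never defines. A minimal sketch, assuming it is just the one-liner wrapped in a function (the name and parameter are taken from the call above, not from any library):

```javascript
// Hypothetical wrapper around the one-liner; not part of any library.
function tokenize(expression) {
  return expression
    .split(/(-|\+|\/|\*|\(|\))/) // split on operators and parens, keeping them
    .map(s => s.trim())          // strip whitespace around each piece
    .filter(s => s !== '');      // drop empty pieces left between adjacent separators
}

console.log(tokenize('100-(5.4 + 2/3)*5'));
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']
```

Note that this sketch has no notion of a unary minus: tokenize('-5') returns ['-', '5'], so a calculator built on it would have to handle negative numbers in a later step.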

Excuse me? What’s that hot mess inside the split function? Let’s break it down step by step using a few examples of increasing complexity:


Example 1: s.split(/-/)

Pretty obvious: this splits the string s anywhere it sees the minus sign symbol -. The separators themselves are discarded.

'3-2-1'.split(/-/)
// ["3", "2", "1"]

Example 2: s.split(/(-)/)

The only difference from the previous example is the enclosing parens in the regex, which create a capturing group. Here’s the key point of the entire article: If the regular expression contains capturing parentheses around the separator, then each time the separator is matched, the results of the capturing group are spliced into the output array.

'3-2-1'.split(/(-)/)
// ["3", "-", "2", "-", "1"]
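It is the capturing that matters here, not the parentheses as such. As a small sketch of the contrast, a non-capturing group (?:...) groups the pattern without capturing it, so split goes back to discarding the separators (the variable names are only for illustration):

```javascript
// Capturing group: each matched separator is spliced into the result.
const withCapture = '3-2-1'.split(/(-)/);

// Non-capturing group: nothing is captured, so separators are dropped again.
const withoutCapture = '3-2-1'.split(/(?:-)/);

console.log(withCapture);    // ['3', '-', '2', '-', '1']
console.log(withoutCapture); // ['3', '2', '1']
```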

Example 3: s.split(/(-|\+)/)

This builds on the previous example by adding support for the addition symbol +. The backslash \ is required because + is a regex metacharacter (a quantifier); escaping it makes the regex match a literal plus sign. The vertical pipe | is the alternation operator (match - OR +).

'3-2-1+2+3'.split(/(-|\+)/)
// ["3", "-", "2", "-", "1", "+", "2", "+", "3"]
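For single-character separators like these, a character class is an alternative spelling of the same alternation. This is a sketch of that option, not the form the article uses; inside [...] the + needs no escape, and a - placed first is taken literally:

```javascript
// Alternation, as in the example above.
const viaAlternation = '3-2-1+2+3'.split(/(-|\+)/);

// Character class: matches one character that is either - or +.
const viaCharClass = '3-2-1+2+3'.split(/([-+])/);

console.log(viaAlternation); // ['3', '-', '2', '-', '1', '+', '2', '+', '3']
console.log(viaCharClass);   // ['3', '-', '2', '-', '1', '+', '2', '+', '3']
```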

The Final Boss (tying everything together)

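Hopefully, you now have all the tools needed to understand .split(/(-|\+|\/|\*|\(|\))/): it is the same pattern extended with /, *, ( and ), each escaped with a backslash so that it is matched literally. The .map(s => s.trim()) call then strips whitespace around each piece, and .filter(s => s !== '') drops the empty strings that split produces when two separators sit next to each other. Hope that made sense! Let me know in the comments if you liked this article, or ping me on Twitter!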
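To see what each stage of the one-liner contributes, the pipeline can be run one step at a time. The input string and variable names below are made up for illustration:

```javascript
const operators = /(-|\+|\/|\*|\(|\))/;

// Step 1: split keeps the separators, but also leaves empty strings
// at the start and between adjacent separators such as ")" and "*".
const pieces = '(1 + 2)*3'.split(operators);
console.log(pieces);   // ['', '(', '1 ', '+', ' 2', ')', '', '*', '3']

// Step 2: trim removes the whitespace that surrounded the operands.
const trimmed = pieces.map(s => s.trim());
console.log(trimmed);  // ['', '(', '1', '+', '2', ')', '', '*', '3']

// Step 3: filter removes the empty strings, leaving only real tokens.
const tokens = trimmed.filter(s => s !== '');
console.log(tokens);   // ['(', '1', '+', '2', ')', '*', '3']
```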
