DEV Community

0xc0Der

Building a simple word counting parser using pari.

In this post, I'll implement a simple parser that counts the words and lines in its input.

First, we need to define what counts as whitespace.

// The backslashes are doubled because this string is interpolated
// into the pattern passed to the `char` parser later.
const whitespace = ' \\r\\n\\t\\f\\v';
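To see what the doubled backslashes do, here is a small sketch in plain JavaScript. It assumes the string ends up inside a regular-expression character class, which the `[...]` templates used later in this post suggest. How pari's `char` compiles its argument internally is not shown here, so `new RegExp` is only a stand-in.

```javascript
// Same definition as above: each `\\` is one literal backslash in the string.
const whitespace = ' \\r\\n\\t\\f\\v';

// Assumption: the string is interpolated into a regex character class.
// `new RegExp` stands in for whatever `char` does internally.
const wsPattern = new RegExp(`[${whitespace}]`);
const wordPattern = new RegExp(`[^${whitespace}]`);

console.log(whitespace.length);     // 11: a space plus five two-character escapes
console.log(wsPattern.test('\n'));  // true
console.log(wsPattern.test('a'));   // false
console.log(wordPattern.test('a')); // true
```

With single backslashes, the string would already contain the control characters themselves rather than the escape sequences.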

Next, we define what a word is:

A word is a sequence of non-whitespace characters.

The definitions of the word and space parsers follow.

import { char, oneOrMore } from 'pari';

// ...

const wordChar = char(`[^${whitespace}]`);
const wsChar = char(`[${whitespace}]`);

const word = oneOrMore(wordChar);
const space = oneOrMore(wsChar);
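For intuition, the two parsers behave roughly like the regular expressions below. This is a plain JavaScript sketch, not pari's API: `+` plays the role of `oneOrMore`, the sticky flag `y` makes the match start exactly at a given index, and `matchAt` is a hypothetical helper.

```javascript
// Regex stand-ins for the `word` and `space` parsers (not pari's API).
const wordRun = /[^ \r\n\t\f\v]+/y;
const spaceRun = /[ \r\n\t\f\v]+/y;

// Hypothetical helper: returns the run matched exactly at `index`, or null.
function matchAt(pattern, input, index) {
    pattern.lastIndex = index;
    const match = pattern.exec(input);
    return match === null ? null : match[0];
}

const sample = 'hello  world';
console.log(matchAt(wordRun, sample, 0));  // 'hello'
console.log(matchAt(spaceRun, sample, 0)); // null
console.log(matchAt(spaceRun, sample, 5)); // '  '
```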

So, how do we keep count? We need to define our own parser `State`.

import { State, ... } from 'pari';

class CounterState extends State {
    #wordsCount = 0;
    #linesCount = 0;

    // State must have a `clone` method.
    clone() {
        const state = new CounterState(
            this.input,
            this.index,
            this.status
        );

        state.#wordsCount = this.#wordsCount;
        state.#linesCount = this.#linesCount;

        return state;
    }

    get wordsCount() {
        return this.#wordsCount;
    }

    get linesCount() {
        return this.#linesCount;
    }

    withIncWords() {
        this.#wordsCount += 1;
        return this;
    }

    withIncLines() {
        this.#linesCount += 1;
        return this;
    }
}

// ...
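This post does not spell out why `clone` is required. A plausible reason is that a parser can work on a copy of the state, so a failed attempt leaves the original counts untouched. The sketch below illustrates only that idea with a hypothetical `MiniState` class, which does not extend pari's `State`.

```javascript
// Hypothetical stand-in for a parser state; it does not extend pari's State.
class MiniState {
    #wordsCount = 0;

    constructor(input, index = 0) {
        this.input = input;
        this.index = index;
    }

    // Copy the public fields and the private counter into a new object.
    clone() {
        const copy = new MiniState(this.input, this.index);
        copy.#wordsCount = this.#wordsCount;
        return copy;
    }

    get wordsCount() {
        return this.#wordsCount;
    }

    withIncWords() {
        this.#wordsCount += 1;
        return this;
    }
}

const before = new MiniState('some input');
const attempt = before.clone().withIncWords();

console.log(before.wordsCount);  // 0
console.log(attempt.wordsCount); // 1
```

Because the counters are private fields, they have to be copied inside the class itself, which is why `clone` lives on the state rather than in outside code.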

In the space parser, we increase the line count by one whenever we encounter a newline character, and increase the word count by one at the end of each run of (possibly multiple) consecutive whitespace characters.

// ...

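// The inner callback runs after each whitespace character,
// the outer one once the whole run has been matched.
const space = oneOrMore(wsChar.ok(state => 
    state.charAt(state.index - 1) === '\n'
        ? state.withIncLines()
        : state
)).ok(state => state.withIncWords());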
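The counting rule for a single run of whitespace can be written out in plain JavaScript. This is a sketch with a hypothetical `countSpaceRun` helper that receives an already matched run. It mirrors the rule described above and does not use pari.

```javascript
// Hypothetical helper mirroring the rule on an already matched run of
// whitespace: one line per '\n', and one word for the run as a whole.
function countSpaceRun(run) {
    let lines = 0;
    for (const ch of run) {
        if (ch === '\n') lines += 1;
    }
    return { words: 1, lines };
}

console.log(countSpaceRun('   '));     // { words: 1, lines: 0 }
console.log(countSpaceRun(' \n\n\t')); // { words: 1, lines: 2 }
```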

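In the word parser, we need to handle one edge case: the end of the input. If the input ends with a word, no whitespace follows it to trigger the counts, so we count the last word and the last line here.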

//...

const word = oneOrMore(wordChar.ok(state =>
    state.charAt(state.index) == ''
        ? state.withIncWords().withIncLines()
        : state
));
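The end-of-input check compares the next character with an empty string. The sketch below shows the same test on a plain string, assuming `state.charAt` behaves like `String.prototype.charAt`, which this post does not confirm. `atEndOfInput` is a hypothetical helper.

```javascript
const text = 'ab';

// String.prototype.charAt returns '' for an out-of-range index.
console.log(text.charAt(1) === ''); // false: 'b' is still inside the input
console.log(text.charAt(2) === ''); // true: index 2 is past the last character

// Hypothetical helper: after consuming a character the index points at the
// next one, so an empty result means the consumed character was the last.
function atEndOfInput(input, indexAfterChar) {
    return input.charAt(indexAfterChar) === '';
}

console.log(atEndOfInput(text, 2)); // true
```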

Finally, we define our word counter parser and pass it a state with an input.

// ...
import { firstOf, ... } from 'pari';

const wc = oneOrMore(firstOf([word, space]));

const input = ...;

const result = wc.process(
    new CounterState(input)
);

// print word and line counts
console.log(
    result.wordsCount,
    'words',
    result.linesCount,
    'lines'
);
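To cross-check the parser's output, a hand-written counter that follows the same rules can help. The sketch below is plain JavaScript and does not use pari. `countWordsAndLines` and `isSpace` are hypothetical names, and the logic is my reading of the rules in this post rather than a description of pari's behavior.

```javascript
// Hand-written reference that follows the same counting rules.
// It is a cross-check only and does not use pari.
const isSpace = ch => ' \r\n\t\f\v'.includes(ch);

function countWordsAndLines(input) {
    let words = 0;
    let lines = 0;
    let index = 0;

    while (index < input.length) {
        if (isSpace(input[index])) {
            // A run of whitespace: one line per '\n', one word per run.
            while (index < input.length && isSpace(input[index])) {
                if (input[index] === '\n') lines += 1;
                index += 1;
            }
            words += 1;
        } else {
            // A run of word characters.
            while (index < input.length && !isSpace(input[index])) {
                index += 1;
            }
            // Edge case: the input ends with a word.
            if (index === input.length) {
                words += 1;
                lines += 1;
            }
        }
    }

    return { words, lines };
}

console.log(countWordsAndLines('hello world\nfoo bar')); // { words: 4, lines: 2 }
console.log(countWordsAndLines('a b\n'));                // { words: 2, lines: 1 }
console.log(countWordsAndLines('  a'));                  // { words: 2, lines: 1 }
```

The last call shows a property of these rules: a leading run of whitespace also bumps the word count, because every whitespace run counts as the end of a word.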

Thank you for reading 😄. If you have any questions, do not hesitate to leave a comment.
