This post is part 4 of the series “Learning Functional Programming using JavaScript”. When the next part will be published, a link to part 5 will be added here in the top.
This is a continuation of the Functional Programming challenge I gave myself: Write a JSON parser in JavaScript, usingthe parser combinator principles from the blogpost of Scott Wlaschin.
So far we defined the challenge, added some rules to adhere, created a character parser and we are able to parse the first literals of JSON: null
, true
andfalse
.
Quoted string parser
Its time for the next type to parse: Strings.
An EBNF notation of a JSON String looks like this:
1
2
3
4
5
6
string = '"',
{ ? Any unicode character except " or \ or control character ?
| "\", (
'"' | "\" | "/" | "b" | "f" | "n" | "r" | "t" |
"u", 4 * ? hexadeximal digit ?
) }, '"';
Since our example data does not have \uXXXX
unicode in it, I will skip the implementation of it.
Let us start with parsing the quote:
1
2
const quoteParser = characterParser('"');
console.log(quoteParser('"test"')); // ['"', [ 't', 'e', 's', 't', '"'] ]
The string could contain any character, so having to define them all using thecharacterParser
would be madness. Time to change the strategy here, by changing the characterParser
into a satisfy
method, that parses / consumes the head of our stream when it satisfies a predicate.
What is a predicate?
A Predicate is a function that will simply return true
or false
based on the input. It is commonly used in [].filter()
, [].some()
and [].every()
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
const parseError = (message) => (error) => [FAILED, message, error];
const genericParseError = parseError("Error parsing:");
const satisfy = (predicate) => ([head, ...tail]) =>
head
? predicate(head)
? [head, tail]
: genericParseError(`Unexpected '${head}'`)
: genericParseError("Unexpected EOF");
const addLabel = (label) => (parser) => (stream) =>
onFailure(parser(stream))(([, , error]) =>
parseError(`Error parsing '${label}':`)(error)
);
const characterParser = (character) =>
addLabel(character)(satisfy((c) => c === character));
Since we now use a predicate that needs to return a true
or false
to determine if the head
of the data must be consumed, it is no longer possible to say which character we were trying to parse. I updated the error message therefor to a real generic one: “Error parsing:”.
After adjusting the addLabel
as well, the characterParser
would still stay compatible. And now we can create a new parser. The stringCharParser
that would ‘satisfy’ on each character except the double quote "
.
1
2
3
4
5
6
const stringCharParser = satisfy((c) => c !== '"');
const quotedStringParser = andThenRight(quoteParser)(stringCharParser);
console.log(quotedStringParser('"test"')); // ['t', [ 'e', 's', 't', '"'] ]
console.log(quotedStringParser('""')); // [Symbol(Failed), 'Error parsing:', `Unexpected '"'`]
We could repeat a parser until it fails, to parse all the characters in the string. If it fails, we will return an empty result, to allow empty strings, and create the repeat process using recursion:
1
2
const many = (parser) => (stream) =>
orElse(andThen(parser)(many(parser)))(resultParser([]))(stream);
The parser will try to apply itself, and on success call itself to try the next character. We are breaking one of the rules of the challenge now: Recursion to own function. For now, I just want to focus on completing the JSON parser, so I will park this challenge for later.
We are able to parse the whole (test) string now:
1
2
3
const quotedStringParser = andThenRight(quoteParser)(many(stringCharParser));
console.log(quotedStringParser('"test"')); // [[ 't', [ 'e', [Array] ] ], ['"'] ]
now only parsing the right quote, by adding a andThenLeft
parser:
1
2
3
4
5
6
7
8
const andThenLeft = (parserA) => (parserB) =>
mapResult((result) => result[0])(andThen(parserA)(parserB));
const quotedStringParser = andThenLeft(
andThenRight(quoteParser)(many(stringCharParser))
)(quoteParser);
console.log(quotedStringParser('"test"')); // [[ 't', [ 'e', [Array] ] ], [] ]
The whole string is parsed, but the result is not a nice string as hoped. It has a nesting structure, caused by the andThen
construct.
1
["t", ["e", ["s", "t"]]];
Lets change that, using the earlier built mapResult
:
1
const toList = mapResult((result) => [result[0]].concat(result[1]));
And now wrap it around the andThen
call in the many
parser, to keep flattening the result:
1
2
3
4
const many = (parser) => (stream) =>
orElse(toList(andThen(parser)(many(parser))))(resultParser([]))(stream);
console.log(quotedStringParser('"test"')); // [[ 't', 'e', 's', 't'], [] ]
Much better! Now add another mapResult
in the quotedStringParser
to join all the characters:
1
2
3
4
5
6
7
8
const toString = mapResult((result) => result.join(""));
const quotedStringParser = andThenLeft(
andThenRight(quoteParser)(toString(many(stringCharParser)))
)(quoteParser);
console.log(quotedStringParser('"test"')); // ['test', [] ]
console.log(quotedStringParser('""')); // ['', [] ]
Simple strings can now be parsed, time for some refactoring:
1
2
3
4
5
6
const between = (parserLeft) => (parserMiddle) => (parserRight) =>
andThenLeft(andThenRight(parserLeft)(parserMiddle))(parserRight);
const quotedStringParser = between(quoteParser)(
toString(many(stringCharParser))
)(quoteParser);
The currying of between
seems logical, because it semantically looks like a string parser between quote parsers. But I found out it is more logical to have the argument that could change the most to be at the end. This way you can profit the most from “partially applied functions”.
1
2
3
4
5
6
const between = (parserLeft) => (parserRight) => (parserMiddle) =>
andThenLeft(andThenRight(parserLeft)(parserMiddle))(parserRight);
const quotedStringParser = between(quoteParser)(quoteParser)(
toString(many(stringCharParser))
);
Lets say we would also like to create a quoted boolean parser, we could do it more easily when the middle section would be its last argument:
1
2
3
4
const quotedParser = between(quoteParser)(quoteParser);
const quotedStringParser = quotedParser(toString(many(stringCharParser)));
const quotedBooleanParser = quotedParser(boolParser);
Lets continue with the escape characters of our string:
1
2
console.log(quotedStringParser('"test\\nmultiple \\"lines\\""'));
// ['test\\nmultiple \\', [ 'l', 'i', 'n', 'e', 's', '\\', '"', '"'] ]
We could start by listing all special characters we have, and what result they should produce. Then create a choice
parser, that is a orElse
for a list of parsers. (Just like chain
is for the andThen
)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
const quoteParser = characterParser('"');
const stringCharParser = satisfy((c) => c !== '"' && c !== "\\");
const specialCharacters = [
['\\"', '"'],
["\\\\", "\\"],
["\\/", "/"],
["\\b", "\b"],
["\\f", "\f"],
["\\n", "\n"],
["\\r", "\r"],
["\\t", "\t"],
];
const choice = (parsers) => parsers.reduce((a, b) => orElse(a)(b));
const escapedCharParser = choice(
specialCharacters.map(([match, result]) => parseStringResult(match)(result))
);
const quotedStringParser = between(quoteParser)(quoteParser)(
toString(many(orElse(stringCharParser)(escapedCharParser)))
);
console.log(quotedStringParser('"test\\nmultiple \\"lines\\""'));
// ['test\nmultiple "lines"', [] ]
So the string was in there as well now. I really like how the chain
andchoice
are alike:
1
2
const chain = (parsers) => parsers.reduce((a, b) => andThen(a)(b));
const choice = (parsers) => parsers.reduce((a, b) => orElse(a)(b));
What I did not like however, was that the function passed into the .reduce
had to accept 2 parameters. Time to force us into another direction, by adding yet another selector to our no-restricted-syntax
eslint rule, in ourpackage.json
:
1
2
3
4
5
6
7
8
9
10
11
12
13
{
"eslintConfig": {
"rules": {
"no-restricted-syntax": [
"error",
{
"selector": "ArrowFunctionExpression[params.length > 1]",
"message": "Only 1 argument allowed. Please use currying"
}
]
}
}
}
The following lines are now giving errors:
1
2
const chain = (parsers) => parsers.reduce((a, b) => andThen(a)(b));
const choice = (parsers) => parsers.reduce((a, b) => orElse(a)(b));
So we build our own, recursive reducer:
1
2
3
4
5
6
7
const doReduce = (reducer) => (start) => ([head, ...tail]) =>
head ? doReduce(reducer)(reducer(start)(head))(tail) : start;
const reduce = (reducer) => ([head, ...tail]) => doReduce(reducer)(head)(tail);
console.log(doReduce((a) => (b) => a + b)(1)([2, 3, 4])); // 10
console.log(reduce((a) => (b) => a + b)([1, 2, 3, 4])); // 10
The doReduce
function will accept a reducer, a start value and a list. The reducer should have the signature acc => elem => {}
. The reduce
function is a convenient method to use the first value of the list as initial accumulated value.
Having our new reducer function, we are now actually able to write:
1
2
const chain = (list) => reduce((a) => (b) => andThen(a)(b))(list);
const choice = (list) => reduce((a) => (b) => orElse(a)(b))(list);
Which is actually the same as:
1
2
const chain = reduce(andThen);
const choice = reduce(orElse);
🤯
The Number parser
An EBNF notation of a JSON Number looks like this:
1
2
3
4
number = ["-"],
( "0" | ? digit 1-9 ?, { ? digit ? } ),
[".", ? digit ?, { ? digit ? }],
[( "e" | "E" ), [ "+" | "-"], ? digit ?, { ? digit ? } ];
Lets idenfity the elements:
- An optional sign “-“
- An integer part
- An optional fractional part
- An optional exponent part
A number has a lot of optional parts, and a lot of ‘choices’ within them.
Time to introduce new parser tools for them:
- AnyOf. Each character in the string is becoming an allowed character. This will help us define ‘digits’
- Opt. If parsing fails, succeed anyway. (Thus making the first parse optional)
1
2
const anyOf = (string) => choice([...string].map(characterParser));
const opt = (parser) => orElse(parser)(resultParser([]));
We can now create the sign parser:
1
const optSign = opt(characterParser("-"));
And the integer parser:
1
2
3
4
5
6
const zero = characterParser("0");
const digitOneNine = anyOf("123456789");
const digit = anyOf("0123456789");
const nonZeroInt = toString(andThen(digitOneNine)(toString(many(digit))));
const intPart = orElse(nonZeroInt)(zero);
We use the toString to concat the results together. And we will use chain
to add our individual pieces into a single parser:
1
2
3
4
5
6
7
const numberParser = chain([optSign, intPart]);
console.log(numberParser("-123.45e6"));
// [[ '-', '123'], ['.', '4', '5', 'e', '6'] ]
console.log(numberParser("123.45e-6"));
// [[ [], '123' ], ['.', '4', '5', 'e', '-', '6'] ]
Now the fraction part:
1
2
3
4
5
6
7
8
9
10
const some = (parser) => toList(andThen(parser)(many(parser)));
const point = characterParser(".");
const fractionPart = toString(andThen(point)(toString(some(digit))));
const numberParser = chain([optSign, intPart, opt(fractionPart)]);
console.log(numberParser("-123.45e6"));
// [[ [ '-', '123'], '.45' ], ['e', '6'] ]
console.log(numberParser("123.45e-6"));
// [[ [ [], '123' ], '.45' ], ['e', '-', '6'] ]
Now only the exponent part:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
const e = anyOf("eE");
const optPlusMinus = opt(anyOf("+-"));
const exponentPart = toString(chain([e, optPlusMinus, toString(some(digit))]));
const numberParser = chain([
optSign,
intPart,
opt(fractionPart),
opt(exponentPart),
]);
console.log(numberParser("-123.45e6"));
// [[ [ [Array], '.45' ], 'e,6' ], [] ]
console.log(numberParser("123.45e-6"));
// [[ [ [Array], '.45' ], 'e,-6' ], [] ]
The exponent part is not quite what we expected. It has returned e,6
instead of e6
. This has to do with the toString
over the chain
function. The rest of the result is also acting like a russion doll. We need to flatten te result during the chain process, to prevent the nested andThen
constructs to produce this nested result.
1
2
3
4
5
6
7
8
9
10
11
const chain = reduce((parserA) => (parserB) =>
mapResult(([resultA, resultB]) => [...resultA, resultB])(
andThen(parserA)(parserB)
)
);
console.log(numberParser("-123.45e6"));
// [[ '-', '123', '.45', 'e6'], [] ]
console.log(numberParser("123.45e-6"));
// [[ '123', '.45', 'e-6'], [] ]
Much better! now we should convert it to an actual JavaScript number. To make it easy for use, we will use the Number()
to typecast our parsed string to a number. The final number parser now looks like this:
1
2
3
4
5
6
7
8
9
10
11
const numberParser = addLabel("number")(
mapResult((result) => Number(result))(
toString(chain([optSign, intPart, opt(fractionPart), opt(exponentPart)]))
)
);
console.log(numberParser("-123.45e6"));
// [-123450000, [] ]
console.log(numberParser("123.45e-6"));
// [0.00012345, [] ]
This is one seemed quite easy to build, even if a number has a lot of seperate parts.
The implementation so far:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
const FAILED = Symbol("Failed");
const PARSED = 0;
const REMAINING = 1;
const parseError = (message) => (error) => [FAILED, message, error];
const genericParseError = parseError("Error parsing:");
const satisfy = (predicate) => ([head, ...tail]) =>
head
? predicate(head)
? [head, tail]
: genericParseError(`Unexpected '${head}'`)
: genericParseError("Unexpected EOF");
const onSuccess = (result) => (next) =>
result[PARSED] !== FAILED ? next(result) : result;
const onFailure = (result) => (next) =>
result[PARSED] === FAILED ? next(result) : result;
const addLabel = (label) => (parser) => (stream) =>
onFailure(parser(stream))(([, , error]) =>
parseError(`Error parsing '${label}':`)(error)
);
const characterParser = (character) =>
addLabel(character)(satisfy((c) => c === character));
const combineResult = (resultA) => (resultB) =>
onSuccess(resultB)(() => [
[resultA[PARSED], resultB[PARSED]],
resultB[REMAINING],
]);
const andThen = (parserA) => (parserB) => (stream) =>
onSuccess(parserA(stream))((result) =>
combineResult(result)(parserB(result[REMAINING]))
);
const orElse = (parserA) => (parserB) => (stream) =>
onFailure(parserA(stream))(() => parserB(stream));
const resultParser = (result) => (stream) => [result, stream];
const mapResult = (transform) => (parser) => (stream) =>
onSuccess(parser(stream))((result) => [
transform(result[PARSED]),
result[REMAINING],
]);
const andThenLeft = (parserA) => (parserB) =>
mapResult((result) => result[0])(andThen(parserA)(parserB));
const andThenRight = (parserA) => (parserB) =>
mapResult((result) => result[1])(andThen(parserA)(parserB));
const between = (parserLeft) => (parserRight) => (parserMiddle) =>
andThenLeft(andThenRight(parserLeft)(parserMiddle))(parserRight);
const doReduce = (reducer) => (start) => ([head, ...tail]) =>
head ? doReduce(reducer)(reducer(start)(head))(tail) : start;
const reduce = (reducer) => ([head, ...tail]) => doReduce(reducer)(head)(tail);
const chain = reduce((parserA) => (parserB) =>
mapResult(([resultA, resultB]) => [...resultA, resultB])(
andThen(parserA)(parserB)
)
);
const choice = reduce(orElse);
const toList = mapResult((result) => [result[0]].concat(result[1]));
const toString = mapResult((result) => result.join(""));
const many = (parser) => (stream) =>
orElse(toList(andThen(parser)(many(parser))))(resultParser([]))(stream);
const stringParser = (string) =>
addLabel(string)(chain([...string].map((char) => characterParser(char))));
const parseStringResult = (string) => (result) =>
andThenRight(stringParser(string))(resultParser(result));
const nullParser = parseStringResult("null")(null);
const boolParser = orElse(parseStringResult("true")(true))(
parseStringResult("false")(false)
);
const quoteParser = characterParser('"');
const stringCharParser = satisfy((c) => c !== '"' && c !== "\\");
const specialCharacters = [
['\\"', '"'],
["\\\\", "\\"],
["\\/", "/"],
["\\b", "\b"],
["\\f", "\f"],
["\\n", "\n"],
["\\r", "\r"],
["\\t", "\t"],
];
const escapedCharParser = choice(
specialCharacters.map(([match, result]) => parseStringResult(match)(result))
);
const quotedStringParser = addLabel("string")(
between(quoteParser)(quoteParser)(
toString(many(orElse(stringCharParser)(escapedCharParser)))
)
);
const anyOf = (string) => choice([...string].map(characterParser));
const opt = (parser) => orElse(parser)(resultParser([]));
const some = (parser) => toList(andThen(parser)(many(parser)));
const optSign = opt(characterParser("-"));
const digitOneNine = anyOf("123456789");
const digit = anyOf("0123456789");
const point = characterParser(".");
const optPlusMinus = opt(anyOf("+-"));
const e = anyOf("eE");
const zero = characterParser("0");
const nonZeroInt = toString(andThen(digitOneNine)(toString(many(digit))));
const intPart = orElse(nonZeroInt)(zero);
const fractionPart = toString(andThen(point)(toString(some(digit))));
const exponentPart = toString(chain([e, optPlusMinus, toString(some(digit))]));
const numberParser = addLabel("number")(
mapResult((result) => Number(result))(
toString(chain([optSign, intPart, opt(fractionPart), opt(exponentPart)]))
)
);
Conclusion
The code so far:
- We changed our basic
characterParser
to a more genericsatisfy
function, and build a newcharacterParser
upon that. - We added more tools to the parsing family:
-
some
(one or more) -
many
(zero or more) -
opt
(zero or one) -
anyOf
(choice of character parsers)
-
- We can now parse all ‘simple’ types in JSON.
null
,true
,false
,string
andnumber
. - We refactored reduce into a curried function
- Our parsers now produce a result based on the dynamic input, and of the proper type.
- Check the full code here
What I learned so far:
- Changing earlier basic functions around is quite easy
- We are mainly building ‘tools’ to make a parser, instead of building the actual parser. The JSON specific part is still really small
What did I like:
- When building it you really feel a gain in speed. We are working with bigger and bigger parts. It feels like ages when we first started with parsing a single character
- The structure of the JSON part implementation is quite on-par with the EBNF definition
- Changing the reduce into a curried version was really nice. If actually benefitted readability when being able to write:
reduce(orElse)
- It starts to feel more and more normal to only have a single argument for a function
What did I not like:
- We should also update the
.map
and.join
to curried versions
Next time:
- Enforcing the single argument rule
- Parsing arrays of values and objects
Top comments (0)