Anton Sukhachev

JSON parser with PHP and Bison

Read this post if you don't know what Bison is.

Today I'll try to parse JSON into an AST and compare the parser with the native PHP function json_decode().
To test the parser, I will use this JSON file:

test.json

{
    "fieldString": "string",
    "fieldNumber": 99,
    "fieldBoolTrue": true,
    "fieldBoolFalse": false,
    "fieldNull": null,
    "fieldEmptyArray": [],
    "fieldEmptyObject": {},
    "fieldArray": [
        "string",
        99,
        true,
        false,
        null,
        {},
        []
    ]
}

First, we need to install PHP dependencies.

composer require --dev mrsuh/php-bison-skeleton
composer require mrsuh/tree-printer
composer require doctrine/lexer

We will store our files like this:

.
├── /ast-parser
    ├── /bin
    │   └── parse.php # entry point to parse JSON
    ├── /lib
    │   └── parser.php # generated file
    ├── /src
    │   ├── Lexer.php
    │   └── Node.php # AST node
    └── grammar.y       

The Node class must implement Mrsuh\Tree\NodeInterface so that the tree printer can print the AST.

src/Node.php

<?php

namespace App;

use Mrsuh\Tree\NodeInterface;

class Node implements NodeInterface
{
    private string $name;
    private string $value;
    /** @var Node[] */
    private array $children;

    public function __construct(string $name, string $value, array $children = [])
    {
        $this->name     = $name;
        $this->value    = $value;
        $this->children = $children;
    }

    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        // Strict comparison: empty('0') is true and would hide a numeric 0.
        if ($this->value !== '') {
            return sprintf("%s: '%s'", $this->name, $this->value);
        }

        return $this->name;
    }
}
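
The value check in __toString() deserves care with JSON numbers: PHP's empty() returns true for the string '0', so only a strict comparison against '' keeps a literal 0 visible in the printed tree. A minimal sketch of that check, using a hypothetical stand-alone function formatNode() instead of the Node class so it runs without the tree-printer package:

```php
<?php

// Hypothetical stand-alone version of Node::__toString(), using a strict check.
function formatNode(string $name, string $value): string
{
    // empty('0') is true in PHP, so compare against '' instead of calling empty().
    if ($value !== '') {
        return sprintf("%s: '%s'", $name, $value);
    }

    return $name;
}

echo formatNode('T_NUMBER', '0'), "\n";  // T_NUMBER: '0'
echo formatNode('T_OBJECT', ''), "\n";   // T_OBJECT
```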

I'll use the Doctrine lexer library, which helps split complex text into tokens.

src/Lexer.php

<?php

namespace App;

use Doctrine\Common\Lexer\AbstractLexer;

class Lexer extends AbstractLexer implements LexerInterface
{
...
    protected function getCatchablePatterns(): array
    {
        return [
            '\:',
            '\{',
            '\}',
            '\[',
            '\]',
            '\,',
            // A JSON string; * (not +) so that "" is matched as a token too
            "\"[^\"]*\"",
            'true',
            'false',
            'null',
        ];
    }

    protected function getNonCatchablePatterns(): array
    {
        return [
            ' ',
            '\n'
        ];
    }

    protected function getType(&$value): int
    {
        if (in_array($value, [':', '{', '}', '[', ']', ','], true)) {
            return ord($value);
        }

        if (is_numeric($value)) {
            return LexerInterface::T_NUMBER;
        }

        switch (strtolower($value)) {
            case 'true':
            case 'false':
                return LexerInterface::T_BOOL;
            case 'null':
                return LexerInterface::T_NULL;
        }

        return LexerInterface::T_STRING;
    }
...
}

For example, the lexer will translate this JSON

{
    "array": [
        "string",
        99,
        true,
        false,
        null
    ]
}

into this:

word        token
{           ASCII (123)
"array"     LexerInterface::T_STRING (258)
:           ASCII (58)
[           ASCII (91)
"string"    LexerInterface::T_STRING (258)
,           ASCII (44)
99          LexerInterface::T_NUMBER (259)
,           ASCII (44)
true        LexerInterface::T_BOOL (260)
,           ASCII (44)
false       LexerInterface::T_BOOL (260)
,           ASCII (44)
null        LexerInterface::T_NULL (261)
]           ASCII (93)
}           ASCII (125)
            LexerInterface::YYEOF (0)
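
To make the mapping from words to token ids concrete, here is a minimal sketch of the same idea without the Doctrine library. It is not the Lexer class from this post: it swaps in a plain preg_match_all() call, and the JsonToken class and tokenize() function are hypothetical names used only for illustration. The ids 258 to 261 are taken from the table above.

```php
<?php

// Hypothetical stand-in for the LexerInterface constants.
final class JsonToken
{
    public const YYEOF    = 0;
    public const T_STRING = 258;
    public const T_NUMBER = 259;
    public const T_BOOL   = 260;
    public const T_NULL   = 261;
}

/**
 * Hypothetical helper: split JSON text into [word, token id] pairs.
 * Characters that match no alternative (whitespace, for example) are skipped.
 *
 * @return array<int, array{0: string, 1: int}>
 */
function tokenize(string $json): array
{
    // Alternatives, in order: string, number, keyword, punctuation.
    $pattern = '/"(?:[^"\\\\]|\\\\.)*"|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|true|false|null|[{}\[\]:,]/';
    preg_match_all($pattern, $json, $matches);

    $tokens = [];
    foreach ($matches[0] as $word) {
        if (strlen($word) === 1 && strpos('{}[]:,', $word) !== false) {
            $type = ord($word);              // punctuation uses its ASCII code
        } elseif ($word[0] === '"') {
            $type = JsonToken::T_STRING;
        } elseif ($word === 'true' || $word === 'false') {
            $type = JsonToken::T_BOOL;
        } elseif ($word === 'null') {
            $type = JsonToken::T_NULL;
        } else {
            $type = JsonToken::T_NUMBER;
        }
        $tokens[] = [$word, $type];
    }
    $tokens[] = ['', JsonToken::YYEOF];      // end-of-input marker

    return $tokens;
}

foreach (tokenize('{"array": ["string", 99, true, false, null]}') as [$word, $type]) {
    printf("%-10s %d\n", $word, $type);
}
```

The generated parser only ever sees the numeric ids, which is why single-character punctuation can reuse its ASCII code while named tokens start at 258.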

Time to create the grammar.y file and build lib/parser.php.

PHP's native json_decode() function itself uses a C parser generated by Bison.
So we can take the ready-made Bison grammar file from the php-src repository and modify it.
The grammar file is very small because the JSON standard is very simple.

We will use the %code parser block to add a property and methods to the Parser class for storing the AST.

grammar.y

%define api.parser.class {Parser}
%define api.namespace {App}
%code parser {
    private Node $ast;
    public function setAst(Node $ast): void { $this->ast = $ast; }
    public function getAst(): Node { return $this->ast; }
}

%token T_STRING
%token T_NUMBER
%token T_BOOL
%token T_NULL

%%
start:
  value  { $this->setAst($1); }
;

object:
'{' members '}' { $$ = $2; }
;

members:
  %empty             { $$ = []; }
| member             { $$ = [$1]; }
| members ',' member { $$ = $1; $$[] = $3; }
;

member:
  T_STRING ':' value  { $$ = new Node('T_STRING', $1, [$3]); }
;

array:
'['  elements ']' { $$ = $2; }
;

elements:
  %empty             { $$ = []; }
| value              { $$ = [$1]; }
| elements ',' value { $$ = $1; $$[] = $3; }
;

value:
  object   { $$ = new Node('T_OBJECT', '', $1); }
| array    { $$ = new Node('T_ARRAY', '', $1); }
| T_STRING { $$ = new Node('T_STRING', $1); }
| T_NUMBER { $$ = new Node('T_NUMBER', $1); }
| T_BOOL   { $$ = new Node('T_BOOL', $1); }
| T_NULL   { $$ = new Node('T_NULL', $1); }
;

%%

Generate the parser from the grammar:

bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o lib/parser.php - output parser file
  • grammar.y - our grammar file

The final PHP file is the entry point for the parser.

bin/parse.php

<?php

require_once __DIR__ . '/../vendor/autoload.php';

use App\Parser;
use App\Lexer;
use Mrsuh\Tree\Printer;

$lexer  = new Lexer(fopen($argv[1], 'r'));
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}

$printer = new Printer();
$printer->print($parser->getAst());

Add the generated lib/parser.php file to the Composer autoload configuration.

composer.json

{
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        },
        "files": ["lib/parser.php"]
    },
    ...
}

Finally, we can test our parser.

php bin/parse.php test.json 
.
├── T_OBJECT
    ├── T_STRING: 'fieldString'
    │   └── T_STRING: 'string'
    ├── T_STRING: 'fieldNumber'
    │   └── T_NUMBER: '99'
    ├── T_STRING: 'fieldBoolTrue'
    │   └── T_BOOL: 'true'
    ├── T_STRING: 'fieldBoolFalse'
    │   └── T_BOOL: 'false'
    ├── T_STRING: 'fieldNull'
    │   └── T_NULL: 'null'
    ├── T_STRING: 'fieldEmptyArray'
    │   └── T_ARRAY
    ├── T_STRING: 'fieldEmptyObject'
    │   └── T_OBJECT
    └── T_STRING: 'fieldArray'
        └── T_ARRAY
            ├── T_STRING: 'string'
            ├── T_NUMBER: '99'
            ├── T_BOOL: 'true'
            ├── T_BOOL: 'false'
            ├── T_NULL: 'null'
            ├── T_OBJECT
            └── T_ARRAY
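
For comparison, json_decode() turns the same kind of document directly into PHP values rather than a tree of token nodes. A minimal sketch, using a shortened inline copy of test.json so that it runs on its own:

```php
<?php

// Shortened copy of test.json, inlined to keep the snippet self-contained.
$json = '{"fieldString": "string", "fieldNumber": 99, "fieldBoolTrue": true, '
      . '"fieldNull": null, "fieldEmptyArray": [], "fieldEmptyObject": {}}';

// JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on invalid input
// instead of silently returning null.
$data = json_decode($json, false, 512, JSON_THROW_ON_ERROR);

// Print the PHP type each JSON value was mapped to.
foreach (get_object_vars($data) as $key => $value) {
    printf("%s: %s\n", $key, gettype($value));
}
```

JSON objects become stdClass instances here because the second argument is false; passing true would return associative arrays instead.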

It works!

It would be interesting to compare the native json_decode() function with our parser.
First, I need a JSON file for the benchmarks. I'll take the JSON data for the Bulbasaur Pokémon from the API at https://pokeapi.co.

curl 'https://pokeapi.co/api/v2/pokemon/bulbasaur' > bench.json

The file size is 215 KB.

We need to modify our grammar.y file so that it no longer creates Node objects.

grammar-bench.y

...
value:
  object   { $$ = $1; }
| array    { $$ = $1; }
| T_STRING { $$ = $1; }
| T_NUMBER { $$ = $1; }
| T_BOOL   { $$ = $1; }
| T_NULL   { $$ = $1; }
...

Rebuild the parser from the benchmark grammar:

bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar-bench.y

We are ready to start the comparison.
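
The benchmark classes themselves are not shown in this post. As a rough stand-in, the native side can be timed without phpbench. The sketch below is an assumption about how such a measurement could look, not the benchmark used for the tables: medianTimeMs() is a hypothetical helper, the payload is generated instead of read from bench.json, and it reports a median where phpbench reports the mode.

```php
<?php

/**
 * Hypothetical helper: run $fn $iterations times and return the median
 * wall-clock time in milliseconds.
 */
function medianTimeMs(callable $fn, int $iterations = 15): float
{
    $times = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);                     // monotonic clock, nanoseconds
        $fn();
        $times[] = (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds
    }
    sort($times);

    return $times[intdiv(count($times), 2)];
}

// Stand-in payload; use file_get_contents('bench.json') for the real file.
$json = json_encode(array_fill(0, 1000, [
    'name'  => 'bulbasaur',
    'id'    => 1,
    'types' => ['grass', 'poison'],
]));

$ms = medianTimeMs(static function () use ($json): void {
    json_decode($json, true);
});

printf("json_decode: %.3f ms\n", $ms);
```

Timing the Bison parser the same way would mean passing a closure that builds the Lexer and Parser and calls parse().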

PHP 8.2

php vendor/bin/phpbench run tests --report=my-report
+-------------+----------+----------+--------+
| subject     | mem_peak | mode     | rstdev |
+-------------+----------+----------+--------+
| benchNative | 2.539mb  | 1.570ms  | ±0.89% |
| benchBison  | 12.443mb | 84.283ms | ±1.08% |
+-------------+----------+----------+--------+

PHP 8.1

php vendor/bin/phpbench run tests --report=my-report
+-------------+----------+----------+--------+
| subject     | mem_peak | mode     | rstdev |
+-------------+----------+----------+--------+
| benchNative | 2.593mb  | 1.595ms  | ±0.68% |
| benchBison  | 18.471mb | 87.471ms | ±0.68% |
+-------------+----------+----------+--------+

PHP 8.0

php vendor/bin/phpbench run tests --report=my-report
+-------------+----------+----------+--------+
| subject     | mem_peak | mode     | rstdev |
+-------------+----------+----------+--------+
| benchNative | 2.700mb  | 1.586ms  | ±0.90% |
| benchBison  | 18.578mb | 87.533ms | ±0.83% |
+-------------+----------+----------+--------+

PHP 7.4

php vendor/bin/phpbench run tests --report=my-report
+-------------+----------+-----------+--------+
| subject     | mem_peak | mode      | rstdev |
+-------------+----------+-----------+--------+
| benchNative | 2.857mb  | 1.725ms   | ±1.00% |
| benchBison  | 18.735mb | 105.099ms | ±0.91% |
+-------------+----------+-----------+--------+

The PHP Bison parser shows its best result on PHP 8.2.
Even there it is about 54 times slower than the native json_decode() function (84.283 ms versus 1.570 ms).
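
The slowdown factor for each PHP version can be recomputed from the mode column of the tables above. A small sketch, with the timings copied by hand into a hypothetical $results array:

```php
<?php

// Mode times in milliseconds, copied from the tables above: [native, bison].
$results = [
    '8.2' => [1.570, 84.283],
    '8.1' => [1.595, 87.471],
    '8.0' => [1.586, 87.533],
    '7.4' => [1.725, 105.099],
];

$ratios = [];
foreach ($results as $version => [$native, $bison]) {
    // How many times slower the Bison parser is than json_decode().
    $ratios[$version] = $bison / $native;
    printf("PHP %s: %.1fx slower\n", $version, $ratios[$version]);
}
```

The factor stays between roughly 54 and 61 across the four versions, so the gap comes from running the parser in PHP rather than in C, not from any one PHP release.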

I hope it was interesting for you!

You can get the parser source code here and test it yourself.

Some useful links:
