DEV Community

Cover image for AST parser with PHP and Bison
Anton Sukhachev
Anton Sukhachev

Posted on • Edited on

AST parser with PHP and Bison

Read this post if you don't know what Bison is.

I already have the Bison AST parser, but this time I will do it without PHP FFI.

First, we need to install the PHP skeleton package to build the parser and tree printer package to print AST.

composer require --dev mrsuh/php-bison-skeleton
composer require mrsuh/tree-printer
Enter fullscreen mode Exit fullscreen mode

It will more readable if we separate code from printer.php to individual files.

.
├── /ast-parser
    ├── /bin
    │   └── parse.php # entry point for parser
    ├── /lib
    │   └── parser.php # generated file
    ├── /src
    │   ├── Lexer.php
    │   └── Node.php # AST node
    └── grammar.y       
Enter fullscreen mode Exit fullscreen mode

To print AST with the tree printer package Node class must implement Mrsuh\Tree\NodeInterface.

src/Node.php

<?php

namespace App;

use Mrsuh\Tree\NodeInterface;

class Node implements NodeInterface
{
    private string $name;
    private string $value;
    /** @var Node[] */
    private array $children;

    public function __construct(string $name, string $value, array $children = [])
    {
        $this->name     = $name;
        $this->value    = $value;
        $this->children = $children;
    }

    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        $line = $this->name;
        if (!empty($this->value)) {
            $line .= sprintf(" '%s'", $this->value);
        }

        return $line;
    }
}
Enter fullscreen mode Exit fullscreen mode

Lexer is not modified from previous post, but this time we will put it in a separate file src/Lexer.php.

src/Lexer.php

<?php

namespace App;

class Lexer implements LexerInterface {

    private array $words;
    private int   $index = 0;
    private int   $value = 0;

    public function __construct($resource)
    {
        $this->words = explode(' ', trim(fgets($resource)));
    }

    public function yyerror(string $message): void
    {
        printf("%s\n", $message);
    }

    public function getLVal()
    {
        return $this->value;
    }

    public function yylex(): int
    {
        if ($this->index >= count($this->words)) {
            return LexerInterface::YYEOF;
        }

        $word = $this->words[$this->index++];
        if (is_numeric($word)) {
            $this->value = (int)$word;

            return LexerInterface::T_NUMBER;
        }

        return ord($word);
    }
}
Enter fullscreen mode Exit fullscreen mode

For example, Lexer will translate the expression 10 + 20 - 30 into this:

word token value
10 LexerInterface::T_NUMBER (258) 10
+ ASCII (43)
20 LexerInterface::T_NUMBER (258) 20
- ASCII (45)
30 LexerInterface::T_NUMBER (258) 30
LexerInterface::YYEOF (0)

It's time to create the grammar.y file and build lib/parser.php
You can define %code blocks, so Bison will render code as is in printer.php

grammar.y

%code imports { // code imports };
%code parser  { // code parser }; 
%code init    { // code init };           
Enter fullscreen mode Exit fullscreen mode

printer.php

<?php

// code imports

class Parser {
    // code parser
    public function __construct() {
        // code init
    }
}
Enter fullscreen mode Exit fullscreen mode

We will use block %code parser to define variables and methods to store AST into the Parser class.
Bison has reserved the symbol $ in grammar actions.
It's very sad for PHP developers, but we can call the function setAst() with self::setAst() instead of $this->setAst().

grammar.y

%define api.parser.class {Parser}
%define api.namespace {App}
%code parser {
    private Node $ast;
    public function setAst(Node $ast): void { $this->ast = $ast; }
    public function getAst(): Node { return $this->ast; }
}

%token T_NUMBER

%left '-' '+'

%%
start:
  expression                 {  self::setAst($1); }
;

expression:
  T_NUMBER                   { $$ = new Node('NUMBER', $1); }
| expression '+' expression  { $$ = new Node('OPERATION_PLUS', '', [$1, $3]);  }
| expression '-' expression  { $$ = new Node('OPERATION_MINUS', '', [$1, $3]);  }
;
Enter fullscreen mode Exit fullscreen mode
bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y
Enter fullscreen mode Exit fullscreen mode

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o parser.php - output parser file
  • grammar.y - our grammar file

And final PHP file is the entry point bin/parse.php.

bin/parse.php

<?php

require_once __DIR__ . '/../vendor/autoload.php';

use App\Lexer;
use App\Parser;
use Mrsuh\Tree\Printer;

$lexer  = new Lexer(STDIN);
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}

$printer = new Printer(STDOUT);
$printer->print($parser->getAst());
Enter fullscreen mode Exit fullscreen mode

We need to add a special autoload section to composer.json for generated lib/parser.php file.

composer.json

{
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        },
        "files": ["lib/parser.php"]
    },
    ...
}
Enter fullscreen mode Exit fullscreen mode

Ok. Our parser is ready and we can test it:

php bin/parse.php <<< "1 + 2 - 3"
.
├── OPERATION_MINUS
    ├── OPERATION_PLUS
    │   ├── NUMBER '1'
    │   └── NUMBER '2'
    └── NUMBER '3'
Enter fullscreen mode Exit fullscreen mode

Try to parse big expression:

php bin/parse.php <<< "1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + 10"
.
├── OPERATION_PLUS
    ├── OPERATION_MINUS
    │   ├── OPERATION_PLUS
    │   │   ├── OPERATION_MINUS
    │   │   │   ├── OPERATION_PLUS
    │   │   │   │   ├── OPERATION_MINUS
    │   │   │   │   │   ├── OPERATION_PLUS
    │   │   │   │   │   │   ├── OPERATION_MINUS
    │   │   │   │   │   │   │   ├── OPERATION_PLUS
    │   │   │   │   │   │   │   │   ├── NUMBER '1'
    │   │   │   │   │   │   │   │   └── NUMBER '2'
    │   │   │   │   │   │   │   └── NUMBER '3'
    │   │   │   │   │   │   └── NUMBER '4'
    │   │   │   │   │   └── NUMBER '5'
    │   │   │   │   └── NUMBER '6'
    │   │   │   └── NUMBER '7'
    │   │   └── NUMBER '8'
    │   └── NUMBER '9'
    └── NUMBER '10'
Enter fullscreen mode Exit fullscreen mode

Great!

You can get the parser source code here and test it by yourself.

Some useful links:

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

nextjs tutorial video

Youtube Tutorial Series 📺

So you built a Next.js app, but you need a clear view of the entire operation flow to be able to identify performance bottlenecks before you launch. But how do you get started? Get the essentials on tracing for Next.js from @nikolovlazar in this video series 👀

Watch the Youtube series

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay