"Know how to solve every problem that has been solved." - Richard Feynman
Over the past 2 months, I’ve been working on my own very simplified version of React called Syntact. I wouldn’t call it mature yet, but it already has enough features working to be usable, such as:
- variable declaration
- function declaration
- components
- virtual DOM
- dynamic rendering
Besides that, I’ve also built a custom compiler as a replacement for Babel.
I made this project for a course called Advanced Programming, which is part of my bachelor’s in Applied Computer Science. When I started this project, I had no idea what I was doing. But thanks to my coach (s/o to Lars Willemsens) and the almighty internet, I somehow managed to create something cool.
This is not really a tutorial on how to make your own React, but it’s certainly a good starting point if you’d like to take on this kind of project yourself. So let’s get started.
1. The Compiler (our own kind of Babel)
Lexing
The first step is to write a ‘lexer’, also called a ‘tokenizer’. ‘Lex’ stands for lexical analysis, which basically means splitting your text into tokens. Lexers are used when creating programming languages, but also for text processing and various other things.
Token
A token is a small unit of the code. It is structured as a pair consisting of a token name and a value. For example, the keywords "let" and "const" are tokens.
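To make the idea concrete, here is what a bare-bones, hand-rolled tokenizer could look like. This is illustrative only: the token names and patterns are made up for this sketch, and we’ll use Chevrotain instead below.

```typescript
// Illustrative only: a hand-rolled tokenizer sketch, not Syntact's real lexer.
type Token = { name: string; value: string };

// Order matters: keywords must be tried before the generic Identifier pattern.
const patterns: [string, RegExp][] = [
  ["Whitespace", /^\s+/],
  ["Let", /^let\b/],
  ["Const", /^const\b/],
  ["Identifier", /^[a-zA-Z_]\w*/],
  ["NumberLiteral", /^\d+/],
  ["Equals", /^=/],
  ["Semicolon", /^;/],
];

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const [name, pattern] of patterns) {
      const m = rest.match(pattern);
      if (m) {
        // Whitespace is consumed but produces no token.
        if (name !== "Whitespace") tokens.push({ name, value: m[0] });
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}

// tokenize("let x = 5;") produces the (name, value) pairs:
// Let "let", Identifier "x", Equals "=", NumberLiteral "5", Semicolon ";"
```

Note the `\b` word boundary on the keyword patterns: without it, the identifier "letter" would wrongly start with a `Let` token. Chevrotain solves the same problem with the `longer_alt` option you’ll see below.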
Lexing with Chevrotain
Writing a lexer is the first and easiest step of the whole process. I chose to use the toolkit Chevrotain to build my lexer.
To use the Chevrotain lexer we first have to define the tokens:
// Keywords
const Import: TokenType = createToken({ name: "Import", pattern: /import/ });
const From: TokenType = createToken({ name: "From", pattern: /from/ });
const Return: TokenType = createToken({ name: "Return", pattern: /return/ });
const Const: TokenType = createToken({ name: "Const", pattern: /const/, longer_alt: Identifier });
const Let: TokenType = createToken({ name: "Let", pattern: /let/, longer_alt: Identifier });
...
// We then add all the tokens to an array of tokens
let allTokens = [...]
Okay, so we defined our tokens and bundled them in an array. Next, we instantiate the lexer by passing the tokens to the constructor, and voilà: just like that, the Syntact lexer was born.
const syntactLexer: Lexer = new chevrotain.Lexer(allTokens);
Now we can use this lexer to tokenize our input.
Check out Chevrotain’s docs for more info: https://chevrotain.io/docs/tutorial/step1_lexing.html.
Parsing
The second step of the process is parsing. The parser converts a list of tokens into a Concrete Syntax Tree (CST), a fancy term for a tree data structure that represents source code.
To prevent ambiguities, the parser must take parentheses and the order of operations into account. Parsing itself isn’t very difficult, but as more features get added, it can become very complex.
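To see why precedence matters, here is a minimal recursive-descent parser sketch for arithmetic expressions. It is illustrative only (the token and node shapes are hypothetical and far simpler than Syntact’s): `*` binds tighter than `+` because it lives one rule deeper, and parentheses override that by restarting at the top rule.

```typescript
// Illustrative only: a tiny recursive-descent expression parser.
type Expr = number | { op: "+" | "*"; left: Expr; right: Expr };

function parseExpression(tokens: string[]): Expr {
  let pos = 0;

  // expression := term ('+' term)*
  function expression(): Expr {
    let left = term();
    while (tokens[pos] === "+") {
      pos++;
      left = { op: "+", left, right: term() };
    }
    return left;
  }

  // term := atom ('*' atom)*  -- '*' binds tighter because it nests deeper
  function term(): Expr {
    let left = atom();
    while (tokens[pos] === "*") {
      pos++;
      left = { op: "*", left, right: atom() };
    }
    return left;
  }

  // atom := number | '(' expression ')'
  function atom(): Expr {
    if (tokens[pos] === "(") {
      pos++; // consume '('
      const inner = expression(); // parentheses restart at the top rule
      pos++; // consume ')'
      return inner;
    }
    return Number(tokens[pos++]);
  }

  return expression();
}

// parseExpression(["1", "+", "2", "*", "3"]) groups as 1 + (2 * 3)
```

Chevrotain’s rule DSL (shown below) follows the same idea: each grammar rule becomes a method, and nesting of rules encodes precedence.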
Parsing with Chevrotain
Again, I used Chevrotain to build the parser for Syntact. A Chevrotain parser analyses a stream of tokens and checks that it conforms to some grammar.
Grammar
A grammar is a description of a set of acceptable sentences. Our parser will use this grammar to build its tree. I wrote my grammar with the ANTLR grammar syntax.
Here are some examples from my grammar file:
importStatement
: import SEMICOLON
;
binaryExpression
: atomicExpression operator atomicExpression
;
In the above example we define what an import statement and a binary expression should look like.
To be honest, when using Chevrotain it’s not strictly necessary to write the grammar out like this in order to get a working parser. On the other hand, it will help you get a much better view of how to build your parser.
Writing a parser
Once you've got your grammar mapped out, it's time to start building your parser. As we said before, the parser must transform the output of the lexer into a CST.
First, we make a Parser class and pass it the same array of tokens that we used to define our lexer.
class SyntactParser extends CstParser {
constructor() {
super(allTokens)
this.performSelfAnalysis()
}
// Later on, all grammar rules will come here...
}
Next we write Grammar Rules within our Parser class. Two (shortened) examples:
public importStatement = this.RULE("importStatement", () => {
this.SUBRULE(this.import)
this.CONSUME(Semicolon)
});
public function = this.RULE("function", () => {
this.CONSUME(Function)
this.CONSUME(Identifier)
this.CONSUME(OpenRoundBracket)
this.SUBRULE(this.parameterDeclaration)
this.CONSUME(CloseRoundBracket)
this.CONSUME(OpenCurlyBracket)
this.MANY(() => {
this.OR([
{ ALT: () => { this.SUBRULE1(this.declareVariableStatement) } },
{ ALT: () => { this.SUBRULE(this.functionStatement) } },
{ ALT: () => { this.SUBRULE(this.functionCall) } }
])
})
this.OPTION(() => this.SUBRULE(this.returnStatement))
this.CONSUME(CloseCurlyBracket)
});
We'll write grammar rules according to the grammar which we've mapped out earlier using the ANTLR grammar syntax.
Once that's done - believe me, it takes a while - we can start parsing the tokens. The output will be a CST that Chevrotain builds for us.
AST
Once we have our CST, we are going to convert it to an Abstract Syntax Tree (AST). An AST is like a CST, but it contains only the information that is meaningful to our program, which means it leaves out syntax noise like semicolons and braces. In order to obtain an AST, we have to ‘visit’ the CST using a CST visitor or, as I like to call it, an interpreter.
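To make the difference concrete, here is a sketch of a `let a = 5;` statement as a CST node next to the AST node a visitor might produce from it. The node shapes here are hypothetical, loosely modeled on Chevrotain’s CST layout (token arrays under `children`, each token carrying its source text in `image`):

```typescript
// Illustrative only: hypothetical node shapes, not Syntact's actual ones.
// A CST keeps every token from the source, punctuation included:
const cst = {
  name: "declareVariableStatement",
  children: {
    Let: [{ image: "let" }],
    Identifier: [{ image: "a" }],
    Equals: [{ image: "=" }],
    NumberLiteral: [{ image: "5" }],
    Semicolon: [{ image: ";" }], // syntax noise the AST won't need
  },
};

// A visitor method produces an AST node that keeps only what matters:
function toAstNode(node: typeof cst) {
  return {
    type: "VARIABLE_DECLARATION",
    id: { type: "IDENTIFIER", name: node.children.Identifier[0].image },
    value: Number(node.children.NumberLiteral[0].image),
    // no Semicolon, no Equals: the node's structure already carries that
  };
}
```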
Interpreter
The interpreter will traverse our CST and create nodes for our AST. Thanks to Chevrotain, this is a relatively doable step.
Here is a tiny look at the Syntact interpreter:
class SyntactInterpreter extends SyntactBaseCstVisitor {
constructor() {
super();
this.validateVisitor();
}
...
declareComponent(ctx: any) {
const componentName = ctx.Identifier[0].image;
const parameters = this.visit(ctx.parameterDeclaration);
const returnStatement = this.visit(ctx.returnStatement);
const variableStatements = [];
if (ctx.declareVariableStatement) {
ctx.declareVariableStatement.forEach((e: any) => {
variableStatements.push(this.visit(e))
})
}
return {
type: types.COMPONENT_DECLARATION,
id: {
type: types.IDENTIFIER,
name: componentName
},
parameters,
body: { variableStatements },
returnStatement
};
}
...
}
Generator
Got the point of the AST? Cool! Now we can move on to the generator. The generator will actually make JS code based on the AST.
I find this one of the hardest parts of the whole parsing process. You’ll have to iterate over all the nodes in the AST and make working JS code from it.
Here is how that might look:
class SyntactGenerator implements Generator {
...
private convertFunBody(body: any) {
let returnCode: any[] = [];
if (body.variableStatements) {
body.variableStatements.forEach((vS: any) => {
let datatype = vS.dataType;
let varName = vS.variableName;
let value = vS.value;
returnCode.push(`${datatype.toLowerCase()} ${varName} = ${value};\n`)
});
}
if (body.functionCalls) {
body.functionCalls.forEach((fC: any) => {
let params: string[] = [];
if (fC.params) {
fC.params.forEach((p: string) => { params.push(p) })
}
returnCode.push(`${fC.function}(${params.join(",")});`)
});
}
return returnCode.join("");
}
...
}
Err, come again, please.
Exhausted and a bit confused after reading all this? I get you. Here’s a recap:
- Lexer => responsible for transforming raw text into a stream of tokens.
- Parser => transforms the stream of tokens into Concrete Syntax Tree (CST).
- CST Visitor/Interpreter => recursively visits each node in CST which results in an Abstract Syntax Tree (AST).
- Generator => actually makes JS code based on the provided AST.
Once we’ve got all of the above working, we can start making something I called a “SyntactEngine”.
SyntactEngine
Next, I made a SyntactEngine class. It will make it easier for us to orchestrate the different phases of transpiling our JSX to JS. It holds an entrypoint method called “transpileJsxToJs” which we can later use in our Webpack loader.
class SyntactEngine implements Engine {
private lexer: Lexer;
private parser: SyntactParser;
private interpreter: SyntactInterpreter;
private generator: Generator;
constructor() {
...
}
transpileJsxToJs(input: string): string {
...
}
tokenizeInput(input: string): ILexingResult {
...
}
parseInput(lexingResult: ILexingResult): ParseResultType {
...
}
toAst(parsedInput: ParseResultType) {
...
}
generateJsFromAst(ast: any): string {
...
}
}
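The entrypoint can simply chain the four phases. Here is a stubbed sketch of that orchestration — the stub phases just tag their input so the data flow is visible; the real methods are the ones elided above:

```typescript
// Illustrative only: a stubbed engine showing how transpileJsxToJs chains
// the phases. Each stub wraps its input so the pipeline order is observable.
class MiniEngine {
  transpileJsxToJs(input: string): string {
    const lexingResult = this.tokenizeInput(input); // text   -> tokens
    const cst = this.parseInput(lexingResult);      // tokens -> CST
    const ast = this.toAst(cst);                    // CST    -> AST
    return this.generateJsFromAst(ast);             // AST    -> JS source
  }
  tokenizeInput(input: string) { return `tokens(${input})`; }
  parseInput(lexingResult: string) { return `cst(${lexingResult})`; }
  toAst(parsedInput: string) { return `ast(${parsedInput})`; }
  generateJsFromAst(ast: string) { return `js(${ast})`; }
}
```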
2. Syntact API
We have a working compiler that can generate JS code from JSX. Now we need to build a Syntact API that can actually do the things a framework like React does: create a virtual DOM, hold state, and so on.
I stuck to a simple virtual DOM for now. For this, I made a small recursive algorithm that creates a DOM based on the initial given element (a div, for example) and all of its members.
Here is a shortened version of the method:
createDom(type: string, props: any, members: any, value: string | null) {
const element: any = document.createElement(type);
props.forEach((prop: any) => {
if (prop.type.substring(0, 2) === 'on') {
/* Check if prop type is a function handler
* Note: eval might be a security risk here. */
element[prop.type.toLowerCase()] = () => {
eval(prop.value)
}
} else if (prop.type === 'class') {
element.classList.add(prop.value)
}
});
return element;
}
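The recursion over the members is elided in the shortened method above. Here is a sketch of that recursive shape using plain objects instead of real DOM nodes, so it runs outside a browser; the `Member` and `Prop` shapes are hypothetical stand-ins for what the generator hands over:

```typescript
// Illustrative only: a recursive virtual-DOM builder on plain objects,
// standing in for document.createElement so it runs outside a browser.
type VNode = {
  type: string;
  classList: string[];
  children: (VNode | string)[];
};

type Prop = { type: string; value: string };
type Member = { type: string; props: Prop[]; members: Member[]; value: string | null };

function createVDom(node: Member): VNode {
  const element: VNode = { type: node.type, classList: [], children: [] };
  for (const prop of node.props) {
    if (prop.type === "class") element.classList.push(prop.value);
  }
  if (node.value !== null) {
    element.children.push(node.value); // leaf text content
  }
  for (const member of node.members) {
    element.children.push(createVDom(member)); // recurse into child elements
  }
  return element;
}
```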
3. Webclient + Webpack
Once we’ve got the compiler and the Syntact API, we can start integrating both into our client app using a webpack loader.
The webpack loader will preprocess the Syntact JSX by using the compiler and convert it to JS code. Then, the JS code will use the Syntact API to actually use Syntact’s features.
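A webpack loader is, at its core, just a function that receives a file’s source text and returns the transformed source. Here is a minimal sketch of such a loader; the engine wiring is hypothetical, and a real loader would import the actual SyntactEngine:

```typescript
// Illustrative only: the minimal shape of a webpack loader that delegates
// to a transpiling engine. The Engine type is a hypothetical stand-in.
type Engine = { transpileJsxToJs(input: string): string };

function makeSyntactLoader(engine: Engine) {
  // Webpack calls the returned function with the raw contents of each
  // matched file and uses its return value as the transformed module.
  return function syntactLoader(source: string): string {
    return engine.transpileJsxToJs(source);
  };
}
```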
The End
If you made it this far, thanks for reading! I hope this article helps you understand how React and Babel work under the hood.