In this post we are going to make a small compiler with JavaScript. We will start by understanding what compiler is, phases of compilation then we will make a simple compiler by evaluating a simple expression.
Compiler
A compiler is a special program that turns a high-level code into a low-level machine code. In every modern browser, JavaScript code compiled by their own engine. For example Chrome uses V8, FireFox uses SpiderMonkey, Safari uses JavaScriptCore and Edge uses Chakra. Each of them has a task to do, to compiler with optimization.
The JavaScript engines are smart enough as they compiled JavaScript code with JIT(just-in-time) compiler.
Our compiler will have these three phases.
- Lexical Analysis.
- Syntax Analysis.
- Code Generation.
Lexical Analysis
In this phase of compilation, the engine first scan each line of code then it will convert them into small chunks of code called Lexeme. After that js engine will convert each Lexeme into Token. For example, var a = 10
will convert into var
a
=
10
each of them is a Token.
Syntax Analysis
In this step, compiler makes a tree called Abstract Syntax Tree by using the tokens we get from Lexical Analysis. We will cover Recursive Descent Algorithm.
Code Generation
In this step, the Abstract Syntax Tree is converted into executable code or machine code.
Prefix Language
Here is an expression we are going to compile.
add 6 with 4
If we try to simplify the whole add 6 with 4
expression,
add 6 with 4
or,
6 + 4
Creating Compiler
1. Making a Lexer Function
As discussed previously Lexer Function splits code into meaningful chunks called Token.
// code for tokenizing
const lexer = str => {
return str.split(' ').map(item => {
return item.trim()
})
}
This lexer function splits our expression into chunks.
lexer('add 6 with 4')// ["add", "6", "with", "4"]
2. Making a Parser Function or Syntax Analysis
Parser go through each token, gather the syntactic information and generate an object called AST or Abstract Syntax Tree.
So our parser will make a tree, for instance our expression is,
add 6 with 4
Generated Tokens are,
["add", "6", "with", "4"]
So Abstract Syntax Tree should be,
So, let’s write down the function,
const parser = tokens => {
let current_token_index = 0;
const parseNumber = () => ({ value: parseInt(tokens[current_token_index++]), type: 'number' });
const parseOperator = () => {
const node = { value: tokens[current_token_index++], type: 'operator', expression: [] };
while (tokens[current_token_index]) {
node.expression.push(parseExpression());
}
return node;
};
const parseExpression = () => /\d/.test(tokens[current_token_index]) ? parseNumber() : parseOperator();
return parseExpression();
};
parser(["add", "6", "with", "4"])
In the following code snippet, the parser
function took an array, depending on each item being number or string, parseExpression
called function parseNumber
or parseOperator
. parseNumber
pursed the number type value and parseOperator
parsed the string(add/sub/mul).
So now the structure will look like this,
{
value: "add",
type: "operator",
expression: [
{ value: 6, type: "number" },
{ value: "with", type: "operator", expression: [
{ value: 4, type: "number" }
]}
]
}
3. Making a Transpiler as a Code Generation
In this part, we will traverse the AST object and produce JavaScript.
// credit goes to Minko Gechev.
const transpile = ast => {
const mapOperator = { add: '+', sub: '-', mul: '*', div: '/' };
const transpileNode = ast => ast.type === 'number' ? transpileNumber(ast) : transpileOperator(ast);
const transpileNumber= ast => ast.value;
const transpileOperator = ast => `${ast.expression.map(transpileNode).join(' ' + mapOperator[ast.value] + ' ')}`;
return transpileOperator(ast);
};
The transpile
function took our ast object and mapped each expression, in that iteration inside the transpileNode
function if the type was number
then we used the transpileNumber
else transpileOperator
function. If it was a number type we just return it else from the ast object we separated the operator type in transpileOperator
function.
So, for our expression add 6 with 4
the result is 6 + 4
.
This is it. You can now change add
, sub
, mul
with your native language.