Create a simple Compiler with JavaScript

In this post we are going to make a small compiler with JavaScript. We will start by understanding what compiler is, phases of compilation then we will make a simple compiler.

Compiler

A compiler is a special program that turns a high-level code into a low-level machine code. In every modern browser, JavaScript code compiled by their own engine. For example Chrome uses V8, FireFox uses SpiderMonkey, Safari uses JavaScriptCore and Edge uses Chakra. Each of them has a task to do, to compiler with optimization.

The JavaScript engines are smart enough as they compiled JavaScript code with JIT(just-in-time) compiler.

Our compiler will have these three phases.

  1. Lexical Analysis.
  2. Syntax Analysis.
  3. Code Generation.

Lexical Analysis

In this phase of compilation, the engine first scan each line of code then it will convert them into small chunks of code called Lexeme. After that js engine will convert each Lexeme into Token. For example, var a = 10 will convert into var a = 10 each of them is a Token.

Syntax Analysis

In this step, compiler makes a tree called Abstract Syntax Tree by using the tokens we get from Lexical Analysis. We will cover Recursive Descent Algorithm.

Code Generation

In this step, the Abstract Syntax Tree is converted into executable code or machine code.

Prefix Language

Here is an expression we are going to compile.

add 6 with 4

If we try to simplify the whole add 6 with 4 expression,

add 6 with 4
or,
6 + 4

Creating Compiler

1. Making a Lexer Function

As discussed previously Lexer Function splits code into meaningful chunks called Token.

// code for tokenizing

const lexer = str => {
    return str.split(' ').map(item => {
      return item.trim()
    })
}

This lexer function splits our expression into chunks.

lexer('add 6 with 4')// ["add", "6", "with", "4"]

2. Making a Parser Function or Syntax Analysis

Parser go through each token, gather the syntactic information and generate an object called AST or Abstract Syntax Tree.

So our parser will make a tree, for instance our expression is,

add 6 with 4

Generated Tokens are,

["add", "6", "with", "4"]

So Abstract Syntax Tree should be,

Abstract Syntax Tree

So, let’s write down the function,

const parser = tokens => {
  
  let current_token_index = 0;

  const parseNumber = () => ({ value: parseInt(tokens[current_token_index++]), type: 'number' });

  const parseOperator = () => {
    const node = { value: tokens[current_token_index++], type: 'operator', expression: [] };
    while (tokens[current_token_index]) {
      node.expression.push(parseExpression());
    }
    return node;
  };

  const parseExpression = () => /\d/.test(tokens[current_token_index]) ? parseNumber() : parseOperator();

  return parseExpression();
};

parser(["add", "6", "with", "4"])

In the following code snippet, the parser function took an array, depending on each item being number or string, parseExpression called function parseNumber or parseOperator. parseNumber pursed the number type value and parseOperator parsed the string(add/sub/mul).

So now the structure will look like this,

{
    value: "add",
    type: "operator",
    expression: [
        { value: 6, type: "number" },
        { value: "with", type: "operator", expression: [
            { value: 4, type: "number" }
        ]}
    ]
}

3. Making a Transpiler as a Code Generation

In this part, we will traverse the AST object and produce JavaScript.

// credit goes to Minko Gechev.

const transpile = ast => {
  const mapOperator = { add: '+', sub: '-', mul: '*', div: '/' };
  const transpileNode = ast => ast.type === 'number' ? transpileNumber(ast) : transpileOperator(ast);
  const transpileNumber= ast => ast.value;
  const transpileOperator = ast => `${ast.expression.map(transpileNode).join(' ' + mapOperator[ast.value] + ' ')}`;
  return transpileOperator(ast);
};

The transpile function took our ast object and mapped each expression, in that iteration inside the transpileNode function if the type was number then we used the transpileNumber else transpileOperator function. If it was a number type we just return it else from the ast object we separated the operator type in transpileOperator function.

So, for our expression add 6 with 4 the result is 6 + 4.

This is it. You can now change add, sub, mul with your native language.

Resources

Leave a Reply

Your email address will not be published. Required fields are marked *