Creating a DSL (Domain Specific Language) using ANTLR (Part II): Writing the Grammar File


In the previous post we discussed how to configure the ANTLR plugin for IntelliJ so that we could get started with our language.

In this post we will cover the basics of ANTLR and how to get started with our main goal: what the lexer and the parser are, what roles they play, and how to write a grammar file for them. So let's get started.

ANTLR stands for ANother Tool for Language Recognition. The tool generates the lexer and parser on which a compiler or interpreter for a computer language can be built. If you need to parse languages like Java, Scala or PHP, then this is what you are looking for.
Here is a list of some projects that use ANTLR.

ANTLR can generate lexers, parsers, and combined lexer-parsers. Parsers generated by ANTLR 4 automatically build parse trees, which can then be processed with generated listeners or visitors (ANTLR 3 and earlier used tree parsers for this step). ANTLR provides a single consistent notation for specifying lexers and parsers. This is in contrast with many other parser/lexer generators and adds greatly to the tool’s ease of use.

This post begins with a short overview of what ANTLR is and how it works. Then we show how to write a grammar for a simple ‘Hello world!’ language that is parsed into an abstract syntax tree, and how to generate the lexer and parser from it.

Overview

ANTLR is a code generator. It takes a grammar file (.g4 extension) as input and generates two main classes, a lexer and a parser, along with a listener and, if requested, a visitor.

The lexer runs first and splits the input into pieces called tokens. The stream of tokens is passed to the parser, which does all the remaining work. It is the parser that builds the abstract syntax tree, interprets the code, or translates it into some other form.
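As a rough illustration of the lexer's job, here is a small hand-written Python tokenizer. It is not code generated by ANTLR: the token names (NUMBER, IDENT, ASSIGN, WS), the Token class and the tokenize function are all hypothetical and were chosen only for this sketch. A real ANTLR lexer is generated from the lexer rules in the grammar file.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # token name, e.g. "NUMBER"
    text: str   # the piece of input that matched


# Hypothetical token definitions, for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("WS", r"\s+"),
]
# One combined pattern with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Token]:
    """Split the input into tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "WS":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("x = 42") yields three tokens (an IDENT, an ASSIGN and a NUMBER), which is the kind of stream a parser would then consume.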

The code can be generated in Java, Python and several other languages, as we saw in the previous tutorial.

Most importantly, the grammar file describes how to split the input into tokens and how to build a tree from those tokens. In other words, the grammar file contains lexer rules and parser rules.

Each lexer rule describes one token:

TokenName: regular expression;

Parser rules are more complicated. The most basic form is similar to a lexer rule:

ParserRuleName: regular expression;

They may also contain modifiers that specify special transformations on the input, on the root and children of the resulting abstract syntax tree, or actions to be performed whenever the rule is used. Almost all of the work is usually done inside parser rules.

Hello World

We will create the simplest possible language parser: a hello world parser. It builds a small abstract syntax tree from a single expression: ‘Hello world!’.

We will use it to show how to create a grammar file and generate the ANTLR classes from it. Then we will show how to use the generated files.

grammar HelloWorld101;

Each grammar file must have at least one lexer rule. Each lexer rule must begin with an uppercase letter. We have two rules: the first defines a salutation token and the second defines an end-symbol token. The salutation must be ‘Hello world’ and the end symbol must be ‘!’.

SALUTATION : 'Hello world';
ENDSYMBOL : '!';

Similarly, each grammar file must have at least one parser rule. Each parser rule must begin with a lowercase letter. We have only one parser rule: any expression in our language must be composed of a salutation followed by an end symbol.

expression : SALUTATION ENDSYMBOL;
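To make the division of labour concrete, the following hand-written Python sketch mimics what a lexer and parser do for the two lexer rules and the single parser rule above. It is only an illustration under stated assumptions, not the code ANTLR generates: the names LEXER_RULES, lex and parse_expression are hypothetical, and the "tree" is just a nested tuple.

```python
from typing import List, Tuple

# Mirrors the two lexer rules above: (token name, literal text).
LEXER_RULES = [("SALUTATION", "Hello world"), ("ENDSYMBOL", "!")]


def lex(text: str) -> List[Tuple[str, str]]:
    """Split the input into (token_name, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, literal in LEXER_RULES:
            if text.startswith(literal, pos):
                tokens.append((name, literal))
                pos += len(literal)
                break
        else:
            # No rule matched at this position.
            raise SyntaxError(f"no lexer rule matches at position {pos}")
    return tokens


def parse_expression(tokens: List[Tuple[str, str]]):
    """Apply the parser rule: expression : SALUTATION ENDSYMBOL;"""
    names = [name for name, _ in tokens]
    if names != ["SALUTATION", "ENDSYMBOL"]:
        raise SyntaxError(f"expected SALUTATION ENDSYMBOL, got {names}")
    # The tree is a nested tuple: (rule name, list of child tokens).
    return ("expression", tokens)
```

With these definitions, parse_expression(lex("Hello world!")) returns a tree whose root is "expression" and whose children are the SALUTATION and ENDSYMBOL tokens, while an input such as "Hello!" is rejected by the lexer because no lexer rule matches it.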

Note: the overall order of grammar file elements is fixed: the grammar declaration comes first, followed by the rules. If you change it, the ANTLR plugin will fail.

Sample Grammar

A simple example of a grammar looks like this.

[Screenshot: sample grammar file]

Generate Lexer and Parser

You can easily generate the lexer and parser using the tool that you added by following the steps from the previous post.

So there you go: now generate the lexer, listener, visitor and parser so that you can use the language.

Using the Grammar in Your Code

[Screenshot: code that uses the generated grammar classes]

If you want more information about generating ANTLR classes and using them, you can refer to the links here.

  1. Antlr-tutorial-hello-word
  2. Antlr Tutorial Expression Language
  3. Everything about ANTLR

In upcoming posts we will discuss unit testing of ANTLR grammars and how to get it right.

Till then, happy hacking! 🙂 😉

References:

  1. Antlr Summary by Terence Parr (The_Antlr_Guy)
  2. Antlr-tutorial-hello-word
  3. Antlr Tutorial Expression Language
  4. Coursera’s course on Compilers by Alex Aiken
  5. And our lovely Wikipedia
