In an earlier post on this blog we discussed how to configure the ANTLR plugin for IntelliJ so that we could get started with our language.
In this post we will discuss the basics of ANTLR and how exactly we can get started with our main goal: what the lexer and the parser are, what roles they play, and a few other things. So let's get started.
ANTLR is used in a number of well-known projects, for example:
- The OpenJDK Compiler Grammar project, an experimental version of the javac compiler based upon a grammar written in ANTLR
- Apex, Salesforce.com’s programming language
- The expression evaluator in Numbers, Apple’s spreadsheet
- Twitter’s search query language
- WebLogic Server
- IntelliJ IDEA and CLion
- Apache Cassandra
ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate abstract syntax trees which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. This is in contrast with other parser/lexer generators and adds greatly to the tool’s ease of use.
ANTLR is a code generator. It takes a grammar file (.g4 extension) as input and generates two classes, a lexer and a parser, plus a visitor if required.
The lexer runs first and splits the input into pieces called tokens. The stream of tokens is passed to the parser, which does all the remaining work. It is the parser that builds the abstract syntax tree, interprets the code, or translates it into some other form.
The code can be generated in Java, Python and many other languages, as we saw in the previous tutorial.
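To make the division of labour between the lexer and the parser concrete, here is a small hand-written Java sketch. It is not what ANTLR generates: the class name TokenPipelineSketch, the token types NUMBER and PLUS, and the toy input are all assumptions chosen purely for illustration. The lexer stage turns characters into tokens, and the parser stage checks their order and interprets them as a sum.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; ANTLR generates far more capable classes.
class TokenPipelineSketch {

    // Hypothetical token types for a toy "number + number" language.
    enum TokenType { NUMBER, PLUS }

    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    // Lexer stage: split the raw characters into tokens, skipping spaces.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ') {
                pos++;
            } else if (c == '+') {
                tokens.add(new Token(TokenType.PLUS, "+"));
                pos++;
            } else if (Character.isDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
                    pos++;
                }
                tokens.add(new Token(TokenType.NUMBER, input.substring(start, pos)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    // Parser stage: expect NUMBER (PLUS NUMBER)* and interpret it as a sum.
    static int parseSum(List<Token> tokens) {
        if (tokens.isEmpty() || tokens.get(0).type != TokenType.NUMBER) {
            throw new IllegalArgumentException("Expected a number first");
        }
        int total = Integer.parseInt(tokens.get(0).text);
        int i = 1;
        while (i < tokens.size()) {
            if (tokens.get(i).type != TokenType.PLUS
                    || i + 1 >= tokens.size()
                    || tokens.get(i + 1).type != TokenType.NUMBER) {
                throw new IllegalArgumentException("Expected '+ number' at token " + i);
            }
            total += Integer.parseInt(tokens.get(i + 1).text);
            i += 2;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Token> tokens = lex("1 + 23");
        System.out.println(tokens);           // [NUMBER(1), PLUS(+), NUMBER(23)]
        System.out.println(parseSum(tokens)); // 24
    }
}
```

With ANTLR, both stages are generated from the grammar file instead of being written by hand like this.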
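Most importantly, the grammar file describes how to split the input into tokens and how to build a tree from those tokens. In other words, the grammar file contains lexer rules and parser rules.
Each lexer rule describes one token: its name, followed by a colon, the pattern the token must match, and a closing semicolon.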
We will create the simplest possible language parser: a hello word parser. It builds a small abstract syntax tree from a single expression: ‘Hello word!’.
We will use it to show how to create a grammar file and generate ANTLR classes from it. Then we will show how to use the generated files and create a unit test.
Each grammar file must have at least one lexer rule. Each lexer rule must begin with an upper case letter. We have two rules: the first defines a salutation token, the second defines an endsymbol token. The salutation must be ‘Hello word’ and the endsymbol must be ‘!’.
Similarly, each grammar file must have at least one parser rule. Each parser rule must begin with a lower case letter. We have only one parser rule: any expression in our language must be composed of a salutation followed by an endsymbol.
Note: the order of grammar file elements is fixed. If you change it, the ANTLR plugin will fail.
A simple example of a grammar would look like this.
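A minimal sketch of such a grammar in ANTLR 4 notation, put together from the description above. The grammar name HelloWord (and so the file name HelloWord.g4) is an assumption, as is the choice to list the lexer rules before the parser rule.

```antlr
// HelloWord.g4 (hypothetical file name; it must match the grammar name)
grammar HelloWord;

// Lexer rules: names begin with an upper case letter.
SALUTATION : 'Hello word' ;
ENDSYMBOL  : '!' ;

// Parser rule: name begins with a lower case letter.
// An expression is a salutation followed by an endsymbol.
expression : SALUTATION ENDSYMBOL ;
```

For the input ‘Hello word!’ the lexer should produce one SALUTATION token and one ENDSYMBOL token, which the expression rule then matches.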
Generate Lexer and Parser
You can easily generate the lexer and parser using the tool that you added by following the steps from the previous blog.
So there you go: now generate the lexer, listener, visitor and parser, so that you can use the language.
Using the Grammar in Your Code
If you want more information about generating ANTLR classes and using them, you can refer to the links below.
In the upcoming blogs we will discuss unit testing of ANTLR and how to get it right!
Till then, happy hacking! 🙂 😉
- Antlr Summary by Terence Parr (The_Antlr_Guy)
- Antlr Tutorial Expression Language
- Coursera’s course on Compilers by Alex Aiken
- And our lovely Wikipedia