Parsing database Query with Apache Calcite

Reading Time: 3 minutes

Hey there, as a technical person sometimes we have to write the query of database and that looks good but we don’t know the query we wrote was syntactically correct or not. So in this blog, we parse the database query and test it using a test case with the help of Apache Calcite. So not wasting any time lets discuss about Apache Calcite and how to use it for parsing the database query.

Introduction

Apache Calcite is a dynamic data management framework. Even more it contains many of the pieces that comprise a typical database management system. But it omits some key functions: storage of data, algorithms to process data, and a repository for storing metadata.

What is Apache Calcite?

  • SQL parser
  • SQL validation
  • Query optimiser
  • SQL generator

Some import things you should know about Apache Calcite

  • Calcite stay out of the business of storing and processing data.
  • It is an excellent choice for mediating among the applications and one or more data storage locations and data processing engines.
  • It is also a perfect foundation for building a database: just add data.
  • Calcite does not want to own data; it does not even have a favourite data format. This example used in-memory data sets, and processed them using operators such as groupBy and join from the linq4j library.
  • It can also process data in other data formats, such as JDBC.
  • Calcite uses optimiser rules to push the JOIN and GROUP BY operations to the source database.

Advantage of using calcite:

The following features are complete.

  • Query parser, validator and optimizer.
  • Support for reading models in JSON format.
  • Many standard functions and aggregate functions.
  • JDBC queries against Linq4j and JDBC back-ends.
  • Linq4j front-end.
  • SQL features: SELECT, FROM (including JOIN syntax), WHERE, GROUP BY (including GROUPING SETS), aggregate functions (including COUNT(DISTINCT …) and FILTER), HAVING, ORDER BY (including NULLS FIRST/LAST), set operations (UNION, INTERSECT, MINUS), sub-queries (including correlated sub-queries), windowed aggregates, LIMIT (syntax as Postgres); more details in the SQL reference
  • Local and remote JDBC drivers; see Avatica.
  • Several adapters.

Stages of query execution

stages

Components of Calcite

Catalogue Defines metadata and namespaces that can be accessed in SQL queries
SQL parser – Parses valid SQL queries into an abstract syntax tree (AST).
SQL validator Validates abstract syntax trees against metadata provided by the catalog
Query optimiser – Converts AST into logical plans, optimizes logical plans, and converts logical expressions into physical plans
SQL generator – Converts physical plans to SQL

Usage of the Calcite catalog

Defines namespaces that can be accessed in Calcite queries.

catalogue

Query Optimization

  • Prune unused fields
  • Merge projections
  • Convert sub-queries to joins
  • Reorder joins
  • Push down projections
  • Push down filters

Projects using Apache Calcite

  • Hive
  • Drill
  • Flink
  • Phoenix
  • Samza
  • Storm
  • Apache everything…

Now lets coming how to parse the SQL query using Apache Calcite.

1: Add this plugin into your Dependencies or module where you have the database queries
"org.apache.calcite" % "calcite-babel" % "1.21.0"
2: create the method that take query as an input returns the SqlNode.
In this method you have to give the config like below:
val sqlParserConfig = CalciteParser
.configBuilder()
.setParserFactory(SqlBabelParserImpl.FACTORY)
.setConformance(SqlConformanceEnum.BABEL)
.build()

3: Passes the above config (I am taking the example of BABEL you can choose your own Parser Factory)
CalciteParser.create(sql, sqlParserConfig).parseQuery()
4: Now, in test cases simply assert your database query or method that returns the query
assert(parseSql(resultedSql).isInstanceOf[SqlNode])
In which parseSql is a method that actually parse the Query.

Overview

A framework for building SQL databases. It is developed over more than ten years. Written in Java and previously known as Optimiser. Also known as Farrago. Became an Apache project in 2013. Led by Julian Hyde at Hortonworks.

References

https://calcite.apache.org/docs/
https://en.wikipedia.org/wiki/Apache_Calcite
https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite
https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite

Written by 

Shivraj Singh is a senior software consultant having experience of more than 2+ years. On the personal hand, he likes eating food and playing video games like NFS. He is always ready to learn new technology and explore new trends in the IT world. Shivraj is familiar with programming languages such as Java, Scala, C, C++ and he is currently working on reactive technologies like Scala, Akka, and Kafka.