Apache calcite : How to produced physical(Optimised) tree.

Reading Time: 3 minutes

In this blog we are going to see how we can create the physical(Optimised) tree for a given query with the help of Apache calcite.

What is Apache Calcite

Apache Calcite is a dynamic data management framework with SQL parser, optimiser, executor, and JDBC driver.

How Apache Calcite work

Suppose We have one table and now we need to perform a SQL Query.

val QUERY: String = """SELECT x from t
    .replaceAll("\\n", " ")

but the problem is there are no details in the query telling the database engine how to get the data so here Calcite comes to the picture.

So calcite provides us some steps with the help of these steps you can create the Optimised tree.

So let’s discuss all the steps…

Step -1 : Parsing

The whole process starts with parsing. A query, in order to be understood by the database engine, must first be parsed using an SQL parser.which takes a string of characters and tries to deduce its syntactic structure in the form of a parse tree.

For instance, a rule for parsing SQL SELECT statements might look like this:

<SELECT> expressionList
           	[<FROM> table]
           	[<WHERE> condition]
           	[<GROUP> <BY> groupingList]
           	[<HAVING> condition]

Step -2 : Validation

For validations we create our own schema with the help of AbstractSchema class .We extend Apache Calcite’s AbstractSchema class to define our own schema. And validate the query.

Let’s see how schema look like:

public class SimpleSchema extends AbstractSchema {

    private final String schemaName;
    private final Map<String, Table> tableMap;

    private SimpleSchema(String schemaName, Map<String, Table> tableMap) {
        this.schemaName = schemaName;
        this.tableMap = tableMap;

    public Map<String, Table> getTableMap() {
        return tableMap;

Once the validation is passed then we Got SqlNode (SqlNode is a SQL parse tree. It may be an operator, literal, identifier, and so forth.)

Step – 3 : Translation to relational algebra

For representing the query as a tree structure, We need    Relational algebra(Relational operators).

Relational algebra deals with abstract transformations over sets of data,such as.

  • Selection : Filtering based on a predicate.
  • Projection : Choosing and modifying some columns of a row.
  • Union : Combining several row sets into one.
  • Aggregation : Computing a scalar function over a set of rows.

AST(abstract syntax tree) is not convenient for query optimisation because the semantics of its nodes are too complicated.

It is much more convenient to perform query optimisation on a tree of relational operators, defined by the RelNode sub-classes, such as Scan, Project, Filter, Join, etc. We use SqlToRelConverter, another monstrous class of Apache Calcite, to convert the original AST into a relational tree.

Step -4 : Planning and optimisation

Optimisation is a process of conversion of a relation tree to another relational tree. You may do rule-based optimisation with heuristic or cost-based planners, HepPlanner and VolcanoPlanner respectively.

You may also do any manual rewrite of the tree without rule. Apache Calcite comes with several powerful rewriting tools, such as RelDecorrelator and RelFieldTrimme.

We will use VolcanoPlanner to perform cost-based optimisation.

Now we use our optimiser to parse, validate, and convert the query.

SqlNode sqlTree = optimizer.parse(sql);
SqlNode validatedSqlTree = optimizer.validate(sqlTree);
RelNode relTree = optimizer.convert(validatedSqlTree);

Step – 5 : Execution

The produced logical tree looks like this.

LogicalAggregate(group=[{}], revenue=[SUM($0)]): rowcount = 1.0, cumulative cost = 63751.137500047684
  LogicalProject($f0=[*($1, $2)]): rowcount = 1875.0, cumulative cost = 63750.0
    LogicalFilter(condition=[AND(>=($3, ?0), <($3, ?1), >=($2, -(?2, 0.01)), <=($2, +(?3, 0.01)), <($0, ?4))]): rowcount = 1875.0, cumulative cost = 61875.0
      LogicalTableScan(table=[[tpch, lineitem]]): rowcount = 60000.0, cumulative cost = 60000.0


Apache Calcite is a flexible framework for query optimisation. In this blog post, we demonstrated how to optimise SQL queries with Apache Calcite parser, validator, converter, and rule-based optimiser.

That’s it, folks. I hope you liked the blogs. Thanks!



To read more tech blogs, visit Knoldus Blogs.

Written by 

Gaurav srivastav is a Software Consultant working in java domain.