
In this blog we are going to see how we can create the physical(Optimised) tree for a given query with the help of Apache calcite.
What is Apache Calcite
Apache Calcite is a dynamic data management framework with SQL parser, optimiser, executor, and JDBC driver.
How Apache Calcite work
Suppose We have one table and now we need to perform a SQL Query.
val QUERY: String = """SELECT x from t
|""".stripMargin
.replaceAll("\\n", " ")
but the problem is there are no details in the query telling the database engine how to get the data so here Calcite comes to the picture.
So calcite provides us some steps with the help of these steps you can create the Optimised tree.
So let’s discuss all the steps…
Step -1 : Parsing
The whole process starts with parsing. A query, in order to be understood by the database engine, must first be parsed using an SQL parser.which takes a string of characters and tries to deduce its syntactic structure in the form of a parse tree.
For instance, a rule for parsing SQL SELECT statements might look like this:
<SELECT> expressionList
[<FROM> table]
[<WHERE> condition]
[<GROUP> <BY> groupingList]
[<HAVING> condition]
Step -2 : Validation
For validations we create our own schema with the help of AbstractSchema class .We extend Apache Calcite’s AbstractSchema class to define our own schema. And validate the query.
Let’s see how schema look like:
public class SimpleSchema extends AbstractSchema {
private final String schemaName;
private final Map<String, Table> tableMap;
private SimpleSchema(String schemaName, Map<String, Table> tableMap) {
this.schemaName = schemaName;
this.tableMap = tableMap;
}
@Override
public Map<String, Table> getTableMap() {
return tableMap;
}
}
Once the validation is passed then we Got SqlNode (SqlNode is a SQL parse tree. It may be an operator, literal, identifier, and so forth.)
Step – 3 : Translation to relational algebra
For representing the query as a tree structure, We need Relational algebra(Relational operators).
Relational algebra deals with abstract transformations over sets of data,such as.
- Selection : Filtering based on a predicate.
- Projection : Choosing and modifying some columns of a row.
- Union : Combining several row sets into one.
- Aggregation : Computing a scalar function over a set of rows.
AST(abstract syntax tree) is not convenient for query optimisation because the semantics of its nodes are too complicated.
It is much more convenient to perform query optimisation on a tree of relational operators, defined by the RelNode sub-classes, such as Scan, Project, Filter, Join, etc. We use SqlToRelConverter, another monstrous class of Apache Calcite, to convert the original AST into a relational tree.
Step -4 : Planning and optimisation
Optimisation is a process of conversion of a relation tree to another relational tree. You may do rule-based optimisation with heuristic or cost-based planners, HepPlanner and VolcanoPlanner respectively.
You may also do any manual rewrite of the tree without rule. Apache Calcite comes with several powerful rewriting tools, such as RelDecorrelator and RelFieldTrimme.
We will use VolcanoPlanner to perform cost-based optimisation.
Now we use our optimiser to parse, validate, and convert the query.
SqlNode sqlTree = optimizer.parse(sql);
SqlNode validatedSqlTree = optimizer.validate(sqlTree);
RelNode relTree = optimizer.convert(validatedSqlTree);
Step – 5 : Execution
The produced logical tree looks like this.
LogicalAggregate(group=[{}], revenue=[SUM($0)]): rowcount = 1.0, cumulative cost = 63751.137500047684
LogicalProject($f0=[*($1, $2)]): rowcount = 1875.0, cumulative cost = 63750.0
LogicalFilter(condition=[AND(>=($3, ?0), <($3, ?1), >=($2, -(?2, 0.01)), <=($2, +(?3, 0.01)), <($0, ?4))]): rowcount = 1875.0, cumulative cost = 61875.0
LogicalTableScan(table=[[tpch, lineitem]]): rowcount = 60000.0, cumulative cost = 60000.0
Summary
Apache Calcite is a flexible framework for query optimisation. In this blog post, we demonstrated how to optimise SQL queries with Apache Calcite parser, validator, converter, and rule-based optimiser.
That’s it, folks. I hope you liked the blogs. Thanks!
References
To read more tech blogs, visit Knoldus Blogs.
