The Tale of ‘Tail Recursion’

Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem (as opposed to iteration).

Recursion is really cool and highly expressive. For example, consider the factorial function:


But if we increase the size of the input, the code blows up.


Naive recursion does not scale well for large input sizes. Yet, in practice, it is precisely on large, complex problems that recursion is most attractive. So it is something of a defeat: when you need it most, you cannot use it.
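The failure mode is easy to demonstrate. Here is an illustrative sketch in Java (the names and limits are my own choices, not from the post): a naive recursive factorial works fine for small inputs, while a deliberately deep recursion exhausts the call stack.

```java
public class Factorial {
    // Naive recursion: one stack frame per call.
    static long factorial(long n) {
        return (n <= 1) ? 1 : n * factorial(n - 1);
    }

    // A deliberately deep recursion, just to exhaust the stack.
    static long depth(long n) {
        return (n == 0) ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial(20)); // = 2432902008176640000, fine
        try {
            System.out.println(depth(10_000_000));
        } catch (StackOverflowError e) {
            // This is the "blow up": the stack, not the algorithm, is the limit.
            System.out.println("StackOverflowError");
        }
    }
}
```

Tail recursion, the subject of this post, fixes exactly this: when the recursive call is the very last action, the compiler can reuse the current stack frame, which is what Scala's @tailrec guarantees.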

Continue reading

Posted in Scala | Tagged , , | Leave a comment

Solr with Java: A basic hands-on with SolrJ

What is Apache Solr:

Apache Solr is a search server built around the full-text search engine Apache Lucene. It takes in pieces of information (called documents), which are indexed per core. When a query is performed, Solr goes through the index and returns the matching documents.

Now let’s start the hands-on.

Step 1: Install Solr from the following link.

Step 2: Start Solr on your local machine.

Go to the directory and on the terminal type:

bin/solr start

Now before we go further,

click here to go through the basic query parameters used in Solr.

Well it’s time to code.

Step 3: Add the SolrJ maven dependency in pom.xml (use the version matching your Solr server):

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>${solr.version}</version>
</dependency>


Step 4: Create an object of SolrClient and initialize it with the URI & port being used (by default: 8983).

SolrClient client = new HttpSolrClient("http://my-solr-server:8983/solr");

Step 5: Create an object of SolrQuery
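A sketch of where this step is heading (the core name "gettingstarted", the match-all query and the "id" field below are placeholders of mine, not from the post): build a SolrQuery, run it against the client, and iterate over the matching documents.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrQueryExample {
    public static void main(String[] args) throws Exception {
        // Core name "gettingstarted" is a placeholder for your own core.
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/gettingstarted");

        SolrQuery query = new SolrQuery();
        query.setQuery("*:*"); // match-all query; replace with your own
        query.setRows(10);     // cap the number of returned documents

        QueryResponse response = client.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
        client.close();
    }
}
```

This needs a running Solr instance and the solr-solrj dependency on the classpath, so treat it as a sketch rather than something to paste blindly.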

Continue reading

Posted in Java | Tagged , , , , , | 1 Comment

Getting Introduced with Presto

Hi Folks!

In today’s blog I will be introducing you to a new open-source distributed SQL query engine: Presto. It is designed for running SQL queries over big data (petabytes of data). It was designed by the people at Facebook.


Quoting its formal definition: “Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”

The motive behind the inception of Presto was to enable interactive analytics that approach the speed of commercial data warehouses, while scaling to organisations the size of Facebook.

Continue reading

Posted in big data, Scala | Tagged , , , , | Leave a comment

Connecting To Presto via JDBC

Hi Guys,

In this blog we’ll discuss how to make a connection to a Presto server using JDBC, but before we get started, let’s discuss what Presto is.

What is Presto ?

Presto is an open source distributed SQL query engine for running interactive analytic queries against different data sources, with sizes ranging from gigabytes to petabytes. It runs on a cluster of machines, and its installation includes a coordinator and multiple workers. It allows querying data where it lives, including Hive, Cassandra, relational databases, or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

For deploying presto on your machine you can go through the following link : Presto Installation

Setting up a JDBC connection

Prerequisite : The presto cluster must be running before establishing the connection.

Below is the JDBC driver class name for the Presto driver:

JDBC_DRIVER = "com.facebook.presto.jdbc.PrestoDriver";

And your database URL should look like:

DB_URL = "jdbc:presto://localhost:8080/catalog/schema";

The following JDBC URL formats are supported:

jdbc:presto://host:port
jdbc:presto://host:port/catalog
jdbc:presto://host:port/catalog/schema


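Putting the driver class and the URL together, a minimal connection sketch might look like this (the host, catalog, schema, user name and query below are placeholder values of mine, not from the post):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the Presto driver class mentioned above.
        Class.forName("com.facebook.presto.jdbc.PrestoDriver");

        // "hive" and "default" stand in for your catalog and schema.
        String dbUrl = "jdbc:presto://localhost:8080/hive/default";

        // Presto requires a user name; the password may be null.
        try (Connection conn = DriverManager.getConnection(dbUrl, "test-user", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

This requires the Presto JDBC driver jar on the classpath and, as the prerequisite says, a running Presto cluster.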
Continue reading

Posted in big data, database, Java, Scala, sql | Tagged | Leave a comment

Introduction To HADOOP!

Here I am going to write a blog on Hadoop!

“Bigdata is not about data! The value in Bigdata [is in] the analytics. ”

-Harvard Prof. Gary King

So Hadoop came into the picture!

Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.


Hadoop was created by computer scientists Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. It was inspired by Google’s MapReduce.


The problem with an RDBMS is that it cannot process semi-structured and unstructured data (text, videos, audios, Facebook posts, clickstream data, etc.); it can only work with structured data (banking transactions, location information, etc.). The two also differ in how they process data.

Continue reading

Posted in Apache Flink, apache spark, big data, database, HDFS, knoldus, Scala, software, Spark, Test, testing | 2 Comments

Naming is Too Hard : Finding the Right Level of Abstraction

How can we make our code more readable? The answer is: with good, appropriate naming. Experience has confirmed that naming things in programming is hard. So hard, in fact, that it is common for programmers with years and years of experience to regularly name things poorly.

There are a number of nuances to giving something the right and appropriate name, but the one I am going to talk about here is finding the right level of abstraction. Here are some simple instances:

  • val employeeList = List.empty[Employee]
  • val sqlServerConnection = getDatabaseConnection()
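The same idea rendered in Java (a sketch; the Employee type and the names are invented for illustration): the fix is to drop the implementation detail from the name and keep the domain concept.

```java
import java.util.Arrays;
import java.util.List;

public class Naming {
    static class Employee {
        final String name;
        Employee(String name) { this.name = name; }
    }

    // Named at the right abstraction level: callers care about "employees",
    // not about which List implementation backs them.
    static int headcount(List<Employee> employees) {
        return employees.size();
    }

    public static void main(String[] args) {
        // "employeeList" would repeat the type and break if we ever switched
        // collections; "employees" names the domain concept instead.
        List<Employee> employees = Arrays.asList(new Employee("Ada"), new Employee("Grace"));
        System.out.println(headcount(employees)); // prints 2
    }
}
```

The second bullet is analogous: sqlServerConnection pins the name to one vendor, while connection (or databaseConnection) survives a database switch.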

Continue reading

Posted in Scala | Tagged , , | Leave a comment

Scalafmt – Styling The Beast

“I have never seen elegance go out of style” – SONYA TECLAI

Scala (The Beast) has brought efficiency and elegance to the programming world by combining the coziness of object-oriented programming with the magic of functional programming. This blog illuminates the work of Ólafur Páll Geirsson, who has done a great job of styling the beast. Below I will discuss my experience of using Scalafmt, its installation process and some of its cool code-styling features. Stuff we will be using in this blog:

  • Editor -> Intellij (v 2017)
  • Build Tool -> sbt (v 0.13.15)
  • Language -> Scala (v 2.12.1)

One of the most important aspects of good code is its readability, which comes with good, standard formatting. Ever wondered how an entire project of around 1000 poorly formatted Scala files could be formatted without a headache? Going Shift + Ctrl + Alt + L on your IntelliJ could be quite a pain for your fingers and brain.

The goal of Scalafmt is to produce code formatted well enough that one can focus on programming instead of manipulating syntax trivia. Who wouldn’t appreciate the following transformation:




Scalafmt can be used in several environments, such as the command line, text editors and build tools. In this blog we will look at Scalafmt integration with IntelliJ and sbt (my preference).
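As a sketch of the sbt side (the plugin coordinates and version below match the 2017-era plugin, but treat them as assumptions to verify against the Scalafmt docs), you would add the plugin in project/plugins.sbt:

```scala
// project/plugins.sbt
// Version 0.6.8 is an assumption; pick the current release.
addSbtPlugin("com.geirsson" % "sbt-scalafmt" % "0.6.8")
```

After reloading, running the sbt scalafmt task formats the sources; style options such as maxColumn = 100 go in a .scalafmt.conf file at the project root.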

Continue reading

Posted in Scala | 3 Comments

Wait! Don’t write your microservice … yet

Day one, we were super excited to start a new project for a huge financial institution. They knew the domain, and we at Knoldus understood the technology. The stakeholders were excited about the Reactive paradigm, and the client architects were all stoked about how microservices would give them all the benefits that other companies had been harping about. We had proposed Lagom and it was taken well.

We had an overview of the domain and we had a set of domain experts assigned to us who would be product owners on different tracks that we would work on. Great, does seem like a perfect start. But when things are too good to be true … probably they are.

The lead architect (LA) asks me, “So how much log data would each microservice generate?” I am like … what? And he explains, “Since we are going to build it with microservices, I need to understand the logging needs so that I can size the disks.” I am like … “I don’t know … uh … mmm”. LA: “But we explained the domain to you, so I thought this would be easy! Ok, how many microservices would we have?” I respond, “I don’t know! … yet”. And now the LA is losing it with me. I could go on about how the conversation evolved, or rather degraded, but I would rather talk about how we finally managed to convince the stakeholders that this thought process is incorrect.

Whenever we start a new project, we start by building a MONOLITH. Yes, you heard me. That is how we start. When you are starting a product, you know too little about how the intercommunication between teams, subdomains, and bounded contexts will work out. Once we have the monolith and we identify the submodules and the interplay between them, it is relatively trivial to break it down into microservices. The only thing to remember is to design the monolith carefully, paying attention to modularity, especially at the API boundaries and in data storage. If this is done right, it is simple to shift to microservices: once we have a well-done monolith, it is easy to carve out and logically package subparts of it as microservices.


You could, of course, argue for going to microservices directly: that once we have a monolith it cannot easily be broken down into microservices, that we cannot start with smaller teams working on their own parts, and what not. But for teams to work on their own parts, your bounded contexts should be very well defined, which is almost never true at the start. The boundaries and the contexts become clear(er) once we have the system reasonably in place. I would even counter-argue that a lot of times you would want to peel off a part of the system into a microservice much later, when you get feedback from production data.

In our example, one of the services, which generated PDF reports for the day’s trades, was bundled along with the reporting service. Over a period we realized that this service was hit a lot, especially during the closing hours or the first hour after closing. Other reports did not follow the same behavior. We separated out the PDF service as its own microservice, which could expand to 2x servers on peak load. The report service continued to work the way it was.

So, the summary is that however hard you may try, you will almost always fail at separating out the microservices correctly when you begin the project. In fact, the cost of badly carved-out bounded contexts and microservices will be much higher than that of beginning with a monolith and breaking it down as and when needed.

You might be interested in this post as well: “And you thought you were doing microservices”.

Posted in Microservices, Scala | Tagged , | 3 Comments

Basics of the Gherkin Language

Hello everyone,

In this blog we will discuss the Gherkin language, which is used in BDD for writing test cases. We will take a look at the topics below.


Gherkin’s grammar is defined by a parsing expression grammar. It is a business-readable DSL created specifically for behaviour descriptions, without explaining how that behaviour is implemented. Gherkin is a plain-English text language.

Gherkin serves two purposes — documentation and automated tests. It is a whitespace-oriented language that uses indentation to define structure.

Gherkin includes 60 different spoken languages, so we can easily use our own. The parser divides the input into features, scenarios and steps.

Here is a simple example of Gherkin:

[Screenshot: a simple Gherkin feature file]

When we run this feature, it gives us step definitions. In Gherkin, each line starts with a Gherkin keyword, followed by any text you like.

The main keywords are:

  • Feature
  • Scenario
  • Given
  • When
  • Then


Feature

This keyword refers to the description of the functionality of the application. A feature file defines a single feature, and we can add multiple scenarios under it.

In this example we add multiple scenarios under a single feature:

[Screenshot: a feature with multiple scenarios]


Scenario

A scenario describes a concrete example. It consists of a list of steps, grouped by Given, When, Then and And.

We can include any number of scenarios in one feature.

Scenarios follow the same pattern that BDD follows: first we set up an initial context, then an event occurs, and after that we get the outcome.


Steps

Steps start with Given, When or Then; along with these we can also use And and But. It is our responsibility to make our steps readable, because the system does not differentiate between the keywords.

[Screenshot: steps within a scenario]

Given: Describes the initial context of the system (the scene of the scenario), i.e. it configures the system to be in a well-defined state.

When: Describes an event, or an action on the Given context, if any.

Then: Describes an expected outcome, or result. We should use an assertion to validate whether the actual result matches the expected result.
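Putting the keywords together, a minimal hand-written feature file might look like this (the login example is my own illustration, not taken from the screenshots):

```gherkin
Feature: Login
  Scenario: Successful login with valid credentials
    Given the user is on the login page
    When the user enters a valid username and password
    And clicks the login button
    Then the user should see the home page
```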

Hope the blog helps you.

Posted in Scala | Tagged , , , , | 3 Comments

Installing and running Kafka on AWS instance (CentOS)

In this blog we will install and start a single-node setup of the latest recommended version of Kafka, i.e. the binary for Scala 2.12, on an EC2 Linux instance running CentOS. We will use a t2.micro (free-tier) instance, which comes with 1 GB RAM and 8 GB SSD.

Prerequisite ->

1). Create an EC2 instance ->

Steps for creating an AWS instance are clearly mentioned in official AWS docs, check here.

2). Install java 8 ->

Since we will be working with the Kafka binary for Scala 2.12, our instance must have Java 8. By default, EC2 instances have Java 7. You may check and upgrade the Java version to 8 on your instance by following the steps here.

After installing java 8, follow the steps mentioned below in sequence to start Kafka on your instance.

Step 1 -> Downloading and Extracting kafka

Download kafka_2.12-


Extract the .tgz file

tar -xzf kafka_2.12-

Since the archive is of no use now, we remove it:

rm kafka_2.12-

Step 2 -> Starting zookeeper

Since Kafka uses ZooKeeper, we first need to start a ZooKeeper server. We can either use the convenience script packaged with Kafka to start a single-node ZooKeeper instance, or start ZooKeeper on a standalone instance and point to it in the configuration file; here we will use the convenience script. Since we have only 1 GB RAM, we will set the KAFKA_HEAP_OPTS environment variable in our .bashrc to 50% of total RAM, i.e. 500 MB in our case.

vi .bashrc

Insert the following environment variable:

export KAFKA_HEAP_OPTS="-Xmx500M -Xms500M"

After setting the variable, source your .bashrc:

source .bashrc

Start ZooKeeper with the following command, running it in the background with nohup and diverting its logs to the zookeeper-logs file:

cd kafka_2.12-
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &

Then press ctrl+d to log out of the instance.

SSH to your instance again and check the content of the zookeeper-logs file. It should look like this:


NOTE -> In case the content of the zookeeper-logs file is different, try freeing the RAM buffers/cache with the following commands and re-run the cd and nohup commands mentioned above (this can happen because the t2.micro instance comes with quite little RAM; it is unlikely with bigger instances):

sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
sudo sh -c 'echo 2 >/proc/sys/vm/drop_caches'
sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'

Step 3 -> Starting Kafka

After successfully starting ZooKeeper, it is now time to start Kafka with the following commands.

cd kafka_2.12-
nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &

Then press ctrl+d to log out of the instance.

SSH to your instance again and check the content of the kafka-logs file. It should look like this:

This successfully starts Kafka on your EC2 instance. You may access your Kafka server via the Kafka Scala or Java API by making the required changes in the security groups. To stop Kafka and ZooKeeper, enter the following commands:

bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
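Once the security groups allow it, you can reach the broker from the Kafka Java API. A minimal producer sketch (the topic name and bootstrap address are placeholders of mine; it needs the kafka-clients dependency on the classpath):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Replace with your instance's public DNS; port 9092 must be open
        // in the instance's security group.
        props.put("bootstrap.servers", "ec2-xx-xx-xx-xx.compute.amazonaws.com:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "hello from EC2"));
        }
    }
}
```

Note that the broker's advertised host name must be resolvable from the client, which on EC2 may require setting advertised.listeners in config/server.properties.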


Hope the blog helps you. Comments and suggestions are welcome.


Posted in Scala | 2 Comments