Today we are going to discuss how we can use Scala with CockroachDB. As we all know, CockroachDB is a distributed SQL database built on top of a transactional and consistent key-value store, and now we are going to use it with Scala. But before starting the journey, for those who have caught the train late 😉, this is what has happened till now:
- An Introduction to CockroachDB !!
Now, before starting with the code, please set up CockroachDB in your local environment by following these steps:
- We can download CockroachDB from here and follow the instructions mentioned there.
- Now run the following commands to start the nodes:
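As a sketch for a local three-node insecure cluster, assuming CockroachDB v1.x is on your PATH (flag names changed in later releases; the ports and store directories here are illustrative):

```shell
# Node 1 (first node of the cluster)
cockroach start --insecure --host=localhost --store=node1 --port=26257 --http-port=8080 --background

# Nodes 2 and 3 join the cluster through node 1
cockroach start --insecure --host=localhost --store=node2 --port=26258 --http-port=8081 --join=localhost:26257 --background
cockroach start --insecure --host=localhost --store=node3 --port=26259 --http-port=8082 --join=localhost:26257 --background
```

Once the nodes are up, the admin UI is reachable at localhost:8080 and you can connect with `cockroach sql --insecure`.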
A few days ago, I came across a situation where I wanted to do a stateful operation on the streaming data. So, I started finding possible solutions for it. I came across many solutions which were using different technologies like Spark Structured Streaming, Apache Flink, Kafka Streams, etc.
All the solutions solved my problem, but I selected Kafka Streams because it met most of my requirements. After that, I started reading its documentation and trying to run its examples. But as soon as I started learning it, I hit a major roadblock: “Kafka Streams does not provide a Scala API!” I was shocked to know that.
The reason I was expecting Kafka Streams to have a Scala API was that I am using Scala to build my application, and if Kafka Streams provided an API for it, then it would have been easy for me to include it in my application. But that didn’t turn out to be the case. On top of that, when I searched for Scala examples, I was able to find only a handful of them.
Whenever we hear the word Kafka, all we think of is a messaging system with a publisher-subscriber model that we use for our streaming applications as a source and a sink.
So we can say that Kafka is just a dumb storage system that stores the data provided by a producer for a long time (configurable) and it can provide it to some consumer whenever one asks for data (from a topic of course).
Now, between consuming the data from a producer and sending it to a consumer, we can’t do anything with this data inside Kafka itself. So we make use of other tools, like Spark or Storm, to process the data between producer and consumer. In this way, we have to build two separate clusters for our app: one Kafka cluster that stores our data, and another to do stream processing on that data.
So to save ourselves from this hassle, Kafka Streams API comes to our rescue. With this, we have a Unified Kafka where we can set our stream processing inside Kafka cluster. And with this tight integration, we get all the support from Kafka (for example topic partition becomes stream partition for parallel processing).
What’s KAFKA STREAMS API?
The Kafka Streams API allows you to create real-time applications that power your core business. It is the easiest, yet the most powerful, technology to process data stored in Kafka, built on top of Kafka’s standard producer and consumer clients.
A unique feature of the Kafka Streams API is that the applications you build with it are normal applications. These applications can be packaged, deployed, and monitored like any other application – there is no need to install separate processing clusters or similar special-purpose and expensive infrastructure!
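To make this concrete, here is a minimal sketch of such a “normal application”, calling the Java DSL from Scala (since at the time of writing there is no official Scala API). It assumes a kafka-streams 0.10/0.11 dependency on the classpath, a broker at localhost:9092, and hypothetical topics text-input and text-output:

```scala
import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.{KStreamBuilder, ValueMapper}
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object UppercaseStream extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)
  props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass.getName)

  val builder = new KStreamBuilder()

  // Read each record from the input topic, transform its value,
  // and write the result to the output topic.
  builder.stream[String, String]("text-input")
    .mapValues(new ValueMapper[String, String] {
      override def apply(value: String): String = value.toUpperCase
    })
    .to("text-output")

  new KafkaStreams(builder, props).start()
}
```

The anonymous ValueMapper class is exactly the kind of boilerplate a native Scala API would replace with a plain lambda.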
Before starting, you should know about Kafka, Spark, and what real-time processing of data is, so let’s have a brief introduction to each.
Real-Time Processing – Processing data as it arrives, instead of first storing it and processing it later, or processing data that is stored somewhere else.
Kafka – Kafka is a high-throughput messaging system that moves data from one end to another. It uses the concept of producers and consumers: a producer sends data to topics on the Kafka cluster, and a consumer subscribes to the data from these topics. You can read more about Kafka here.
Spark – Spark is an open-source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing, then Spark is the way to go. You can read more about Spark here.
Spark Streaming – Spark Streaming is an extension of the core Spark API that enables processing of live data streams. Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data.
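To make the DStream idea concrete, here is a minimal sketch of a streaming word count, assuming the spark-streaming dependency on the classpath and a text source on localhost:9999 (for example, one started with `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount extends App {
  // Two local threads: one to receive data, one to process it.
  val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
  val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

  // Each micro-batch of lines from the socket becomes an RDD inside the DStream.
  val lines  = ssc.socketTextStream("localhost", 9999)
  val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
  counts.print()

  ssc.start()
  ssc.awaitTermination()
}
```

Every five seconds this prints the word counts for the lines that arrived during that batch.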
Python is an easy-to-learn, powerful, object-oriented programming language created by Guido van Rossum. It wasn’t named after a dangerous snake 😛. Rossum was a fan of a comedy series from the late seventies; the name “Python” was adopted from that series, “Monty Python’s Flying Circus”.
Everything in Python is an object. Sites like Mozilla, Reddit and Instagram are written in Python.
- Simple, Elegant Syntax – Python code is easier to understand and write.
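As a quick illustration of that simplicity, here is the same transformation written as an explicit loop and as a one-line list comprehension:

```python
# Squares of the even numbers below 10, written as an explicit loop...
squares = []
for n in range(10):
    if n % 2 == 0:
        squares.append(n * n)

# ...and as a list comprehension that reads almost like English.
squares_comprehension = [n * n for n in range(10) if n % 2 == 0]

print(squares_comprehension)  # [0, 4, 16, 36, 64]
```

Both produce the same list; the comprehension just says what you want rather than how to build it.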
Introduction: Ansible is a configuration management and provisioning tool, similar to Chef, Puppet or Salt. Configuration management systems are designed to make controlling large numbers of servers easy for administrators and operations teams. They allow you to control many different systems in an automated way from one central location.
There are many popular configuration management systems available for Linux, such as Chef and Puppet, but these are often more complex than many people want or need. Ansible is written in Python and uses SSH to execute commands on remote machines. Ansible uses YAML to describe its work.
Install and Configure Ansible on Ubuntu: Run the below command to install Ansible on Ubuntu.
sudo apt-get install ansible
We’ll assume you are using SSH keys for authentication. To set up an SSH agent and avoid retyping passphrases, you can run the below commands.
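Assuming your private key lives at the default path ~/.ssh/id_rsa, that looks like:

```shell
# Start an agent for the current shell and load your key once
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
```

After this, Ansible can open SSH connections to your hosts without prompting for the passphrase each time.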
Configuring Ansible Hosts: Ansible keeps track of all of the servers that it knows about through a “hosts” file. We need to set up this file before we can begin to communicate with our other computers. Open the file with root privileges like this:
sudo vi /etc/ansible/hosts
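By default this file contains only commented-out examples. A minimal inventory sketch, with a hypothetical group name and placeholder IP addresses:

```
[servers]
host1 ansible_ssh_host=203.0.113.11
host2 ansible_ssh_host=203.0.113.12
```

With this in place, `ansible -m ping servers` should contact both machines over SSH.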
In this blog, we will focus on what’s new in Java 8 and its usage in a simple and intuitive way. We assume that you are already familiar with Java 7.
If you want to run programs in Java 8, you will have to set up a Java 8 environment by following these steps:
- Download JDK 8 and install it. Installation is simple, just like other Java versions. A JDK installation is required to write, compile and run Java programs.
- Download the latest Eclipse IDE or IntelliJ IDEA; both now provide support for Java 8. Make sure your project’s build path is using the Java 8 library.
There are dozens of features added in Java 8; the most significant ones are mentioned below. Let’s begin discussing each in detail:
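Lambda expressions and the Streams API are among the headline features; a small sketch (the class and method names here are just for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Demo {
    // Keep names longer than three characters and upper-case them.
    static List<String> upperLongNames(List<String> names) {
        return names.stream()
                    .filter(n -> n.length() > 3)   // lambda expression
                    .map(String::toUpperCase)      // method reference
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperLongNames(Arrays.asList("alice", "bob", "carol")));
        // prints [ALICE, CAROL]
    }
}
```

The same logic in Java 7 would need an explicit loop, a temporary list, and an anonymous class for any callback.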
In this blog, I will demonstrate how your application can support different languages using Play Framework 2.6.0.
What is Application/Website Internationalization?
Application/Website Internationalization can be defined as the process of developing and designing an application that supports not just a single language but multiple languages, so that it can easily be adapted by users from any language, region, or geography. It ensures that the code base of your application is flexible enough to serve a new audience without rewriting the complete code, and it keeps text separate from the code base.
Let us start the implementation step by step:
1. Specifying Languages for your application
In order to specify languages for your application, you need language tags: specially formatted strings that indicate specific languages, such as “en” for English or “fr” for French, or a specific regional dialect of a language, such as “en-AU” for English as used in Australia.
First, you need to specify the languages in the conf/application.conf file. These language tags will be used to create play.api.i18n.Lang instances.
play.i18n.langs = [ "en", "fr" ]
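Each configured tag is then backed by a message file; a sketch with a hypothetical greeting key (conf/messages serves the default language, conf/messages.fr the French one):

```
# conf/messages
greeting = Hello, welcome!

# conf/messages.fr
greeting = Bonjour, bienvenue !
```

Play then resolves the message for the Lang of the incoming request, so the same template can render in either language.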
The reason for writing this blog is that I tried to use the Vectorized Reader in Hive but faced some problems with its documentation; that’s why I decided to write this blog.
Vectorized query execution is a Hive feature that greatly reduces the CPU usage for typical query operations like scans, filters, aggregates, and joins. A standard query execution system processes one row at a time. This involves long code paths and significant metadata interpretation in the inner loop of execution. Vectorized query execution streamlines operations by processing a block of 1024 rows at a time. Within the block, each column is stored as a vector (an array of a primitive data type). Simple operations like arithmetic and comparisons are done by quickly iterating through the vectors in a tight loop, with no or very few function calls or conditional branches inside the loop.
Enabling vectorized execution
To use vectorized query execution, you must store your data in ORC format and set:
set hive.vectorized.execution.enabled = true;
How To Query
Since the data must be stored in ORC format, just follow the steps below.
- Start the Hive CLI and create an ORC table with some data:
hive> create table vectortable(id int) stored as orc;
Time taken: 0.487 seconds
hive> set hive.vectorized.execution.enabled = true;
hive> insert into vectortable values(1);
Query ID = hduser_20170713203731_09db3954-246b-4b23-8d34-1d9d7b62965c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-07-13 20:37:33,237 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local722393542_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:54310/user/hive/warehouse/vectortable/.hive-staging_hive_2017-07-13_20-37-31_172_3262390557269287245-1/-ext-10000
Loading data to table default.vectortable
Table default.vectortable stats: [numFiles=1, numRows=1, totalSize=199, rawDataSize=4]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 321 HDFS Write: 545 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.672 seconds
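With the table in place and vectorization enabled for the session, scans over it are processed in batches of 1024 rows. Any query against the table will do, for example:

```
hive> select count(*) from vectortable;
```

If you want to check whether vectorization actually kicked in, running EXPLAIN on the query should show “Execution mode: vectorized” in the map stage of the plan.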