Cassandra Database: The Next Big Thing


Apache Cassandra, a top-level Apache project born at Facebook, is a distributed database for managing large amounts of structured data across many commodity servers, providing highly available service with no single point of failure.

BASIC FLOW OF DATA INTO CASSANDRA TABLES

[Image: Cassandra write path]

Installation

In a terminal window:

1. Check which version of Java is installed by running the following command:

$ java -version

It is recommended to use the latest version of Oracle Java 8 or OpenJDK 8 on all nodes.
2. Download Apache Cassandra 3.0:

$ curl -L http://downloads.datastax.com/community/dsc-cassandra-version_number-bin.tar.gz | tar xz

To view the available versions, see the DataStax Certified Cassandra download page.

3. If you downloaded the tarball from that page, use the following command to untar it:

$ tar -xvzf dsc-cassandra-version_number-bin.tar.gz

4. To configure Cassandra, go to the install/conf directory:

$ cd dsc-cassandra-version_number/conf
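Once the configuration is in place, a quick sanity check (assuming the default tarball layout above) is to start Cassandra in the foreground and verify the node status from a second terminal:

$ cd dsc-cassandra-version_number
$ bin/cassandra -f
$ bin/nodetool status

A status of UN (Up/Normal) against the node indicates it has started correctly.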

Additional Information:

Continue reading

Posted in Scala | 1 Comment

Tutorial 3: Using Tags in Cucumber


Hello Everyone! Now we will go through tags in Cucumber. You can read the previous post on how to write a test script in Cucumber here.

If we have many scenarios in a feature file, we can put related ones under a single umbrella using tags; Cucumber can then run and generate reports for just the scenarios that share a given tag.

We can also provide multiple tags as values separated by commas, as shown below. Tags are defined in our runner class like this:

[Image: tags declared in the runner class]
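As a rough sketch (class name, tag names and paths are illustrative, and the exact tags syntax depends on the Cucumber-JVM version), a runner class with tags might look like this:

package stepDefinitions;

import org.junit.runner.RunWith;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;

@RunWith(Cucumber.class)
@CucumberOptions(
        features = "src/test/resources",
        glue = {"stepDefinitions"},
        tags = {"@SmokeTest, @RegressionTest"}   // comma-separated tags in one value: run scenarios carrying either tag
)
public class CucumberRunnerTest {
}

Scenarios in the feature file are then annotated with @SmokeTest or @RegressionTest above the Scenario keyword, and only the matching ones are executed and reported on.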

Continue reading

Posted in Scala | Leave a comment

Tutorial 4: Background in CUCUMBER


Hello Everyone,

In this series we will try to understand how to organize our scenarios using a Background. When we write multiple scenarios within a single feature file, they often repeat steps. Starting steps that are common to all the scenarios can be pulled out into a Background section.

Using a Background, we can make the feature file more readable and less complex instead of writing the same steps over and over again for each scenario.

Here is an example of background:

[Image: example of a Background in a feature file]

When we execute the feature, at run time the steps in the Background are executed at the beginning of each scenario.
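For illustration, a minimal sketch of a feature file using a Background (the feature name and steps are made up) could look like this:

Feature: Account operations

Background:
  Given the user is logged in
  And the user is on the dashboard page

Scenario: View account balance
  When the user opens the balance tab
  Then the current balance is displayed

Scenario: Download account statement
  When the user clicks on download statement
  Then the statement PDF is downloaded

Here both Background steps run before each of the two scenarios.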

Continue reading

Posted in Scala | Leave a comment

Deploying The Lagom Service On ConductR


In the previous blog, we discussed how to create a Lagom service-based architecture with a word count example. You can refer to the link below:

https://blog.knoldus.com/2017/03/27/lagom-framework-the-legacy-wordcount-example/

In this blog, we will now discuss how we can deploy the Lagom service on ConductR.

What is ConductR?

As described by Lightbend:
“ConductR is a ‘batteries included’ approach to managing distributed systems. No more cobbling together of service gateways, service locators, consolidated logging, monitoring and so forth. All of these essential items and more are included with ConductR. In fact we want ConductR to be to operations what Play and Lagom are to developers; we want operations to be productive so that they can concentrate on keeping their business customers happy.”

For more information on ConductR, refer to: http://conductr.lightbend.com/

The steps below will create a ConductR cluster and deploy the Lagom service.

Prerequisites

Docker (when using Docker-based bundles): download Docker from <Here>

sbt (our interactive build tool)

conductr-cli: download it from <Here>

The conductr-cli is used to communicate with the ConductR cluster.

Adding the sbt-conductr plugin
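As a minimal sketch (the version shown is illustrative; use the latest sbt-conductr release), the plugin is added to project/plugins.sbt:

// project/plugins.sbt (version below is only an example)
addSbtPlugin("com.lightbend.conductr" % "sbt-conductr" % "2.3.5")

With the plugin on the build, sbt gains the ConductR sandbox and bundle commands used in the rest of this post.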

Continue reading

Posted in Scala | Leave a comment

Tutorial 2: Introduction on how to write a First Test Script in CUCUMBER


Hello Everyone,

In this series of blogs it’s time to look into how to write test cases in Cucumber and how to execute them. Follow these steps for the same.
You can read the previous post related to the installation of Cucumber: here

Step 1: We will use the test package to define the location of features (Resources folder), step definitions (Java folder) and other files.

Then create a class CucumberRunnerTest in the StepDefinitions package; it will look like this:

[Image: CucumberRunnerTest class]
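A minimal sketch of such a runner class (the paths are illustrative and depend on your project layout) might be:

package stepDefinitions;

import org.junit.runner.RunWith;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;

@RunWith(Cucumber.class)
@CucumberOptions(
        features = "src/test/resources",   // location of the .feature files
        glue = {"stepDefinitions"}         // package containing the step definitions
)
public class CucumberRunnerTest {
}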

Step 2: We will write a feature file, for example for the addition of numbers.
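A rough sketch of such a feature file (the steps and values are illustrative) could be:

Feature: Addition of Numbers

Scenario: Add two numbers
  Given I have entered 5 and 7 into the calculator
  When I press the add button
  Then the result should be 12

Each of these steps is then backed by a step definition method in the Java step definitions package.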

Continue reading

Posted in Scala | Leave a comment

Tutorial 1: Cucumber with a Java Maven project


Hi Folks.

With this blog we will start a series on the Cucumber BDD tool. Before going through Cucumber, we should know how to integrate Cucumber with a Java Maven project.

To run Cucumber tests with Java, follow these steps.

Step 1. Install the Eclipse IDE. Make sure Java is already installed on your machine.

Step 2. Create a new project in the Eclipse IDE with the following steps:

  • Click on New –> Other –> Maven –> Maven Project –> Next
  • After that, click on Simple project and keep the default workspace location
  • Provide the Artifact id, Group id, name and description, and click on Finish.

Step 3. Configure Cucumber with Maven.

  • Open the pom.xml
  • Add dependencies for Cucumber-Java and Cucumber-JUnit, as shown below
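For example (the versions are illustrative; pick the release that matches your setup, and note that newer Cucumber releases use the io.cucumber group id instead of info.cukes), the dependencies look roughly like this:

<dependency>
  <groupId>info.cukes</groupId>
  <artifactId>cucumber-java</artifactId>
  <version>1.2.5</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>info.cukes</groupId>
  <artifactId>cucumber-junit</artifactId>
  <version>1.2.5</version>
  <scope>test</scope>
</dependency>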

Step 4. Once the pom.xml has been edited, save it. After adding the Cucumber dependencies, this is how the pom.xml should look:

Continue reading

Posted in Scala | 1 Comment

Apache Solr with Java: Result Grouping with Solrj


This blog is a detailed, step-by-step guide on implementing group by field in Apache Solr using Solrj.

Note: Grouping is different from faceting in Apache Solr. While grouping returns the documents grouped by the specified field, faceting returns the count of documents for each of the different values of the specified field. However, you can combine grouping and faceting in Solr. This blog talks about grouping without the use of faceting, and implementing the same through Solrj (version 6.4.1).

Without much ado, let’s get to it.

First, you need a running Solr instance with correctly indexed data.

Note: A Solr core is basically an index of the text and fields found in documents. A single Solr instance can contain multiple “cores”, which are separate from each other based on local criteria. If you don’t have a running solr instance with a core set up, Apache Solr also provides a number of useful examples to help you learn about key features. You can launch the examples using the -e flag.

Set up your local Solr by following the directions below:

(i) Download Solr for your operating system from Apache Solr – Downloads

(ii) Go to the Solr directory and start Solr with the demo data provided by Apache Solr by running the following command:

bin/solr -e techproducts
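Once the techproducts example is up, a rough Solrj sketch of a grouped query (the core URL and the manu_id_s grouping field are assumptions based on the sample data) looks like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupingExample {
    public static void main(String[] args) throws Exception {
        // Points at the local techproducts core started above
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build();

        SolrQuery query = new SolrQuery("*:*");
        query.set("group", true);                 // enable result grouping
        query.set("group.field", "manu_id_s");    // group documents by this field
        query.set("group.limit", 5);              // return up to 5 documents per group (default is 1)

        QueryResponse response = client.query(query);
        for (GroupCommand command : response.getGroupResponse().getValues()) {
            for (Group group : command.getValues()) {
                System.out.println(group.getGroupValue() + " -> " + group.getResult().size() + " document(s)");
            }
        }
        client.close();
    }
}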

Continue reading

Posted in Java | Leave a comment

Data modeling in Cassandra


Role of Partitioning & Clustering Keys in Cassandra

Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data.  With this post I will cover what the different types of Primary Keys are, how they can be used, what their purpose is, and how they affect your queries.

Primary key

Primary Keys are defined when you create your table.  The most basic primary key is a single column.  A single column is great for when you know the value that you will be searching for.  The following table has a single column, comp_id, as the primary key:

CREATE TABLE company_Information (
  comp_id text,
  name text,
  city text,
  state_province text,
  country_code text,
  PRIMARY KEY (comp_id)
);

A single column Primary Key is also called a Partition Key.  When Cassandra is deciding where in the cluster to store this particular piece of data, it will hash the partition key. The value of that hash dictates where the data will reside and which replicas will be responsible for it.

Partition Key

The Partition Key is responsible for the distribution of data among the nodes. Suppose there are 4 nodes A, B, C, D; let’s assume hash values range from 0-100, and that 0-25, 25-50, 50-75 and 75-100 are the hash ranges assigned to nodes A, B, C and D respectively. When we insert the first row into the company_Information table, the value of comp_id will be hashed. Let’s also assume that the first record hashes to 34. That falls into the range assigned to node B.

Compound Key

  • Primary keys can also be made up of more than one column.
  • A multi-column primary key is called a Compound Key.


CREATE TABLE company_Information (
  comp_id text,
  name text,
  city text,
  state_province text,
  country_code text,
  PRIMARY KEY (country_code, city, name, comp_id)
);

This example has four columns in the Primary Key clause. An interesting characteristic of Compound Keys is that only the first column is considered the Partition Key. The rest of the columns in the Primary Key clause are Clustering Keys.
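For instance (the values are made up), a query against this table must supply the partition key and may then narrow down by the clustering columns in order:

SELECT * FROM company_Information
WHERE country_code = 'IN' AND city = 'Pune';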

Order By

You can change the default sort order from ascending to descending with a CLUSTERING ORDER BY clause. This is an additional WITH clause that you need to add to the CREATE TABLE statement to make this possible.

CREATE TABLE company_Information (
  comp_id text,
  name text,
  city text,
  state_province text,
  country_code text,
  PRIMARY KEY (country_code, city, name, comp_id)
) WITH CLUSTERING ORDER BY (city DESC, name ASC, comp_id DESC);

Now we’ve changed the ordering of the Clustering Keys to sort city in descending order. Did you notice that I did not specify a sort for country_code? Since it’s the partition key, there is nothing to sort, as hashed values won’t be close to each other in the cluster.

Clustering Keys

Each additional column that is added to the Primary Key clause is called a Clustering Key. A clustering key is responsible for sorting data within the partition. In our example company_Information table, country_code is the partition key with city, name & comp_id acting as the clustering keys. By default, the clustering key columns are sorted in ascending order.

Composite Key

A Composite Key is when you have a multi-column Partition Key.  The above example only used country_code for partitioning.  This means that all records with a country_code value of “INDIA” are in the same partition. Avoiding wide rows is the perfect reason to move to a Composite Key.  Let’s change the Partition Key to include the comp_id & city columns.  We do this by nesting parentheses around the columns that are to form the Composite Key, as follows:


CREATE TABLE company_Information (
  comp_id text,
  name text,
  city text,
  state_province text,
  country_code text,
  PRIMARY KEY ((country_code, city, comp_id), name)
);

What this does is change the hash value from being calculated off of only country_code; now it is calculated off of the combination of country_code, city & comp_id. Each combination of the three columns has its own hash value and will be stored in a completely different partition in the cluster.
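One practical consequence (again with illustrative values) is that reads must now supply all three partition key columns so Cassandra can locate the partition:

SELECT * FROM company_Information
WHERE country_code = 'IN' AND city = 'Pune' AND comp_id = 'C042';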

Posted in Cassandra, Scala | Leave a comment

Installing and Running Presto


Hi Folks!
In my previous blog, I talked about getting introduced to Presto.
In today’s blog, I shall be talking about setting up (installing) and running Presto.

The basic prerequisites for setting up Presto are:


  • Linux or Mac OS X
  • Java 8, 64-bit
  • Python 2.4+

Installation


  1. Download the Presto Tarball from here
  2. Unpack the Tarball
  3. After unpacking you will see a directory presto-server-0.175 which we will call the installation directory.

Configuring


Inside the installation directory create a directory called etc. This directory will hold the following configurations:

  1. Node Properties: environmental configuration specific to each node
  2. JVM Config: command line options for the Java Virtual Machine
  3. Config Properties: configuration for the Presto server
  4. Catalog Properties: configuration for Connectors (data sources)
  5. Log Properties: configuring the log levels

Now we will set up the above properties one by one.

Step 1: Setting up Node Properties

Create a file called node.properties inside the etc folder. This file will contain the configuration specific to each node. Given below is a description of the properties we need to set in this file:

  • node.environment: The name of the Presto environment. All the nodes in the cluster must have an identical environment name.
  • node.id: This is the unique identifier for every node.
  • node.data-dir: The path of the data directory.

Note: Presto stores logs and other data at the location specified in node.data-dir. It is recommended to create the data directory outside the installation directory; this allows it to be easily preserved during upgrades.
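As an illustration, a node.properties file (with made-up values) could look like this:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

Here every node in the cluster shares the same node.environment, while node.id must be unique per node and stay stable across restarts.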

Continue reading

Posted in big data, database, Scala | Leave a comment

Avro Communication over TCP Sockets


Storing and transferring objects is a requirement of most applications. What if there is a need for communication between machines having incompatible architectures? Java serialization won’t work for that. Now, if you are thinking about a serialization framework, then you are right. So, let’s start with one such serialization framework: Apache Avro.

What is Avro?

Apache Avro is a language-neutral data serialization system. It’s a schema-based system which serializes data along with its schema into a compact binary format; afterwards, the data can be deserialized by any application having the same schema.

In this post, I will demonstrate how to read a schema using the parser library and send/receive the serialized data over a Java socket.

Let’s create a new Maven project and add the Avro dependency in the pom.xml:

<dependency>
 <groupId>org.apache.avro</groupId>
 <artifactId>avro</artifactId>
 <version>1.8.1</version>
</dependency>

Now, create a new Avro schema file in the schema directory of the project; call it employee.avsc:

{
 "namespace": "example.avro",
 "type": "record",
 "name": "employee",
 "fields": [
  {"name": "Name", "type": "string"},
  {"name": "id", "type": "string"}
 ]
}

Here we are considering an employee schema having a name and an id.

Instantiate the Schema.Parser class by passing the file path where the schema is stored to its parse method.

Schema schema = new Schema.Parser().parse(new File("src/schema/employee.avsc"));

After parsing the schema, we need to create a record using GenericData and store data in it using the put method.
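A small sketch of that step (the field values are illustrative) would be:

import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

GenericRecord employee = new GenericData.Record(schema);   // record backed by the parsed employee schema
employee.put("Name", "Alice");                             // field names must match employee.avsc
employee.put("id", "1001");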

Continue reading

Posted in Scala | 1 Comment