Database

Knolx: MongoDB – Replication And Sharding

Reading Time: < 1 minute Hello all, Knoldus organised a KnolX session on the topic “MongoDB – Replication And Sharding” on Friday, 4th November and 11th November 2016. MongoDB is a free and open-source, cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schemas. MongoDB is developed by MongoDB Inc. and is published under a combination of the GNU Affero General Continue Reading
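As a taste of the replication side of the session, a minimal three-member replica set can be sketched as follows (the ports, data paths, and hostnames are illustrative; a real deployment would put each member on its own host):

```shell
# Start three mongod instances as members of replica set "rs0"
mongod --replSet rs0 --port 27017 --dbpath /data/rs0-0 --fork --logpath /data/rs0-0.log
mongod --replSet rs0 --port 27018 --dbpath /data/rs0-1 --fork --logpath /data/rs0-1.log
mongod --replSet rs0 --port 27019 --dbpath /data/rs0-2 --fork --logpath /data/rs0-2.log

# Initiate the replica set from the mongo shell
mongo --port 27017 --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
})'
```

Once the set is initiated, `rs.status()` shows which member was elected primary; sharding is layered on top of replica sets like this one.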

Cassandra Data Modeling – Primary, Clustering, Partition, Compound Keys

Reading Time: 5 minutes In this post we are going to discuss the different keys available in Cassandra. The primary key concept in Cassandra is different from that in relational databases, so it is worth spending time to understand it. Let's take an example and create a student table which has a student_id as the primary key column. 1) primary key create table person (student_id int primary key, fname Continue Reading
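Before the full post, the distinction between the key types can be sketched in CQL; apart from the post's student_id/fname, the table and column names here are illustrative:

```sql
-- Simple primary key: student_id is both the partition key and the whole primary key
CREATE TABLE student (
  student_id int PRIMARY KEY,
  fname text
);

-- Compound primary key: course_id is the partition key,
-- student_id is a clustering column (rows within a partition are sorted by it)
CREATE TABLE enrolment (
  course_id  int,
  student_id int,
  fname      text,
  PRIMARY KEY (course_id, student_id)
);

-- Composite partition key: (course_id, term) together choose the partition;
-- student_id still clusters rows inside it
CREATE TABLE enrolment_by_term (
  course_id  int,
  term       text,
  student_id int,
  PRIMARY KEY ((course_id, term), student_id)
);
```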

Apache Cassandra

Knolx: Introduction to Apache Cassandra

Reading Time: < 1 minute Hello everyone, Knoldus organized a KnolX session on Friday, 07 October 2016. In that session, we gave an introduction to Apache Cassandra. Cassandra is a distributed database which allows us to store data on multiple nodes, with multiple replicas, in such a way that even if a node goes down, another node can take over for it. The slides for the session are as Continue Reading

Cassandra with Spark 2.0: Building a REST API!

Reading Time: 3 minutes In this tutorial, we will be demonstrating how to make a REST service in Spark using Akka HTTP as a sidekick 😉 and Cassandra as the data store. We have seen the power of Spark earlier, and when it is combined with Cassandra in the right way it becomes even more powerful. Earlier we have seen how to build a REST API on Spark and Couchbase Continue Reading
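Stitching the three pieces together can be roughly sketched as below; the keyspace (`store`), table (`users`), column (`name`), host, and port are all assumptions for illustration, and the spark-cassandra-connector and Akka HTTP versions must match your Spark and Scala versions:

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object RestApp extends App {
  implicit val system = ActorSystem("rest")
  implicit val mat    = ActorMaterializer()

  val conf = new SparkConf()
    .setAppName("cassandra-rest")
    .setMaster("local[*]")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // Expose a Cassandra table through a simple GET endpoint
  val route =
    path("users") {
      get {
        val names = sc.cassandraTable("store", "users")
          .map(_.getString("name"))
          .collect()
        complete(names.mkString(", "))
      }
    }

  Http().bindAndHandle(route, "localhost", 8080)
}
```

Running a Spark job per request is fine for a demo like this, but for production you would cache RDDs or pre-compute results rather than hit Cassandra through Spark on every call.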

Neo4j With Scala: Awesome Experience with Spark

Reading Time: 4 minutes Let's start our journey again in this series. In the last blog we discussed data migration from other databases to Neo4j. Now we will discuss how we can combine Neo4j with Spark. Before starting the blog, here is a recap: Getting Started Neo4j with Scala: An Introduction Neo4j with Scala: Defining User Defined Procedures and APOC Neo4j with Continue Reading

Neo4j With Scala: Migrate Data From Other Databases to Neo4j

Reading Time: 6 minutes Hello folks, let's continue with Neo4j and Scala. We have earlier discussed the use of Neo4j with Scala and Neo4j APOC with Scala. In this blog we are going to discuss how we can migrate data from other databases like MySQL, PostgreSQL, Oracle and Cassandra. But before starting the journey, for those who have caught the train late 😉 , this is Continue Reading
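For a flavour of one common approach, APOC's `apoc.load.jdbc` procedure can pull rows straight from a relational database into Cypher; the JDBC URL, credentials, query, and node label below are illustrative:

```cypher
// Load rows from a MySQL table and create Neo4j nodes from them
CALL apoc.load.jdbc(
  'jdbc:mysql://localhost:3306/school?user=root&password=secret',
  'SELECT student_id, fname FROM student'
) YIELD row
CREATE (:Student {id: row.student_id, name: row.fname});
```

The matching JDBC driver jar has to be on Neo4j's plugin classpath alongside APOC for this call to work.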

Apache Pig: Installation and Connecting with a Hadoop Cluster

Reading Time: 4 minutes Apache Pig is a scripting platform for analyzing large datasets. Pig is a high-level scripting language which works with Apache Hadoop. It enables workers to write complex transformations as simple scripts with the help of Pig Latin. Apache Pig interacts directly with the data in the Hadoop cluster and transforms Pig scripts into MapReduce jobs so they can execute with the Continue Reading
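A small Pig Latin sketch of the kind of transformation the post refers to; the HDFS paths and field layout are assumptions for illustration:

```pig
-- Load a comma-separated log file, keep only error lines, and count them by message
logs    = LOAD 'hdfs:///logs/app.log' USING PigStorage(',')
          AS (ts:chararray, level:chararray, msg:chararray);
errors  = FILTER logs BY level == 'ERROR';
by_msg  = GROUP errors BY msg;
counts  = FOREACH by_msg GENERATE group AS msg, COUNT(errors) AS n;
STORE counts INTO 'hdfs:///logs/error_counts';
```

Pig compiles this handful of lines into one or more MapReduce jobs, which is exactly the translation step the excerpt describes.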

Hive-Metastore: A Basic Introduction

Reading Time: 3 minutes As we know, the database is the most important and powerful part of any organisation. It is a collection of schemas, tables, relationships, queries and views – an organized collection of data. But have you ever thought about these questions: How does a database manage all the tables? How does it manage all the relationships? How do we perform all operations so easily? Is there Continue Reading

Building Analytics Engine Using Akka, Kafka & ElasticSearch

Reading Time: 5 minutes In this blog, I will share my experience of building a scalable, distributed and fault-tolerant analytics engine using Scala, Akka, Play, Kafka and ElasticSearch. I would like to take you through the journey of building an analytics engine which was primarily used for text analysis. The inputs were structured, unstructured and semi-structured data, and we were doing a lot of data crunching with it. The Analytics Continue Reading

Titan DB Setup With Cassandra

Reading Time: 4 minutes Connecting and configuring Titan DB with Cassandra. Step 1: Download Cassandra version apache-cassandra-3.5 and Titan DB version titan-1.0.0-hadoop1. Step 2: Extract both downloads, say into /var/lib/cassandra and /var/lib/titan respectively. Step 3: Configure Cassandra: if you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, Continue Reading
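The extraction steps above can be sketched in shell; this assumes the two archives from Step 1 are already downloaded into the current directory, and the exact archive names may differ from these guesses:

```shell
# Step 2: extract into the paths the post suggests
sudo mkdir -p /var/lib/cassandra /var/lib/titan
sudo tar -xzf apache-cassandra-3.5-bin.tar.gz -C /var/lib/cassandra --strip-components=1
sudo unzip -q -d /var/lib/titan titan-1.0.0-hadoop1.zip

# Start Cassandra in the background, then launch Titan's Gremlin console
/var/lib/cassandra/bin/cassandra
/var/lib/titan/titan-1.0.0-hadoop1/bin/gremlin.sh
```

From the Gremlin console, a Cassandra-backed graph is typically opened via the `conf/titan-cassandra.properties` file that ships with the Titan distribution.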

Testability of Database Applications

Reading Time: 2 minutes Testability is a non-functional requirement important to the testing team and to the users involved in user acceptance testing. Testability mainly depends on the degree to which the artefacts the software team works with – the SRS (Software Requirement Specification), the FRD (Functional Design Document), the software system, etc. – support testing of the application under test, with a minimum number of test cases covering the entire testing scope Continue Reading

How to set up the Cassandra Phantom driver for Scala

Reading Time: 2 minutes Phantom is a high-performance Scala DSL for Apache Cassandra and now the leading tool for integrating Cassandra into the Scala ecosystem. So, if you are planning on using Cassandra with Scala, Phantom is the weapon of choice. It has slowly but surely grown to become an established Scala framework through its strong focus on using the Scala type system to mimic CQL and translate them Continue Reading

Best Practices for Using Slick on Production

Reading Time: 5 minutes Slick is most popular library for relational database access in Scala ecosystem. When we are going to use Slick for production , then some questions arise  like where should the mapping tables be defined and how to join with other tables, how to write unit tests. Apart from this there is lack of clarity on the design guidelines. In this blog post , I am Continue Reading