Category Archives: big data

Apache Hadoop vs Apache Spark


The term Big Data has created a lot of hype already in the business world. Hadoop and Spark are both Big Data frameworks – they provide some of the most popular tools used to carry out common Big Data-related tasks. … Continue reading

Posted in apache spark, big data, Scala | Leave a comment

One-way & two-way streaming in a Lagom application


Nowadays "streaming" is a buzzword, and you have probably heard of many types of streaming by now, e.g. Kafka streaming, Spark streaming, etc. But in this blog we will see a new type of streaming, i.e. … Continue reading

Posted in Akka, Best Practices, big data, Functional Programming, github, Java, knoldus, Messages, Reactive, Scala, Streaming, Web Services | Leave a comment

Zeppelin with Spark


Let us start with the very first question: what is Zeppelin? It is a web-based notebook that enables interactive data analytics. Based on the concept of an interpreter that can be bound to any language or data processing backend, … Continue reading

Posted in big data, Scala, Spark, Tutorial | 1 Comment

Apache Storm: Architecture


Apache Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used … Continue reading

Posted in big data, Clojure, Scala, Streaming | 1 Comment

Case Study to understand Kafka Consumer and its offsets


In this blog post, we will mainly discuss the Kafka Consumer and its offsets. We will understand this using a case study implemented in Scala. This blog post assumes that you are aware of basic Kafka terminology. CASE STUDY: The Producer … Continue reading

Posted in Apache Kafka, big data, Functional Programming, knoldus, Scala, Streaming | 3 Comments

Simple Things You Can Learn From Cassandra Nodetool (Monitor/Manage) For DC/OS


Cassandra's native tool, nodetool, is used for monitoring and managing a Cassandra cluster on DC/OS. Continue reading

Posted in Best Practices, big data, Cassandra, cluster, NoSql | 2 Comments

Knolx: Getting started with Presto


Hi all, Knoldus organized a 1-hour session on 8th September 2017. The topic was “Getting started with Presto”. Many people joined and enjoyed the session. I am going to share the slides here. Please let me know if you … Continue reading

Posted in big data, Scala, sql | Leave a comment

What’s new in Apache Spark 2.2


Apache recently released a newer version of Spark, i.e. Apache Spark 2.2. The new version comes with improvements as well as new functionality. The major update in this release is Structured Streaming. It has been marked as production … Continue reading

Posted in apache spark, big data, Scala, Spark, Streaming | 3 Comments

Deep Dive into Spark Cluster Managers


This blog aims to dig into the different cluster management modes in which you can run your Spark application. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program which … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment

Welcome to the world of Riak Database !!!


Today we are going to discuss Riak, a distributed NoSQL database. Nowadays, with so much data in the world, we cannot rely on old technology for storing it. … Continue reading

Posted in big data, database, NoSql, Scala | 1 Comment