apache

Kafka Streams

Interactive Queries in Apache Kafka

Apache Kafka v0.10 introduced the Kafka Streams API, a client library for building applications and microservices whose input and output data are stored in Kafka clusters. Kafka Streams provides state stores, which stream processing applications can use to store and query data. Every task in Kafka Streams uses one or more state stores which Continue Reading

Simple Things You Can Learn From Cassandra Nodetool (Monitor/Manage) For DC/OS

Cassandra's native tool, nodetool, is used for monitoring and managing a Cassandra cluster on DC/OS.

Joins in Kafka

Join Semantics in Kafka Streams

Introduction to core concepts: Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records and process those records as they occur. Kafka Streams is a client library used for building applications and microservices, where the input and output data are stored in Kafka clusters. Interface KStream<K, V> is an abstraction of Continue Reading

fetching data from different sources using Spark 2.1

What’s new in Apache Spark 2.2

Apache recently released a newer version of Spark, i.e. Apache Spark 2.2. The new version brings improvements as well as new functionality. The headline change in this release is Structured Streaming, which is now marked as production ready with its experimental tag removed. Some of the high-level changes and improvements: production-ready Structured Streaming, expanded SQL functionality, New Continue Reading

Apache Solr with Java: Result Grouping with Solrj

This blog is a detailed, step-by-step guide to implementing group-by-field in Apache Solr using SolrJ. Note: Grouping is different from faceting in Apache Solr. While grouping returns the documents grouped by the specified field, faceting returns the count of documents for each distinct value of the specified field. However, you can combine grouping and faceting in Solr. This blog talks about grouping without the Continue Reading
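The difference between grouping and faceting described in this excerpt can be illustrated with a minimal plain-Java sketch. It makes no Solr or SolrJ calls; the `Doc` class, the `category` field, and the sample data are hypothetical and exist only to show the shape of the two results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Solr or SolrJ calls are made here.
class GroupingVsFaceting {

    // Hypothetical document with an id and one field to group/facet on.
    static final class Doc {
        final String id;
        final String category;

        Doc(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Hypothetical sample data standing in for an indexed collection.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("1", "book"),
                new Doc("2", "music"),
                new Doc("3", "book"));
    }

    // Grouping: each field value maps to the documents (here, their ids) that carry it.
    static Map<String, List<String>> groupByField(List<Doc> docs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Doc d : docs) {
            groups.computeIfAbsent(d.category, k -> new ArrayList<>()).add(d.id);
        }
        return groups;
    }

    // Faceting: each field value maps only to the number of matching documents.
    static Map<String, Long> facetCounts(List<Doc> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.merge(d.category, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        System.out.println("grouped: " + groupByField(docs));
        System.out.println("facets:  " + facetCounts(docs));
    }
}
```

Grouping keeps the documents themselves under each field value, while faceting keeps only a count per value, which is why the two can be combined in a single query.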

Solr with Java: A basic hands-on with SolrJ

What is Apache Solr: Apache Solr is a search server built on the full-text search engine Apache Lucene. It takes pieces of information (called documents), which are indexed into cores. When a query is performed, Solr goes through the index and returns the matching documents. Now let’s start the hands-on. Step 1: Install Solr from the following link. Step 2: Start Continue Reading

Introduction to Kafka Connect

Knoldus organized a half-hour session on 29 July 2016 at 4:00 PM. It covers a brief introduction to Apache Kafka Connect, with insights into the benefits of Kafka Connect and its use cases. It also covers the motivation behind building Kafka Connect and an introduction to its architecture. Here is the video for the same.

Apache Spark + Cassandra: Basic steps to install and configure Cassandra and use it with Apache Spark with example

To build an application using Apache Spark and Cassandra, you can use the DataStax spark-cassandra-connector to connect the two. Before using the connector, we should know how to configure Cassandra. The following are prerequisites to run the example smoothly. Steps to install and configure Cassandra: if you are new to Cassandra, we first need to install Cassandra on our Continue Reading

How to set up and use ZooKeeper in Scala using Apache Curator

In order to use ZooKeeper to manage your project’s configurations across the cluster, we will first set up a ZooKeeper ensemble on our local machine (this setup is for testing on a single machine) by following these steps: 1) Download a stable ZooKeeper release 2) Unpack it in three places and rename the copies to: /home/user/Desktop/zookeeper1, /home/user/Desktop/zookeeper2, and /home/user/Desktop/zookeeper3 3) In order to use ZooKeeper we will need Continue Reading
