apache

nifi

Apache NiFi – The Ingestion Tool

Reading Time: 3 minutes What is Apache NiFi? Apache NiFi is open source software for automating and managing the data flow between systems, leveraging the concept of Extract, Transform and Load. Apache NiFi is a powerful and reliable system to process and distribute data. Additionally, Apache NiFi has a web-based user interface for design, control, feedback, and monitoring of dataflows. History of Apache NiFi Based on Continue Reading

Apache Cassandra: CQL Commands

Reading Time: 4 minutes In the previous two blogs of the Apache Cassandra series, we explained the Basics of Apache Cassandra and How Cassandra Reads and Writes. In this blog we will cover another important topic in Apache Cassandra: CQL commands. So let us name this blog “Apache Cassandra: CQL commands”. We recommend going through the other two blogs of this series before diving Continue Reading

Log4j CVE-2021-45105: All we know is WRONG!!

Reading Time: 3 minutes The Apache security team disclosed a third Log4j2 vulnerability on the night between Dec 17 and 18. This vulnerability is tracked as CVE-2021-45105. According to the security advisory, Log4j 2.16.0, which fixed the two previous vulnerabilities, is susceptible to a DoS attack caused by a stack overflow in Context Lookups in the configuration file’s layout patterns. What is this CVE about? What can you do Continue Reading

Apache Airflow – A Workflow Manager

Reading Time: 4 minutes As the industry becomes more data-driven, we need solutions that can process the large amounts of data involved. A workflow management system provides an infrastructure for the set-up, performance and monitoring of a defined sequence of tasks, arranged as a workflow application. Workflow management has become such a common need that most Continue Reading

Scoverage Analysis | Scala | SBT

Reading Time: 3 minutes Scoverage: what it is, how to use it, and which build tools it is available for. In this blog we are going to discuss all of these, along with its implementation in SBT. What is scoverage? “scoverage” is a free, Apache-licensed code coverage tool for the Scala language that provides statement and branch coverage. It is available for SBT, Maven, and Gradle. Advantage Continue Reading

Reading Avro files using Apache Flink

Reading Time: 2 minutes In this blog, we will see how to read Avro files using Flink. Before reading the files, let’s get an overview of Flink. There are two types of processing: batch and real-time. Batch Processing: Processing based on data collected over time. Real-time Processing: Processing based on immediate data for an instant result. Real-time processing is in demand and Apache Flink is the Continue Reading

Using Apache Flink for Kinesis to Kafka Connect

Reading Time: 3 minutes In this blog, we are going to use Kinesis as a source and Kafka as a sink. Let’s get started. Step 1: Apache Flink provides the Kinesis and Kafka connector dependencies. Let’s add them to our build.sbt: Step 2: The next step is to create a pointer to the environment on which this program runs. Step 3: Setting parallelism of x here will cause all Continue Reading

Writing Java APIs using Apache Atlas Client

Reading Time: 2 minutes In the previous blog, Data Governance using Apache ATLAS, we discussed the advantages and use cases of Apache Atlas as a data governance tool. Continuing from it, we will discuss building our own Java APIs that interact with Apache Atlas, using the Apache Atlas client to create new entities and types in it. How to create new Entities and Types using Continue Reading

KSQL: Getting started with Streaming SQL for Apache Kafka

Reading Time: 3 minutes KSQL is a SQL streaming engine for Apache Kafka which puts the power of stream processing into the hands of anyone who knows SQL. In this blog, we shall understand the basics of KSQL and how to get it up and running in the easiest way on your local machine. What is KSQL? KSQL is a distributed, scalable, reliable, and real-time SQL Continue Reading

Hands-on: Apache Kafka with Scala

Reading Time: 4 minutes Apache Kafka is an open-source distributed streaming platform used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies. Before the introduction of Apache Kafka, data pipelines used to be very complex and time-consuming. A separate streaming pipeline was needed for every consumer. You can guess the complexity of it with Continue Reading

Exactly-Once Semantics with Apache Kafka

Reading Time: 4 minutes Kafka’s exactly-once semantics was recently introduced in version 0.11, which enables a message to be delivered exactly once to the end consumer even if the producer retries sending it. This major release raised many eyebrows in the community, as people believed that this is not mathematically possible in distributed systems. Jay Kreps, co-founder of Confluent and co-creator of Apache Kafka, explained its Continue Reading

Kafka Streams

Interactive Queries in Apache Kafka

Reading Time: 4 minutes Apache Kafka v0.10 introduced a new feature, the Kafka Streams API: a client library that can be used for building applications and microservices where the input and output data are stored in Kafka clusters. Kafka Streams provides state stores, which can be used by stream processing applications to store and query data. Every task in Kafka Streams uses one or more state stores which Continue Reading