Big Data and Fast Data

Kafka Connect Concepts

Reading Time: 5 minutes Kafka Connect is a framework to stream data into and out of Apache Kafka. A few major concepts: Connectors – the high-level abstraction that coordinates data streaming by managing tasks; Tasks – the implementation of how data is copied to or from Kafka; Workers – the running processes that execute connectors and tasks; Converters – the code used to translate data between Connect and the system sending or receiving data Continue Reading
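To make the connector/task split concrete, here is a minimal sketch of a custom source connector using the Kafka Connect Java API; the class names, the hard-coded topic, and the single-task layout are illustrative assumptions, not part of the original post.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

/** The Connector coordinates the work: it only decides how to split it into task configs. */
public class DemoSourceConnector extends SourceConnector {
  private Map<String, String> props;

  @Override public void start(Map<String, String> props) { this.props = props; }
  @Override public Class<? extends Task> taskClass() { return DemoSourceTask.class; }
  @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
    return Collections.singletonList(props);  // one task; real connectors shard the work here
  }
  @Override public void stop() {}
  @Override public ConfigDef config() { return new ConfigDef(); }
  @Override public String version() { return "0.1"; }
}

/** The Task does the actual copying: each poll() returns records destined for a Kafka topic. */
class DemoSourceTask extends SourceTask {
  @Override public void start(Map<String, String> props) {}
  @Override public List<SourceRecord> poll() throws InterruptedException {
    return Collections.singletonList(new SourceRecord(
        Collections.singletonMap("source", "demo"),  // source partition
        Collections.singletonMap("offset", 0L),      // source offset
        "demo-topic", Schema.STRING_SCHEMA, "hello from connect"));
  }
  @Override public void stop() {}
  @Override public String version() { return "0.1"; }
}
```

Workers then run these connector and task instances, and the converters configured on the worker handle serialization to and from Kafka.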

Introduction to Apache Beam

Reading Time: 3 minutes What is Apache Beam? Apache Beam is a unified programming model for batch and streaming data processing jobs. It provides a software development kit to define and construct data processing pipelines, as well as runners to execute them. Apache Beam is designed to provide a portable programming layer. The Beam Pipeline Runners translate the data processing pipeline into the API compatible with the back-end of the user’s Continue Reading
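As a rough illustration of the model described above, a minimal pipeline in Beam's Java SDK might look like the sketch below; the transform names and input values are made up for the example, and the runner is chosen through pipeline options (the local DirectRunner by default).

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalBeamPipeline {
  public static void main(String[] args) {
    // The runner is picked via options (e.g. --runner=DataflowRunner); DirectRunner by default
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("CreateInput", Create.of("hello", "beam"))
        .apply("ToUpper", MapElements
            .into(TypeDescriptors.strings())
            .via((String word) -> word.toUpperCase()));

    pipeline.run().waitUntilFinish();
  }
}
```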

Apache Beam ParDo Transformations

Reading Time: 2 minutes What is a PCollection? A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from Continue Reading
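For instance, a pipeline might create a bounded PCollection by reading a local file and then apply a ParDo to every element; the file path and the length computation in this sketch are placeholder assumptions for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class PCollectionExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // A bounded PCollection read from a fixed source (a local file)
    PCollection<String> lines =
        pipeline.apply("ReadLines", TextIO.read().from("/tmp/input.txt"));

    // ParDo applies user code (a DoFn) to every element of the PCollection
    PCollection<Integer> lineLengths = lines.apply("ComputeLengths",
        ParDo.of(new DoFn<String, Integer>() {
          @ProcessElement
          public void processElement(@Element String line, OutputReceiver<Integer> out) {
            out.output(line.length());
          }
        }));

    pipeline.run().waitUntilFinish();
  }
}
```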

Kafka Connect example: MySQL to Elasticsearch

Reading Time: 3 minutes Overview: Hello everyone, in this blog, we will see an example of Kafka Connect in which we will take a MySQL table, stream it to a Kafka topic, and from there load it into Elasticsearch and index its content. Installation: First of all, we will install MySQL and Elasticsearch on our local system. For installing, simply run: The next step is to make Continue Reading
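To give an idea of the configuration involved (the post itself walks through the full setup), the sketch below shows plausible settings for a Confluent JDBC source connector and an Elasticsearch sink connector, rendered here as Java maps; the connection URLs, table name, and topic names are placeholder assumptions.

```java
import java.util.Map;

public class MysqlToElasticConfigs {
  // Source side: stream a MySQL table into a Kafka topic (values are placeholders)
  static final Map<String, String> JDBC_SOURCE = Map.of(
      "connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url", "jdbc:mysql://localhost:3306/demo?user=root&password=secret",
      "table.whitelist", "orders",
      "mode", "incrementing",
      "incrementing.column.name", "id",
      "topic.prefix", "mysql-");

  // Sink side: index the resulting topic's content into Elasticsearch
  static final Map<String, String> ELASTIC_SINK = Map.of(
      "connector.class", "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "connection.url", "http://localhost:9200",
      "topics", "mysql-orders",
      "key.ignore", "true");
}
```

Each config would be registered with a Connect worker to start the respective connector.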

Apache Beam Core Transforms

Reading Time: 6 minutes Introduction Transforms in Apache Beam are the operations in your pipeline, and they provide a generic processing framework. You provide processing logic in the form of a function object (colloquially referred to as “user code”), and your user code is applied to each element of an input PCollection (or more than one PCollection). Core Beam transforms Beam provides the following core transforms, each of which represents a different processing Continue Reading
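As a small sketch of what a core transform looks like in the Java SDK, the snippet below applies GroupByKey to a keyed PCollection and, for comparison, the composite Count.perKey() transform; the sample data is invented for the example.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class CoreTransformsExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    PCollection<KV<String, Integer>> sales = pipeline.apply(Create.of(
        KV.of("apples", 3), KV.of("pears", 5), KV.of("apples", 2)));

    // GroupByKey: a core transform that collects all values sharing a key
    PCollection<KV<String, Iterable<Integer>>> grouped =
        sales.apply(GroupByKey.<String, Integer>create());

    // Count.perKey() is a composite transform built on top of the core ones
    PCollection<KV<String, Long>> counts =
        sales.apply(Count.<String, Integer>perKey());

    pipeline.run().waitUntilFinish();
  }
}
```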

Deep dive into Kafka Connect

Reading Time: 6 minutes Hello! In this article we will continue our journey of understanding Kafka Connect. We will try to understand its architecture and internals. We’ve seen that Kafka Connect is a pluggable component that helps feed data into or out of Kafka and hence provides flexible integration pipelines. It is inherently fault tolerant and scalable. To work with any software component and get the most Continue Reading
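The fault tolerance and scalability mentioned above come largely from running workers in distributed mode, where connector configs, offsets, and statuses are stored in Kafka topics. Below is a sketch of the relevant worker settings, shown as a Java Properties object purely for illustration; in practice they live in the worker's .properties file, and the topic names and group id here are placeholders.

```java
import java.util.Properties;

public class DistributedWorkerConfig {
  public static Properties workerProps() {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");     // Kafka cluster backing Connect
    props.put("group.id", "connect-cluster-demo");        // workers sharing a group.id form one cluster
    props.put("key.converter", "org.apache.kafka.connect.json.JsonConverter");
    props.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
    // Connect persists its own state in Kafka topics, which is what makes it fault tolerant:
    props.put("config.storage.topic", "connect-configs"); // connector/task configurations
    props.put("offset.storage.topic", "connect-offsets"); // source connector offsets
    props.put("status.storage.topic", "connect-status");  // connector/task status updates
    return props;
  }
}
```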

Deploy modes in Apache Spark

Reading Time: 2 minutes Spark is an open-source framework engine known for its high speed and ease of use in the field of big data processing and analysis. Spark has some built-in modules for graph processing, machine learning, streaming, SQL, etc. The Spark execution engine supports in-memory computation and cyclic data flow, which makes it faster; it can run either in cluster mode or standalone mode and can also access diverse Continue Reading
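As a brief illustration of where the mode comes into play, the Java snippet below builds a SparkSession; the master URL and file path are placeholders, and in a real deployment both the master and the deploy mode (client vs. cluster, i.e. where the driver runs) are usually passed to spark-submit rather than hard-coded.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeployModeExample {
  public static void main(String[] args) {
    // The master URL decides where the application runs:
    //   local[*]           - everything in this JVM (useful for testing)
    //   spark://host:7077  - a standalone Spark cluster
    //   yarn               - a YARN cluster
    // The deploy mode (client vs. cluster) is usually chosen at submit time,
    // e.g. spark-submit --deploy-mode cluster, and controls where the driver runs.
    SparkSession spark = SparkSession.builder()
        .appName("deploy-mode-example")
        .master("local[*]")   // placeholder; typically supplied by spark-submit
        .getOrCreate();

    Dataset<Row> df = spark.read().json("/tmp/events.json"); // hypothetical input path
    df.show();

    spark.stop();
  }
}
```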

How to implement Data Pipelines with the help of Beam

Reading Time: 4 minutes Throughout this blog, I will provide a deeper look into this specific data processing model and explore its data pipeline structures and how to process them. Apache Beam Apache Beam is one of the latest projects from Apache, a consolidated programming model for expressing efficient data processing pipelines. It is an open-source, unified model for defining both batch- and streaming-data parallel-processing pipelines. The Apache Beam programming model Continue Reading
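One reason the same model works for both batch and streaming is windowing: the transforms in the sketch below behave identically whether the input PCollection is bounded (a file, as in this placeholder) or unbounded (for example a Kafka or Pub/Sub source). The path, window size, and class name are assumptions made for the example.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowedCountPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // The same transforms work whether `events` is bounded (a file, as here)
    // or unbounded (e.g. a Kafka or Pub/Sub source).
    PCollection<String> events =
        pipeline.apply(TextIO.read().from("/tmp/events.txt"));

    // Count occurrences of each element per one-minute window
    PCollection<KV<String, Long>> countsPerWindow =
        events
            .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
            .apply(Count.perElement());

    pipeline.run().waitUntilFinish();
  }
}
```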

The ecosystem of Apache Spark

Reading Time: 4 minutes Apache Spark is a powerful alternative to Hadoop MapReduce, with rich features such as machine learning, real-time stream processing, and graph computation. It is an open-source distributed cluster-computing framework. It is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries, and streaming. Apart from supporting all these workloads in a single system, it reduces the management burden of Continue Reading

What is Internet of Things testing?

Reading Time: 3 minutes Hello folks, in this blog we will look at what IoT is and what Internet of Things testing is. Nowadays IoT is a trending term and we all come across it often. The full form of IoT is the “Internet of Things”. We are encountering numerous IoT applications these days. The question that arises every time is: what is Internet of Things testing? IoT devices Continue Reading

Apache Kafka Connect – Basic Introduction

Reading Time: 3 minutes We use Apache Kafka Connect for streaming data between Apache Kafka and other systems, scalably as well as reliably. Moreover, Connect makes it very simple to quickly define Kafka connectors that move large collections of data into and out of Kafka. Kafka Connect can collect metrics or take an entire database from application servers into Kafka topics. It can make data available with low latency for Continue Reading
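Defining a connector typically comes down to submitting a small JSON config to a Connect worker's REST interface (port 8083 by default). The sketch below does this with Java's built-in HTTP client; the connector name, file, and topic are illustrative placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
  public static void main(String[] args) throws Exception {
    // Connector definition; class and settings are illustrative placeholders
    String connectorJson = """
        {
          "name": "file-source-demo",
          "config": {
            "connector.class": "FileStreamSource",
            "file": "/tmp/input.txt",
            "topic": "demo-topic"
          }
        }""";

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```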

Streaming Kafka Messages to Google Cloud Pub/Sub

Reading Time: 3 minutes In this blog post I present an example that creates a pipeline to read data from one or more Apache Kafka topics and write it to a topic in Google Cloud Pub/Sub. The example provides code samples to implement simple yet powerful pipelines, and also provides an out-of-the-box solution that you can just plug in. This example is built with Apache Beam and can be downloaded here. So, we hope you will find this Continue Reading
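A condensed sketch of such a pipeline with Beam's Java SDK is shown below, using KafkaIO to read and PubsubIO to write; the broker address, topic names, and project are placeholders, and the actual example in the post may differ in its details.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToPubsub {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    pipeline
        // Read records from a Kafka topic (broker address and topic are placeholders)
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("source-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())             // yields KV<String, String>
        .apply(Values.<String>create())     // keep only the message value
        // Write each value as a message to a Pub/Sub topic (project/topic are placeholders)
        .apply(PubsubIO.writeStrings()
            .to("projects/my-project/topics/target-topic"));

    pipeline.run().waitUntilFinish();
  }
}
```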

Getting started with Confluent Hub

Reading Time: 4 minutes As we know, we use connectors to copy data between Apache Kafka and other systems that we want to fetch data from or send data to. We can download connectors from Confluent Hub. So, in this blog, we will see how we can set up Confluent Hub on our local system and how we can run Confluent services like Kafka, ZooKeeper, Schema Registry, etc. The first step is, if we have a Windows Continue Reading