Big Data and Fast Data

An Introduction to Kafka’s Internals

Reading Time: 6 minutes In this blog, we will explore what Kafka is and explain how Kafka works from the inside out. How does it replicate data between nodes, what happens if replication fails, and how do consumers scale out automatically? Insights into Apache Kafka Kafka is a data streaming system that allows developers to react to new events as they occur in real time. Kafka Continue Reading
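To make the event-driven idea concrete, here is a minimal, hedged sketch of a Kafka producer written in Scala against the standard Java client; the broker address and the `orders` topic are placeholder assumptions for illustration, not anything prescribed by the post.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object OrderProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed local broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // acks=all waits for the in-sync replicas to acknowledge the write,
    // which is where Kafka's replication guarantees surface to a producer.
    props.put("acks", "all")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("orders", "order-1", """{"amount": 42}"""))
    producer.close()
  }
}
```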


Getting started with Zio-Http

Reading Time: 6 minutes What is Zio? ZIO is a functional programming library for building concurrent and asynchronous applications in Scala. It provides a set of composable and type-safe abstractions for managing side effects, such as IO, error handling, and concurrency primitives like fibers, promises, and queues. ZIO is designed to make it easier to write correct and performant concurrent code by providing a more expressive and composable API Continue Reading
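As a rough illustration of the abstractions the excerpt mentions (effects, fibers, fork/join), here is a minimal sketch assuming ZIO 2.x; the computation and messages are invented for the example.

```scala
import zio._

object FiberExample extends ZIOAppDefault {
  // A slow computation we fork onto its own fiber so the main flow keeps going.
  val slowSum: UIO[Int] = ZIO.succeed((1 to 100).sum).delay(1.second)

  val program: ZIO[Any, Throwable, Unit] =
    for {
      fiber  <- slowSum.fork                                   // run concurrently on a fiber
      _      <- Console.printLine("Doing other work while the fiber runs...")
      result <- fiber.join                                     // wait for the fiber's result
      _      <- Console.printLine(s"Sum = $result")
    } yield ()

  def run = program
}
```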

How is advanced data analytics transforming the retail industry?

Reading Time: 3 minutes Traditional brick-and-mortar retailers have been radically overhauled by data analytics, which has swept the industry off its feet. It has introduced a new perspective on assessing consumer needs, enhancing supply chain management, and boosting profit. Additionally, it aims to optimize revenue by refining brand strategy and discount coupons and by ensuring that losses from excess inventory are kept to a bare minimum. Furthermore, data analytics aids in evaluating Continue Reading

Data Streaming with AWS Kinesis

Reading Time: 4 minutes Data is an essential asset for modern businesses as it helps them to monitor all aspects of the business. Every second, we process, analyse, and transform large amounts of data, so handling this dynamically generated data is important. As the number, variety, and velocity of data sources grow, new architectures and technologies are needed. This is where the need for data streaming Continue Reading
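As a small, hedged sketch of what streaming a single record looks like on the producer side, here is a put-record call using the AWS SDK for Java v2 from Scala; the stream name, partition key, and payload are assumptions made up for the example.

```scala
import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.services.kinesis.KinesisClient
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest

object KinesisProducer {
  def main(args: Array[String]): Unit = {
    // Uses the default credential and region providers.
    val client = KinesisClient.create()

    val request = PutRecordRequest.builder()
      .streamName("clickstream")               // assumed stream name
      .partitionKey("user-42")                 // determines which shard receives the record
      .data(SdkBytes.fromUtf8String("""{"event":"page_view","page":"/home"}"""))
      .build()

    val response = client.putRecord(request)
    println(s"Wrote record to shard ${response.shardId()}")
    client.close()
  }
}
```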


MarkLogic Data Hub: For New Data Ingestion

Reading Time: 3 minutes Introduction The MarkLogic Data Hub is a secure, scalable database platform. It is used to store and manage data across multiple cloud-based environments. It helps you create a virtualized database that can be deployed in the cloud and on-premises. You can perform day-to-day operations like creating reports or working with data directly in Excel or another spreadsheet application. The MarkLogic Data Hub provides many tools for getting started Continue Reading
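For a flavour of what ingesting a new document can look like, here is a minimal sketch using the MarkLogic Java Client API from Scala; the host, port, credentials, document URI, and JSON payload are all assumptions for illustration, not values from the post.

```scala
import com.marklogic.client.DatabaseClientFactory
import com.marklogic.client.io.{Format, StringHandle}

object IngestExample {
  def main(args: Array[String]): Unit = {
    // Host, port, and credentials are placeholders for a local instance.
    val client = DatabaseClientFactory.newClient(
      "localhost",
      8010,
      new DatabaseClientFactory.DigestAuthContext("admin", "admin")
    )

    val docManager = client.newJSONDocumentManager()
    val order      = """{"orderId": "1001", "customer": "acme", "total": 250.0}"""

    // Write one JSON document under a URI of our choosing.
    docManager.write("/ingest/orders/1001.json", new StringHandle(order).withFormat(Format.JSON))

    client.release()
  }
}
```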

MarkLogic & Hadoop: for ease of technology solutions

Reading Time: 4 minutes Introduction The MarkLogic Connector for Apache Hadoop is a powerful tool that allows you to use MapReduce and the MarkLogic Platform to move large volumes of data into your Hadoop cluster. With this integration, you can leverage existing technology and processes for ETL. In addition, this connector enables you to take advantage of many advanced features available only in MarkLogic. Continue Reading


BigQuery DML Statements Technique: A Small Guide

Reading Time: 3 minutes In this blog we are going to learn about some of the key BigQuery DML statements. Data plays an integral part in any organisation. Given the data-driven nature of modern organisations, almost all business and technology decisions are based on the available data. Let’s assume that we have an application distributed across multiple servers in different regions of a cloud service provider, and Continue Reading
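As a hedged sketch of running one such DML statement programmatically, here is an UPDATE issued through the google-cloud-bigquery client from Scala; the project, dataset, table, and column names are invented for the example.

```scala
import com.google.cloud.bigquery.{BigQueryOptions, QueryJobConfiguration}

object DmlExample {
  def main(args: Array[String]): Unit = {
    // Uses application-default credentials; names below are illustrative only.
    val bigquery = BigQueryOptions.getDefaultInstance.getService

    val update =
      """UPDATE `my_project.app_logs.requests`
        |SET status = 'archived'
        |WHERE region = 'eu-west-1' AND request_date < '2023-01-01'""".stripMargin

    // Run the DML statement as a query job and wait for it to complete.
    bigquery.query(QueryJobConfiguration.newBuilder(update).build())
    println("UPDATE statement finished")
  }
}
```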

How Does Kafka Relate to the Axon Framework?

Reading Time: 3 minutes Axon and Kafka are used for different purposes. Axon is used for Event-Driven Architecture and provides application-level support for domain modeling and Event Sourcing, as well as the routing of Commands and Queries, while Kafka serves as an Event Streaming system. The fundamental idea of Axon is to implement a CQRS and Event Sourcing-based architecture. With the help of this, we can design & develop Continue Reading

Kafka Connect Concepts

Reading Time: 5 minutes Kafka Connect is a framework to stream data into and out of Apache Kafka. A few major concepts:
- Connectors – the high-level abstraction that coordinates data streaming by managing tasks
- Tasks – the implementation of how data is copied to or from Kafka
- Workers – the running processes that execute connectors and tasks
- Converters – the code used to translate data between Connect and the system sending or receiving data
Continue Reading

Introduction to Apache Beam

Reading Time: 3 minutes What is Apache Beam? Apache Beam is a unified programming model for batch and streaming data processing jobs. It provides a software development kit to define and construct data processing pipelines as well as runners to execute them. Apache Beam is designed to provide a portable programming layer. The Beam Pipeline Runners translate the data processing pipeline into the API compatible with the back-end of the user’s Continue Reading
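To make the "define a pipeline, then hand it to a runner" idea concrete, here is a minimal sketch that uses the Beam Java SDK from Scala with the default (direct) runner; the words and transform names are placeholders chosen for the example.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.{Create, MapElements, SimpleFunction}

object HelloBeam {
  def main(args: Array[String]): Unit = {
    // Construct a pipeline with default options (direct runner unless configured otherwise).
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())

    pipeline
      .apply("CreateWords", Create.of("portable", "programming", "layer"))
      .apply("UpperCase", MapElements.via(new SimpleFunction[String, String]() {
        override def apply(word: String): String = word.toUpperCase
      }))

    // The chosen runner executes the pipeline; waitUntilFinish blocks until it is done.
    pipeline.run().waitUntilFinish()
  }
}
```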

Apache Beam ParDo Transformations

Reading Time: 2 minutes What is a PCollection? A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from Continue Reading
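As a brief sketch of the two creation styles the excerpt describes, here are a bounded PCollection read from a file and one built from in-memory data, using the Beam Java SDK from Scala; the file path and values are placeholder assumptions.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.values.PCollection

object PCollectionExamples {
  def main(args: Array[String]): Unit = {
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())

    // A bounded PCollection read from a fixed source (the path is a placeholder).
    val lines: PCollection[String] =
      pipeline.apply("ReadLines", TextIO.read().from("/tmp/input.txt"))

    // A PCollection created from in-memory data.
    val colours: PCollection[String] =
      pipeline.apply("CreateColours", Create.of("red", "green", "blue"))

    pipeline.run().waitUntilFinish()
  }
}
```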

Kafka Connect Example: MySQL to Elasticsearch

Reading Time: 3 minutes Overview: Hello everyone, in this blog, we will see an example of Kafka Connect in which we will take a MySQL table, stream it to a Kafka topic, and from there load it to Elasticsearch and index its content. Installation: First of all, we will install MySQL and Elasticsearch on our local system. For installation, simply run: The next step is to make Continue Reading
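As a hedged sketch of the source side of such a setup, here is a connector registration posted to the Kafka Connect REST API from Scala; the connector name, database URL, column, and topic prefix are assumptions for illustration, and the worker is assumed to be listening on localhost:8083.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterConnector {
  def main(args: Array[String]): Unit = {
    // Illustrative JDBC source connector config; all names and URLs are placeholders.
    val config =
      """{
        |  "name": "mysql-source",
        |  "config": {
        |    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        |    "connection.url": "jdbc:mysql://localhost:3306/demo",
        |    "mode": "incrementing",
        |    "incrementing.column.name": "id",
        |    "topic.prefix": "mysql-"
        |  }
        |}""".stripMargin

    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8083/connectors"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(config))
      .build()

    // Submit the connector definition to the Connect worker and print its reply.
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```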

Apache Beam Core Transforms

Reading Time: 6 minutes Introduction Transforms in Apache Beam are the operations in your pipeline, and they provide a generic processing framework. You provide processing logic in the form of a function object (colloquially referred to as “user code”), and your user code is applied to each element of an input PCollection (or more than one PCollection). Core Beam transforms Beam provides the following core transforms, each of which represents a different processing Continue Reading
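To illustrate "user code applied to each element", here is a small ParDo sketch with a DoFn, written against the Beam Java SDK from Scala; the input strings and transform names are placeholders, and this is only one way such a DoFn can be expressed.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.{Create, DoFn, ParDo}
import org.apache.beam.sdk.transforms.DoFn.ProcessElement

// "User code": invoked once per element of the input PCollection.
class ExtractWordsFn extends DoFn[String, String] {
  @ProcessElement
  def processElement(ctx: DoFn[String, String]#ProcessContext): Unit =
    ctx.element().split("\\s+").foreach(word => ctx.output(word))
}

object CoreTransformExample {
  def main(args: Array[String]): Unit = {
    val pipeline = Pipeline.create(PipelineOptionsFactory.create())

    pipeline
      .apply("CreateLines", Create.of("core beam transforms", "pardo applies user code"))
      .apply("ExtractWords", ParDo.of(new ExtractWordsFn))

    pipeline.run().waitUntilFinish()
  }
}
```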