Big Data and Fast Data

MarkLogic Data Hub: For New Data Ingestion

Reading Time: 3 minutes Introduction The MarkLogic Data Hub is a secure, scalable database platform. It is used to store and manage data across multiple cloud-based environments. It helps you create a virtualized database that can be deployed in the cloud and on-premises. You can perform day-to-day operations like creating reports or working with data directly in Excel or another spreadsheet application. The MarkLogic Data Hub provides many tools for getting started Continue Reading

MarkLogic & Hadoop: For Easier Technology Solutions

Reading Time: 4 minutes Introduction The MarkLogic Connector for Apache Hadoop is a powerful tool that allows you to use MapReduce and the MarkLogic platform to move large volumes of data into your Hadoop cluster. With this integration, you can leverage existing technology and processes for ETL. In addition, this connector enables you to take advantage of many advanced features available only in MarkLogic. Continue Reading

BigQuery DML Statements: A Small Guide

Reading Time: 3 minutes In this blog, we are going to learn about some of the key BigQuery DML statements. Data plays an integral part in any organisation. Modern organisations are data-driven, and almost all business and technological decisions are based on the available data. Let’s assume that we have an application distributed across multiple servers in different regions of a cloud service provider, and Continue Reading
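To make the idea concrete, here is a minimal sketch of running a DML statement against BigQuery from Java with the google-cloud-bigquery client; the project, dataset, table, and column names are placeholders, and the environment is assumed to provide default credentials.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class DmlExample {
    public static void main(String[] args) throws Exception {
        // Uses the default credentials and project of the environment.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // A DML UPDATE; `my_project.my_dataset.events` is a hypothetical table.
        String dml =
            "UPDATE `my_project.my_dataset.events` "
                + "SET status = 'processed' WHERE status = 'pending'";

        QueryJobConfiguration config = QueryJobConfiguration.newBuilder(dml).build();
        bigquery.query(config); // runs the statement as a query job and waits
        System.out.println("DML statement completed");
    }
}
```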

How Does Kafka Relate to the Axon Framework?

Reading Time: 3 minutes Axon and Kafka are used for different purposes: Axon is used for Event-Driven Architecture and provides application-level support for domain modeling and Event Sourcing, as well as the routing of Commands and Queries, while Kafka serves as an event-streaming system. The fundamental idea of Axon is to implement a CQRS and Event Sourcing-based architecture. With its help, we can design & develop Continue Reading
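As a minimal sketch of the Axon side of that picture, here is an event-sourced aggregate with a command handler; PlaceOrderCommand and OrderPlacedEvent are hypothetical types invented for the example, and routing the resulting events through Kafka would be a separate concern handled by Axon's Kafka extension.

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import static org.axonframework.modelling.command.AggregateLifecycle.apply;

// Hypothetical command and event types, just for the sketch.
record PlaceOrderCommand(String orderId) {}
record OrderPlacedEvent(String orderId) {}

public class OrderAggregate {

    @AggregateIdentifier
    private String orderId;

    protected OrderAggregate() {
        // Required by Axon so the aggregate can be rebuilt from its events.
    }

    @CommandHandler
    public OrderAggregate(PlaceOrderCommand command) {
        // A command handler applies an event instead of mutating state directly.
        apply(new OrderPlacedEvent(command.orderId()));
    }

    @EventSourcingHandler
    public void on(OrderPlacedEvent event) {
        // State changes happen only in event-sourcing handlers (Event Sourcing).
        this.orderId = event.orderId();
    }
}
```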

Kafka Connect Concepts

Reading Time: 5 minutes Kafka Connect is a framework for streaming data into and out of Apache Kafka. A few major concepts: Connectors – the high-level abstraction that coordinates data streaming by managing tasks; Tasks – the implementation of how data is copied to or from Kafka; Workers – the running processes that execute connectors and tasks; Converters – the code used to translate data between Connect and the system sending or receiving data Continue Reading
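To illustrate the Connector/Task split, here is a minimal sketch of a custom source connector against the Kafka Connect API; the class names are illustrative and the task is a do-nothing stub.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class DemoSourceConnector extends SourceConnector {
    private Map<String, String> props;

    @Override
    public void start(Map<String, String> props) {
        this.props = props; // keep the config so it can be handed to tasks
    }

    @Override
    public Class<? extends Task> taskClass() {
        return DemoSourceTask.class; // the Task does the actual copying
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // The connector decides how to split the work across tasks;
        // here every task simply receives the same configuration.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(props));
        }
        return configs;
    }

    @Override
    public void stop() {}

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // expected settings would be declared here
    }

    @Override
    public String version() {
        return "1.0";
    }
}

// A do-nothing task: a real one would poll the external system for records.
class DemoSourceTask extends SourceTask {
    @Override public void start(Map<String, String> props) {}
    @Override public List<SourceRecord> poll() { return null; } // null = no data yet
    @Override public void stop() {}
    @Override public String version() { return "1.0"; }
}
```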

Introduction to Apache Beam

Reading Time: 3 minutes What is Apache Beam? Apache Beam is a unified programming model for batch and streaming data processing jobs. It provides a software development kit to define and construct data processing pipelines, as well as runners to execute them. Apache Beam is designed to provide a portable programming layer. The Beam Pipeline Runners translate the data processing pipeline into the API compatible with the back-end of the user’s Continue Reading
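As a minimal sketch of what such a pipeline looks like in the Java SDK (the file paths are placeholders, and the runner defaults to the local DirectRunner unless one is passed on the command line):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CopyPipeline {
    public static void main(String[] args) {
        // The options pick the runner, e.g. --runner=DataflowRunner for Dataflow.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("ReadLines", TextIO.read().from("input.txt"))  // placeholder path
            .apply("WriteLines", TextIO.write().to("output"));    // placeholder prefix

        pipeline.run().waitUntilFinish();
    }
}
```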

Apache Beam ParDo Transformations

Reading Time: 2 minutes What is a PCollection? A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from Continue Reading
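A minimal sketch of both ideas in the Java SDK: a bounded PCollection built from in-memory data with Create.of, and a ParDo that maps each word to its length (the sample words are made up).

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class ParDoExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        // A bounded, in-memory PCollection created with Create.of.
        PCollection<String> words =
            pipeline.apply(Create.of(Arrays.asList("apache", "beam", "pardo")));

        // ParDo applies the DoFn to every element of the input PCollection.
        PCollection<Integer> lengths = words.apply(
            ParDo.of(new DoFn<String, Integer>() {
                @ProcessElement
                public void processElement(@Element String word, OutputReceiver<Integer> out) {
                    out.output(word.length());
                }
            }));

        pipeline.run().waitUntilFinish();
    }
}
```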

Kafka Connect Example: MySQL to Elasticsearch

Reading Time: 3 minutes Overview: Hello everyone, in this blog, we will see an example of Kafka Connect in which we will take a MySQL table, stream it to a Kafka topic, and from there load it to Elasticsearch and index its content. Installation: First of all, we will install MySQL and Elasticsearch on our local system. To install them, simply run: The next step is to make Continue Reading
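As a sketch of the streaming half, a JDBC source connector like the one below can be registered over the Connect REST API; the connection details, table, and topic prefix are placeholders, a Connect worker is assumed on localhost:8083, and the Elasticsearch side would be registered the same way with io.confluent.connect.elasticsearch.ElasticsearchSinkConnector.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterMySqlSource {
    public static void main(String[] args) throws Exception {
        // Confluent JDBC source connector config; credentials, database,
        // table, and topic prefix below are all placeholders.
        String sourceConfig = """
            {
              "name": "mysql-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:mysql://localhost:3306/demo",
                "connection.user": "root",
                "connection.password": "secret",
                "table.whitelist": "orders",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "mysql-"
              }
            }""";

        // Register the connector with the Connect worker's REST endpoint.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(sourceConfig))
            .build();

        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```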

Apache Beam Core Transforms

Reading Time: 6 minutes Introduction Transforms in Apache Beam are the operations in your pipeline, and they provide a generic processing framework. You provide processing logic in the form of a function object (colloquially referred to as “user code”), and your user code is applied to each element of an input PCollection (or more than one PCollection). Core Beam transforms Beam provides the following core transforms, each of which represents a different processing Continue Reading
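As a minimal sketch of one of those core transforms, GroupByKey, in the Java SDK (the keys and scores are made-up in-memory data):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class GroupByKeyExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();

        // Made-up key/value pairs; note that "alice" appears twice.
        PCollection<KV<String, Integer>> scores = pipeline.apply(
            Create.of(
                KV.of("alice", 3),
                KV.of("bob", 5),
                KV.of("alice", 7)));

        // GroupByKey collects all values that share a key into one Iterable,
        // e.g. ("alice", [3, 7]) and ("bob", [5]).
        PCollection<KV<String, Iterable<Integer>>> grouped =
            scores.apply(GroupByKey.create());

        pipeline.run().waitUntilFinish();
    }
}
```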

Deep dive into Kafka Connect

Reading Time: 6 minutes Hello! In this article, we will continue our journey of understanding Kafka Connect. We will try to understand its architecture and internals. We’ve seen that Kafka Connect is a pluggable component that helps feed data into or out of Kafka and hence provides flexible integration pipelines. It is inherently fault-tolerant and scalable. To work with any software component and get the most Continue Reading
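To make that pluggability concrete, here is a minimal sketch of the sink side of the Connect API: a task to which the worker hands batches of records read from Kafka. The class name and the logging body are purely illustrative; a real task would write the records to an external system.

```java
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class LoggingSinkTask extends SinkTask {

    @Override
    public void start(Map<String, String> props) {
        // Open connections to the external system here.
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // The worker calls this with batches of records consumed from Kafka.
        for (SinkRecord record : records) {
            System.out.println(record.topic() + " -> " + record.value());
        }
    }

    @Override
    public void stop() {
        // Release resources here.
    }

    @Override
    public String version() {
        return "1.0";
    }
}
```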

Deploy modes in Apache Spark

Reading Time: 2 minutes Spark is an open-source framework engine that is high-speed and easy to use in the field of big data processing and analysis. Spark has some built-in modules for graph processing, machine learning, streaming, SQL, etc. The Spark execution engine supports in-memory computation and cyclic data flow, which makes it faster; it can run in either cluster mode or standalone mode and can also access diverse Continue Reading
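A minimal sketch of how the deploy modes come into play: the application code stays the same, and the mode is chosen when the job is submitted (the cluster manager and app name below are placeholders).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeployModeDemo {
    public static void main(String[] args) {
        // The mode is chosen at submission time, not in the code, e.g.:
        //   spark-submit --master yarn --deploy-mode cluster ...  (driver on the cluster)
        //   spark-submit --master yarn --deploy-mode client ...   (driver on your machine)
        SparkSession spark = SparkSession.builder()
            .appName("deploy-mode-demo")
            .getOrCreate();

        Dataset<Row> df = spark.range(10).toDF("n");
        df.show(); // the driver, wherever it runs, collects and prints rows

        spark.stop();
    }
}
```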

How to Implement Data Pipelines with Apache Beam

Reading Time: 4 minutes Throughout this blog, I will provide a deeper look into this specific data processing model and explore its data pipeline structures and how to process them. Apache Beam Apache Beam is one of the latest projects from Apache: a consolidated programming model for expressing efficient data processing pipelines. It is an open-source, unified model for defining both batch- and streaming-data parallel-processing pipelines. The Apache Beam programming model Continue Reading
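A minimal sketch of such a pipeline in the Java SDK, counting the lines of a file; the file paths are placeholders, and the runner is selected via the options (e.g. --runner=FlinkRunner), which is what makes the same pipeline portable across back-ends.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class LineCountPipeline {
    public static void main(String[] args) {
        // The same pipeline runs on any supported runner; only the options change.
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("Read", TextIO.read().from("input.txt"))      // placeholder path
            .apply("CountLines", Count.<String>globally())
            .apply("Format", MapElements.into(TypeDescriptors.strings())
                .via(count -> "lines: " + count))
            .apply("Write", TextIO.write().to("line-count"));    // placeholder prefix

        pipeline.run().waitUntilFinish();
    }
}
```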

The ecosystem of Apache Spark

Reading Time: 4 minutes Apache Spark is a powerful alternative to Hadoop MapReduce, with several rich features such as machine learning, real-time stream processing, and graph computations. It is an open-source distributed cluster-computing framework. It is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries, and streaming. Apart from supporting all these workloads in a single system, it reduces the management burden of Continue Reading
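As a small sketch of one module of that ecosystem, Spark SQL, here is a Java snippet that registers a DataFrame as a temporary view and queries it with SQL (the app name and local master are placeholders for the sketch):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("ecosystem-demo")
            .master("local[*]") // local mode, just for the sketch
            .getOrCreate();

        // Spark SQL: expose a DataFrame as a temp view and query it with SQL.
        Dataset<Row> numbers = spark.range(1, 6).toDF("n");
        numbers.createOrReplaceTempView("numbers");
        spark.sql("SELECT n, n * n AS square FROM numbers").show();

        spark.stop();
    }
}
```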