Search Results for: Apache

Introduction to Apache HttpClient

Reading Time: 5 minutes Introduction Apache HttpClient is a popular open-source library for sending HTTP requests and receiving HTTP responses in Java. It provides a rich set of features for building HTTP-based client applications, including support for authentication, connection pooling, request and response interception, and more. One of the key benefits of Apache HttpClient is its flexibility and configurability. You can customize almost every aspect of the HTTP request Continue Reading
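
The feature list above maps to very little code in practice. As a minimal sketch of a GET request with HttpClient (assuming the 4.x API; the URL is a placeholder):

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpGetExample {
    public static void main(String[] args) throws Exception {
        // Default client; nearly every aspect can be customized via HttpClients.custom()
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet request = new HttpGet("https://example.com");  // placeholder URL
            try (CloseableHttpResponse response = client.execute(request)) {
                System.out.println(response.getStatusLine().getStatusCode());
                System.out.println(EntityUtils.toString(response.getEntity()));
            }
        }
    }
}
```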

Use-Cases of Apache HttpClient

Reading Time: 9 minutes This is Part 2 of an ongoing series of blogs explaining the use cases of Apache HttpClient (Part 1). This blog takes you a step further and covers use cases such as Authentication, Connection Pooling, Cookie Management, GZIP Compression, Multithreading, Content-Encoding, Redirection, and Retries. 1. Authentication It’s important to use secure connections (HTTPS) when sending sensitive information over Continue Reading
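
To give a flavour of the first two use cases, here is a hedged sketch combining Basic authentication with a pooled connection manager, again assuming the HttpClient 4.x API; the host, port, and credentials are placeholders:

```java
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class AuthAndPoolingExample {
    public static void main(String[] args) throws Exception {
        // Credentials for HTTP Basic authentication (hypothetical host and credentials)
        CredentialsProvider credentials = new BasicCredentialsProvider();
        credentials.setCredentials(
                new AuthScope("api.example.com", 443),
                new UsernamePasswordCredentials("user", "secret"));

        // Connection pool shared across requests and threads
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(50);            // max connections overall
        pool.setDefaultMaxPerRoute(10);  // max connections per host

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultCredentialsProvider(credentials)
                .setConnectionManager(pool)
                .build()) {
            client.execute(new HttpGet("https://api.example.com/resource")).close();
        }
    }
}
```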

Apache Beam Vs Apache Airflow

Reading Time: 4 minutes The need to compare data tools and to keep hunting for the perfect one seems never-ending. In this blog, we’re going to see a comparison between Apache Beam and Apache Airflow. This blog helps you choose by looking into the differences and similarities between the two: Apache Airflow and Apache Beam. Comparison Study On the surface, Apache Airflow and Apache Beam may look similar. Both are open-source, Continue Reading

Apache Beam Vs Apache Spark

Reading Time: 4 minutes Before going through the comparison of Apache Beam and Apache Spark, we should have a glimpse of what these two exactly are. Apache Beam is a unified programming model. It implements batch and streaming data processing jobs that run on any execution engine, and it executes pipelines in multiple execution environments. Apache Spark is defined as a fast and general engine for large-scale data processing. Spark is a fast Continue Reading

Apache Camel vs Apache Kafka

Reading Time: 4 minutes An overview of Camel Apache Camel is an open-source integration framework that targets integration between different systems. Camel is a routing engine, or more precisely a routing-engine builder. It allows you to define your own routing rules, decide from which sources to accept messages, and determine how to process and send those messages to other destinations. For its routes, Camel uses a set Continue Reading
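
A route in Camel is typically declared in a RouteBuilder. The following is a minimal, illustrative sketch (the directory names are made up) that polls files from one folder and drops them in another:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FileMoveRoute {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Accept messages from the "inbox" directory and send them to "outbox"
                from("file:data/inbox")
                    .log("Processing ${file:name}")
                    .to("file:data/outbox");
            }
        });
        context.start();
        Thread.sleep(10_000);  // let the route run briefly before shutting down
        context.stop();
    }
}
```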

Apache Beam Vs Apache Spark: A Quick Guide

Reading Time: 4 minutes Before we compare Apache Beam with Apache Spark, we must see what the two are. Apache Beam is a unified programming model. It supports batch and streaming data processing jobs that can run on any supported execution engine, executing pipelines across many environments. Apache Spark is a fast and general-purpose engine for large-scale data processing. Spark is a fast and Continue Reading

Apache Beam: Introduction

Reading Time: 3 minutes Apache Beam is a unified programming model that handles both stream and batch data in the same way. We can create a pipeline in Beam using any of the Beam SDKs (Python/Java/Go), which can run on top of any supported execution engine, namely Apache Spark, Apache Flink, Apache Apex, Apache Samza, Apache Gearpump, and Google Cloud Dataflow (with more to join in the future). Continue Reading
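
A minimal pipeline with the Beam Java SDK might look like the sketch below; Create.of builds a small in-memory PCollection, and the pipeline runs on the default DirectRunner unless configured otherwise (the transform chosen here is just for illustration):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalBeamPipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            // Build a bounded PCollection from in-memory data
            .apply(Create.of("hello", "beam"))
            // Apply a simple element-wise transform
            .apply(MapElements.into(TypeDescriptors.strings())
                              .via((String word) -> word.toUpperCase()));

        // Runs on the default DirectRunner unless another runner is configured
        pipeline.run().waitUntilFinish();
    }
}
```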

Apache Superset with Existing Postgresql Instance

Reading Time: < 1 minute In this blog, we will learn to deploy & set up Apache Superset with an existing PostgreSQL instance. Apache Superset is an open-source software application for data exploration and data visualization, able to handle data at petabyte scale. Prerequisites A PostgreSQL instance deployed on the same Kubernetes cluster; Helm installed on the system & basic knowledge of Helm; admin username & password of the PostgreSQL instance; a Superset database created in PostgreSQL (if not Continue Reading

Introduction to Apache Beam

Reading Time: 3 minutes What is Apache Beam? Apache Beam is a unified programming model for batch and streaming data processing jobs. It provides a software development kit to define and construct data processing pipelines, as well as runners to execute them. Apache Beam is designed to provide a portable programming layer. The Beam Pipeline Runners translate the data processing pipeline into the API compatible with the back-end of the user’s Continue Reading
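
That portability shows up directly in how a pipeline is bootstrapped: the pipeline definition stays the same and only the runner option changes. A hedged sketch with the Java SDK:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class PortablePipeline {
    public static void main(String[] args) {
        // The same pipeline definition can target different back-ends;
        // pass e.g. --runner=DirectRunner or --runner=FlinkRunner on the command line
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);

        // ... apply transforms here ...

        pipeline.run().waitUntilFinish();
    }
}
```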

Apache Camel Exception Re-try Policy

Reading Time: 4 minutes Apache Camel is a rule-based routing and mediation engine that provides a Java object-based implementation of the Enterprise Integration Patterns, using an API (or declarative Java domain-specific language) to configure routing and mediation rules. We can implement exception handling in two ways: using a Do-Try block or an OnException block. A retry policy defines the rules for when the Camel Error Handler performs retry attempts; e.g. you can set up Continue Reading
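
As an illustrative sketch (not the post's exact code), an OnException block with a retry policy in the Java DSL might look like this; the endpoint URI is hypothetical:

```java
import java.io.IOException;
import org.apache.camel.builder.RouteBuilder;

public class RetryRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Retry up to 3 times on IOException, starting at a 1s delay
        // and doubling the delay on each attempt
        onException(IOException.class)
            .maximumRedeliveries(3)
            .redeliveryDelay(1000)
            .backOffMultiplier(2)
            .useExponentialBackOff();

        from("direct:start")
            .to("http://unreliable-service/endpoint");  // hypothetical flaky endpoint
    }
}
```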

Introduction to Apache Spark

Reading Time: 3 minutes Hey guys, welcome to this fresh blog on Apache Spark. In this blog, we’ll learn about what Apache Spark is and its importance in the industry, its comparison with Hadoop, Spark’s evolution, features, and much more. What is Apache Spark? Apache Spark is a data processing framework that can quickly perform processing tasks on very large datasets. It can also distribute data processing tasks across multiple Continue Reading
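
To make "distribute data processing tasks" concrete, here is a classic word-count sketch with Spark's Java API, run locally; the input file name is a placeholder:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt");  // placeholder input file
            JavaPairRDD<String, Integer> counts = lines
                // Split each line into words
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                // Pair each word with a count of 1
                .mapToPair(word -> new Tuple2<>(word, 1))
                // Sum counts per word across partitions
                .reduceByKey(Integer::sum);
            counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}
```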

Apache Beam ParDo Transformations

Reading Time: 2 minutes What is a PCollection? A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from Continue Reading
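
The usual follow-on to a PCollection is a ParDo. Below is a small illustrative sketch in the Beam Java SDK that builds a bounded PCollection from in-memory data and splits each line into words (the input strings are made up):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class ParDoExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

        // A bounded PCollection built from in-memory data
        PCollection<String> lines = pipeline.apply(Create.of("hello beam", "pardo example"));

        // ParDo applies the DoFn to every element, emitting 0..n outputs per input
        PCollection<String> words = lines.apply(ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(@Element String line, OutputReceiver<String> out) {
                for (String word : line.split(" ")) {
                    out.output(word);
                }
            }
        }));

        pipeline.run().waitUntilFinish();
    }
}
```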

Apache Beam Core Transforms

Reading Time: 6 minutes Introduction Transforms in Apache Beam are the operations in your pipeline, and they provide a generic processing framework. You provide processing logic in the form of a function object (colloquially referred to as “user code”), and your user code is applied to each element of an input PCollection (or more than one PCollection). Core Beam transforms Beam provides the following core transforms, each of which represents a different processing Continue Reading
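
As a small taste of the core transforms in action, the sketch below uses Count.perElement(), a composite built on the core grouping and combining machinery, to count distinct words; the data is made up for illustration:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class CountExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

        PCollection<String> words =
            pipeline.apply(Create.of("apple", "banana", "apple", "cherry", "banana", "apple"));

        // Count.perElement() keys each element to itself and combines per key,
        // yielding one KV<element, count> per distinct element
        PCollection<KV<String, Long>> counts = words.apply(Count.perElement());

        pipeline.run().waitUntilFinish();
    }
}
```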