Author: Jyoti Sachdeva

Running Apache Airflow DAG with Docker

Reading Time: 3 minutes In this blog, we are going to run a sample dynamic DAG using Docker. Before that, let’s get a quick idea of Airflow and some of its terms. What is Airflow? Airflow is a workflow engine responsible for managing and scheduling jobs and data pipelines. It ensures that the jobs are ordered correctly based on dependencies and also manages the allocation Continue Reading

Git Rebase vs Pull

Reading Time: 4 minutes We face situations daily where we have to choose between pull and rebase to update the local code with the origin. We will see the difference using an example. Let’s say we have a master branch and it has only one file Demo.txt. We add m1 to it and commit it. Later add m2 and commit it and finally add m3 and commit it. master Continue Reading

Use of Either in Scala

Reading Time: 3 minutes In this blog, we are going to see the use of Either in Scala. We use Option in Scala, so why would we want to go for Either? Either is the better approach when something fails and we need to track down the reason, which is not possible with Option's None case. We simply pass None but what is the reason we got None Continue Reading

Reading Avro files using Apache Flink

Reading Time: 2 minutes In this blog, we will see how to read the Avro files using Flink. Before reading the files, let’s get an overview of Flink. There are two types of processing – batch and real-time. Batch Processing: Processing based on the data collected over time. Real-time Processing: Processing based on immediate data for an instant result. Real-time processing is in demand and Apache Flink is the Continue Reading

Using Apache Flink for Kinesis to Kafka Connect

Reading Time: 3 minutes In this blog, we are going to use Kinesis as a source and Kafka as a sink. Let’s get started. Step 1: Apache Flink provides the Kinesis and Kafka connector dependencies. Let’s add them in our build.sbt: Step 2: The next step is to create a pointer to the environment on which this program runs. Step 3: Setting parallelism of x here will cause all Continue Reading

Build your first web application using Django

Reading Time: 3 minutes In our previous blog, Introduction to Django, we discussed Django’s features and architecture. In this blog, we will create a web application in Django. To start a new project, go to the folder where you want your project to be and run the command: django-admin startproject django_proj. django-admin is Django’s command-line utility for administrative tasks. manage.py is automatically created in each Django project. manage.py does the Continue Reading

List in Python

Reading Time: 4 minutes In this blog, we are going to discuss the list data structure of Python. The list is a collection which is: • Ordered: [1, 2] is not equal to [2, 1]. • Allows duplicate members: [1, 1] is allowed. • Mutable: allows modification of elements of the list after its creation. • Dynamic: allows addition, modification or deletion of elements. The differentiating feature between arrays from Continue Reading
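The four list properties named in the excerpt above can be checked in a few lines. This is only a minimal sketch; the variable names are made up for illustration and do not come from the original post.

```python
# Ordered: the same elements in a different order are a different list.
is_ordered = [1, 2] != [2, 1]

# Allows duplicate members.
duplicates = [1, 1]

# Mutable: an element can be replaced after the list is created.
nums = [10, 20, 30]
nums[0] = 99

# Dynamic: elements can be added and deleted, so the length changes.
nums.append(40)
del nums[1]

print(is_ordered, duplicates, nums)
```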

Introduction to Django

Reading Time: 3 minutes In this blog, we are going to talk about Django. Before that, let’s understand what a web framework is and why we need it. A web framework is a software tool that helps us develop applications faster and smarter. It eliminates the need to write a lot of repetitive code and saves time. What is Django? Django is a free, open-source, high-level web framework Continue Reading

Tuple in Python

Reading Time: 3 minutes In this blog, we are going to discuss the tuple data structure of Python. A tuple is a collection which is immutable: we cannot change the elements of a tuple once it is assigned. It allows duplicate members, i.e. (1, 1) is allowed. Any set of multiple comma-separated values written without brackets defaults to a tuple. >>> x, y = 1, 2 Dictionaries have a method called items that Continue Reading
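A short sketch of the tuple behaviour described in the excerpt above: immutability, duplicates, comma-separated values packing into a tuple, and the dictionary items method. The names point, packed and ages are illustrative assumptions, not taken from the original post.

```python
# Duplicate members are allowed.
point = (1, 1)

# Immutable: assigning to an element raises TypeError.
try:
    point[0] = 5
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Comma-separated values without brackets are packed into a tuple,
# and a tuple can be unpacked into several names at once.
packed = 1, 2
x, y = packed

# dict.items() yields (key, value) pairs, each of which is a tuple.
ages = {"a": 30, "b": 25}
pairs = list(ages.items())

print(error_name, packed, x, y, pairs)
```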

Data Analysis using Python: Pandas

Reading Time: 3 minutes In this blog, I am going to explain pandas, which is an open-source library for data manipulation, analysis, and cleaning. Pandas is a high-level data manipulation tool developed by Wes McKinney. The name Pandas is derived from “panel data”, an econometrics term for multidimensional data. Pandas is built on top of NumPy. Five typical steps in the processing and analysis of Continue Reading

Currying vs Partially Applied Function

Reading Time: 2 minutes In this blog, I’m going to discuss currying and partially applied functions. CURRYING Currying splits a method with multiple parameters into a chain of functions, each with one parameter. Let’s understand currying using an example: scala> def multiply(a: Int)(b: Int)(c: Int) = a * b * c is the same as: def multiply(a: Int) = (b: Int) => (c: Int) => a * b Continue Reading
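The post itself uses Scala; the sketch below is only a rough Python analogue of the same two ideas, kept in Python to match the other examples on this page. The curried multiply mirrors the excerpt's chain of one-parameter functions, while multiply3 and functools.partial stand in for a partially applied function. The helper names are hypothetical.

```python
from functools import partial


def multiply(a):
    """Curried form: each call takes exactly one argument."""
    def with_b(b):
        def with_c(c):
            return a * b * c
        return with_c
    return with_b


def multiply3(a, b, c):
    """Ordinary three-parameter function used for partial application."""
    return a * b * c


# Currying: supply the arguments one at a time.
curried_result = multiply(2)(3)(4)

# Partial application: fix the first argument, supply the rest later.
times_two = partial(multiply3, 2)
partial_result = times_two(3, 4)

print(curried_result, partial_result)
```

The difference in this sketch is that the curried function always returns a one-argument function until the last argument arrives, whereas the partially applied one fixes some arguments and still accepts all the remaining ones in a single call.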

Higher Order Functions in Scala

Reading Time: 2 minutes In this blog, I’m going to explain higher-order functions. A higher-order function takes other functions as parameters or returns a function as a result. This is possible because functions are first-class values in Scala. What does that mean? It means that functions can be passed as arguments to other functions and functions can return other functions. The map function is a classic example Continue Reading
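The same idea can be sketched outside Scala. The Python example below, written in Python to match the other examples on this page, shows a function taking another function as an argument, a function returning a function, and the built-in map. The names apply_twice and make_adder are hypothetical helpers, not part of the original post.

```python
def apply_twice(func, value):
    """Takes a function as a parameter and applies it two times."""
    return func(func(value))


def make_adder(n):
    """Returns a new function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder


# map is a higher-order function: it takes a function and applies it
# to every element of the input.
squares = list(map(lambda x: x * x, [1, 2, 3]))

add_five = make_adder(5)
result = apply_twice(add_five, 1)

print(squares, result)
```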