Author: Sangeeta Gulia

Understanding HDFS Federation

Reading Time: 3 minutes In this blog, we will discuss Hadoop federation, compare the Hadoop architecture with the Hadoop federated architecture, and look at the issues that HDFS federation solves. So let us first see why it is gaining so much popularity. To answer this question, we must know the problems in the existing Hadoop architecture that led to the creation of Hadoop federation: 1) Availability: If we have Continue Reading

Zeppelin with Spark

Reading Time: 4 minutes Let us start with the very first question: what is Zeppelin? It is a web-based notebook server that enables interactive data analytics, built on the concept of an interpreter that can be bound to any language or data processing backend. This notebook is where you can do your data analysis. It is a Web UI REPL with pluggable interpreters Continue Reading

Deep Dive into Spark Cluster Managers

Reading Time: 5 minutes This blog aims to dig into the different cluster management modes in which you can run your Spark application. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, which is called the driver program. Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers, which allocate resources across Continue Reading

Play around with Microservices

Reading Time: 3 minutes What is microservice architecture? Microservice architecture is a method of developing software applications as a suite of independently deployable, modular services in which each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. In other words, it is an architectural style that structures an application as a collection of loosely coupled services, which implement business capabilities. Why Continue Reading

Introducing Kafka Streams: Processing made easy

Reading Time: 2 minutes If you are working with huge amounts of data, you might have heard about Kafka. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system that is designed for fast processing of data and the ability to handle hundreds of thousands of messages. What is Stream Processing? Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record Continue Reading

Working with Hadoop Filesystem Api

Reading Time: 2 minutes Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. Let us see how this can be done using the FileSystem API: first to create and write to a file in HDFS, then with an application that reads a file from HDFS and writes it back to the local file system. To start Continue Reading

Jenkins – Integrating Email Service

Reading Time: 4 minutes Jenkins is an open source tool for continuous integration and build automation. Using it, all development work can be integrated as early as possible. The resulting artifacts are automatically created and tested, so errors are identified faster. But there must be a way to report build status and test results to the team. And Continue Reading

Jenkins Build Jobs

Reading Time: 4 minutes Continuing from my previous blogs, Introduction to Jenkins and Jenkins – Manage Security, I will now talk about creating build jobs with Jenkins. Creating a new build job in Jenkins is simple. Follow the given steps to get started: From the Jenkins Dashboard, click on “New Item”. Name your project and select the project type. Click on “Ok” to Continue Reading

Jenkins – Manage Security

Reading Time: 3 minutes Jenkins is a powerful continuous integration tool with a great community. It is an open source tool and hence can be easily used by anyone. So why not get to know a tool like this? To read about the basics and installation steps, you can refer to my previous blog, Introduction to Jenkins. Creating Users There can be multiple users that can operate Jenkins for Continue Reading

Hive Database : A basic Introduction

Reading Time: 3 minutes What is Hive? Hive is a data warehouse infrastructure tool that processes structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy. Why use Hive? 1) Most data warehousing applications work with SQL-based query languages, and Hive makes it easy to port SQL-based applications to Hadoop. 2) Faster results even for very large datasets. Continue Reading

Introduction to Jenkins

Reading Time: 5 minutes What is Jenkins? Jenkins is a powerful application that allows continuous integration and continuous delivery of projects, regardless of the platform you are working on. It is free, open source software that can handle any kind of build or continuous integration. You can integrate Jenkins with a number of testing and deployment technologies. How does Jenkins work? The points below explain the workflow of Jenkins: 1) Developers Continue Reading

Introduction to Swagger

Reading Time: < 1 minute Hello all, Knoldus organised a knolx session on the topic “Introduction to Swagger” on Friday, 28 October 2016. Swagger is a simple yet powerful representation of your RESTful API. With the largest ecosystem of API tooling on the planet, thousands of developers are supporting Swagger in almost every modern programming language and deployment environment. It specifies the format (URL, method, and representation) to describe REST web Continue Reading

Javascript Style Checker

Reading Time: 5 minutes Despite many years of experience, people still mistype variable names, make syntax errors, forget to handle errors properly, and neglect best practices when in a hurry. But it is important to write quality code. A good linting tool, or linter, helps check code for errors and style violations before someone wastes their time or, worse, a client’s time. First of all, Continue Reading