Author: Ramandeep

Do you really need Spark? Think Again!

Reading Time: 5 minutes With the rapid growth of big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything: data ingestion, data processing, data retrieval, data storage, etc. Today we are going to focus on one of the most popular big data technologies, Apache Spark. Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark …

Knolx: How Spark does it internally?

Reading Time: 1 minute Knoldus organized a 30-minute session on Oct 12 at 3:30 PM. The topic was “How Spark does it internally?” Many people joined and enjoyed the session. I am going to share the slides and the video here. Please let me know if you have any questions about the linked slides. Here’s the video of the …

RDD: Spark’s Fault Tolerant In-Memory weapon

Reading Time: 5 minutes A fault-tolerant collection of elements that can be operated on in parallel: the “Resilient Distributed Dataset”, a.k.a. RDD. RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects that is computed on the different nodes of the cluster. Every dataset in a Spark RDD is logically partitioned across many servers so that it can be computed on …
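
To make the idea of an immutable, partitioned collection more concrete, here is a minimal sketch in plain Python. It is not the Spark API: `TinyRDD` and its methods are hypothetical names used only for illustration, and a thread pool stands in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


class TinyRDD:
    """Hypothetical stand-in for an RDD: immutable data split into partitions."""

    def __init__(self, partitions: Tuple[tuple, ...]):
        # Tuples are immutable, so a TinyRDD cannot be changed after creation.
        self._partitions = partitions

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int) -> "TinyRDD":
        items = tuple(data)
        # Round-robin split: element i goes to partition i % num_partitions.
        parts = tuple(items[i::num_partitions] for i in range(num_partitions))
        return cls(parts)

    def map(self, fn: Callable) -> "TinyRDD":
        # Each partition is processed independently; the thread pool is an
        # assumption standing in for separate cluster nodes.
        with ThreadPoolExecutor() as pool:
            new_parts = tuple(
                pool.map(lambda part: tuple(fn(x) for x in part), self._partitions)
            )
        # A new TinyRDD is returned; the original is left untouched.
        return TinyRDD(new_parts)

    def collect(self) -> list:
        # Gather the elements of every partition into one list.
        return [x for part in self._partitions for x in part]


numbers = TinyRDD.parallelize(range(1, 7), num_partitions=3)
squares = numbers.map(lambda x: x * x)
print(sorted(squares.collect()))
```

Note that `map` never modifies `numbers`; it builds a new collection from the old one, which is the property the excerpt above calls immutability.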

Spark Unconstructed | Deep dive into DAG

Reading Time: 4 minutes Apache Spark is all the rage these days. For people who work with Big Data, Spark is a household name. We have been using it for quite some time now, so we already know that Spark is a lightning-fast cluster-computing technology, faster than Hadoop MapReduce. If you ask any of these Spark techies how Spark is fast, they would give you a …

The Law of Demeter

Reading Time: 3 minutes You’ll often hear good programmers talk about having “loosely coupled” classes. What do they mean by that? Let’s understand this first before jumping into the Law of Demeter. Loosely Coupled: In object-oriented design, the amount of coupling refers to how much the design of one class depends on the design of another class. In other words, how often do changes in class …
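
As a rough illustration of what coupling looks like in code, here is a small Python sketch. All class names (`Engine`, `Car`, `TightDriver`, `LooseDriver`) are hypothetical and chosen only for this example.

```python
class Engine:
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class Car:
    def __init__(self, engine):
        self._engine = engine

    def start(self):
        # Car delegates to its engine, so callers never touch the engine.
        self._engine.start()

    def is_running(self):
        return self._engine.running


class TightDriver:
    # Tightly coupled: depends on how Car is built internally
    # (its private _engine attribute). Changing Car can break this class.
    def drive(self, car):
        car._engine.start()
        return car._engine.running


class LooseDriver:
    # Loosely coupled: depends only on Car's public methods.
    def drive(self, car):
        car.start()
        return car.is_running()


tight_result = TightDriver().drive(Car(Engine()))
loose_result = LooseDriver().drive(Car(Engine()))
```

Both drivers get the car running, but only `LooseDriver` keeps working unchanged if `Car` later swaps its engine for something else behind the same `start` and `is_running` methods.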

Clean Code – Robert C. Martin’s Way

Reading Time: 7 minutes Writing good code in accordance with all the best practices is often considered overrated. But is it really? Writing good, clean code is like any good habit: it comes with time and practice. We always make excuses to continue with our usual inefficient, bad code: no time for best practices, deadlines to meet, an angry boss, being tired of the project, etc. Most of …

Docker Architecture

Reading Time: 3 minutes In my previous blog, we had a little glimpse of what Docker is. It’s time to go one step further. Let’s understand more about Docker through its architecture. A quick recall: what is Docker? Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the …

What is Docker?

Reading Time: 3 minutes Docker is not a new term to almost all of us. It is stealing the thunder everywhere. But what exactly is Docker? In simple words, Docker is a software containerization platform, meaning you can build your application, package it along with its dependencies into a container, and then easily ship that container to run on other machines. Okay, but what is containerization anyway? …

Blaze your App with Gatling

Reading Time: 4 minutes Every time we write code, we assume that if it works fine locally, it will work fine in production too. We do test in production as well, but only with limited users or unrealistic scenarios. A system may run very well with only 1,000 concurrent users, but how would it run with 100,000? It may or may not respond in time. Are we doing anything to deal …

Jenkins next step: Pipelining

Reading Time: 6 minutes In my previous blog, I briefly introduced continuous integration and Jenkins, and we made our first build using Jenkins. Now, in this blog, we will see how we can build a Pipeline in Jenkins. Pipeline – what and why? When a number of plugins are executed on SCM or code, it is known as pipelining. With the help of the Pipeline plugin, users …

Jenkins for Continuous Integration

Reading Time: 4 minutes Jenkins is not a new term to almost all of us. It’s a continuous integration/continuous deployment server. Before starting off with Jenkins, let’s first understand what Continuous Integration is. Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.

Validating XML using XSD

Reading Time: 2 minutes Lately, I have been working on a use case where I was asked to parse XML to validate it and retrieve its values. There are two kinds of document type definitions that can be used with XML: DTD, the original Document Type Definition, and XML Schema, an XML-based alternative to DTD. A document type definition defines the rules and the legal elements and attributes for …
