ML, AI and Data Engineering

Opinion: Is Your Business Ready for AI on the Edge?

Reading Time: 4 minutes. AI has traditionally been deployed in the cloud, where its algorithms crunch massive amounts of data and consume enormous computing resources. But AI doesn’t only live in the cloud. In many situations, AI-based data crunching and decisions need to happen locally, on devices close to the edge of the network. AI at the edge allows mission-critical and time-sensitive decisions to … Continue Reading

AI: Making Agents Learn for Better Business Decisions

Reading Time: 3 minutes. In our previous post, we talked about the different types of agents that can be built for business. Any type of agent (model-based, goal-based, utility-based, etc.) can be built as a learning agent, or not. Learning allows the agent to know more about its operating environment than it initially started with. Components of a Learning Agent: the learning agent can be divided … Continue Reading
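As a rough illustration of that division, here is a minimal Scala sketch of how a learning agent's pieces might fit together. All of the names (PerformanceElement, Critic, LearningElement) are hypothetical placeholders, not taken from the post itself:

```scala
// Hypothetical sketch: the moving parts of a learning agent.
case class Percept(data: String)      // what the sensors report
case class Action(name: String)       // what the actuators execute
case class Feedback(reward: Double)   // the critic's judgement of an action

trait PerformanceElement { def selectAction(p: Percept): Action }
trait Critic             { def evaluate(p: Percept, a: Action): Feedback }
trait LearningElement    { def update(f: Feedback): Unit }

class LearningAgent(perf: PerformanceElement,
                    critic: Critic,
                    learner: LearningElement) {
  // One perceive-act-learn cycle.
  def step(percept: Percept): Action = {
    val action   = perf.selectAction(percept)        // choose an action
    val feedback = critic.evaluate(percept, action)  // judge the outcome
    learner.update(feedback)                         // improve for next time
    action
  }
}
```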

AI: Right Structure of Agents for Your Business

Reading Time: 5 minutes. In the previous post, we discussed the environments in which agents operate and the characteristics of those environments. In this post, let us talk about the types of agents and the data-set challenges they face. All agents share the same skeletal structure: they receive percepts as input from their sensors and perform actions through their actuators. Now the agent can … Continue Reading
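To make that skeleton concrete, here is a minimal, hypothetical Scala sketch of the percepts-in, actions-out structure; the types and the thermostat rule are invented for illustration:

```scala
// Hypothetical sketch of the shared agent skeleton.
case class Percept(reading: String)  // input from the sensors
case class Action(command: String)   // output to the actuators

trait Agent {
  // The agent function: map the current percept to an action.
  def act(percept: Percept): Action
}

// A trivial reflex-style agent that reacts only to the current percept.
object ThermostatAgent extends Agent {
  def act(percept: Percept): Action =
    if (percept.reading == "too-cold") Action("heat-on") else Action("heat-off")
}
```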

Understanding persistence in Apache Spark

Reading Time: 4 minutes. In this blog, we will try to understand the concept of persistence in Apache Spark in layman's terms, with scenario-based examples. Note: the scenarios are only meant to aid your understanding. Spark Architecture: note that cache memory can be shared between executors. What does it mean to persist/cache an RDD? Spark RDD persistence is an optimization technique which saves the result of an RDD evaluation … Continue Reading
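As a taste of what the full post covers, here is a minimal sketch of persisting an RDD in Scala; the data and the chosen storage level are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistenceDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("persistence-demo").getOrCreate()
  val sc = spark.sparkContext

  val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

  // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
  // MEMORY_AND_DISK spills partitions to disk when memory runs out.
  rdd.persist(StorageLevel.MEMORY_AND_DISK)

  rdd.count()      // first action: evaluates the RDD and fills the cache
  rdd.count()      // second action: served from the persisted partitions

  rdd.unpersist()  // release the cached blocks when done
  spark.stop()
}
```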

Thinking AI? Think Data First

Reading Time: 4 minutes. There is a lot of interest in Machine Learning and AI. Of course, much of it is still level 1 of AI: machines acting like humans. Everyone wants to jump on the AI bandwagon; it is an amazing field, and many organizations do not want to be left behind. That said, what is ignored most of the time is the fuel: the data!

Flink: Time Windows based on Processing Time

Reading Time: 4 minutes. In the previous blog, we talked about Flink's windows operator, the heart of processing infinite streams. Generally in Flink, after specifying whether the stream is keyed or non-keyed, the next step is to define a window assigner, which defines how elements are assigned to windows. Flink provides some useful predefined window assigners like tumbling windows, sliding windows, session windows, count windows, and … Continue Reading
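For a flavour of how an assigner is wired in, here is a minimal sketch using Flink's Scala API with a tumbling processing-time window; the socket source of words on localhost:9999 is an assumption for the example:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object ProcessingTimeWindowDemo extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val counts = env
    .socketTextStream("localhost", 9999)  // assumed word-per-line source
    .map(word => (word, 1))
    .keyBy(_._1)                          // keyed stream
    .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))  // the window assigner
    .sum(1)                               // count per word per 10-second window

  counts.print()
  env.execute("processing-time-window-demo")
}
```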

Creating Data Pipeline with Spark streaming, Kafka and Cassandra

Reading Time: 3 minutes. Hi folks! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data … Continue Reading
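Here is a minimal sketch of such a pipeline in Scala: read from Kafka with Structured Streaming, then write each micro-batch to Cassandra through the Spark Cassandra connector. The broker address, topic, keyspace, and table names are all placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("kafka-to-cassandra").getOrCreate()

  // Stream from Kafka (placeholder broker and topic).
  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

  // Write each micro-batch to a hypothetical Cassandra keyspace/table.
  val query = messages.writeStream
    .foreachBatch { (batch: DataFrame, _: Long) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "demo", "table" -> "events"))
        .mode("append")
        .save()
    }
    .start()

  query.awaitTermination()
}
```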

Spark Structured Streaming (Part 4) – Handling Late Data

Reading Time: 3 minutes. Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues the earlier post “Understanding Stateful Streaming” and covers handling late-arriving data in Spark Structured Streaming. So let's get started. Handling Late Data: with windowed aggregates (discussed in the previous blog), Spark automatically takes care of late data. Every aggregate window is like a bucket … Continue Reading
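As a concrete sketch of those buckets, here is what a watermark plus a windowed count looks like in Scala; the streaming source and the grouping column are assumptions for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LateDataDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("late-data").getOrCreate()
  import spark.implicits._

  // The built-in rate source stands in for a real stream; it emits a
  // `timestamp` column, and we fake a `word` column for grouping.
  val events = spark.readStream.format("rate").load()
    .withColumn("word", lit("spark"))

  val counts = events
    .withWatermark("timestamp", "10 minutes")             // accept data up to 10 min late
    .groupBy(window($"timestamp", "5 minutes"), $"word")  // 5-minute buckets
    .count()

  counts.writeStream
    .outputMode("update")
    .format("console")
    .start()
    .awaitTermination()
}
```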

Spark: Streaming Datasets

Reading Time: 3 minutes. Spark provides us with a high-level API, the Dataset, which makes it easy to get type safety and to perform manipulations safely in both distributed and local environments without code changes. Spark Structured Streaming, a high-level API for stream processing, also allows us to stream a particular Dataset, which is nothing but a type-safe structured stream. In this blog, we will see how we can create … Continue Reading
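As a preview, here is a minimal sketch of turning a raw stream into a typed Dataset in Scala; the socket source and the WordEvent case class are assumptions for the example:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class WordEvent(word: String)  // hypothetical event type

object StreamingDatasetDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("streaming-dataset").getOrCreate()
  import spark.implicits._

  // Read raw lines from a socket, then map into a typed Dataset;
  // from here on the compiler checks the schema for us.
  val words: Dataset[WordEvent] = spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", "9999")
    .load()
    .as[String]
    .flatMap(_.split(" "))
    .map(WordEvent(_))

  words.groupByKey(_.word).count()
    .writeStream
    .outputMode("complete")
    .format("console")
    .start()
    .awaitTermination()
}
```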

Spark Structured Streaming (Part 3) – Stateful Streaming

Reading Time: 4 minutes. Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues the earlier post “Internals of Structured Streaming” and covers Stateful Streaming in Spark Structured Streaming. So let's get started with the very basics of what Stateful Stream Processing is. But to understand that, let's first understand what Stateless Stream Processing is. In … Continue Reading
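To anchor the distinction: an operation that looks only at the current micro-batch is stateless, while carrying a running value across micro-batches is stateful. Here is a minimal sketch of the stateful side in Scala, using mapGroupsWithState with invented Word/WordCount types and a placeholder socket source:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Word(word: String)                    // hypothetical input type
case class WordCount(word: String, count: Long)  // hypothetical output type

object StatefulStreamingDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("stateful-streaming").getOrCreate()
  import spark.implicits._

  val words: Dataset[Word] = spark.readStream
    .format("socket").option("host", "localhost").option("port", "9999").load()
    .as[String].flatMap(_.split(" ")).map(Word(_))

  // The Long state survives from one micro-batch to the next, per key.
  val counts = words
    .groupByKey(_.word)
    .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
      (word: String, batch: Iterator[Word], state: GroupState[Long]) =>
        val total = state.getOption.getOrElse(0L) + batch.size
        state.update(total)
        WordCount(word, total)
    }

  counts.writeStream.outputMode("update").format("console")
    .start().awaitTermination()
}
```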

Analysis of Campus Placement Dataset Using a Decision Tree

Reading Time: 3 minutes. KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. With KNIME Analytics Platform, you can create visual workflows through an intuitive, drag-and-drop graphical interface, without the need for coding. Hello, folks! In this blog, we will analyse the campus placement data … Continue Reading

Stateful Streaming in Spark

Reading Time: 4 minutes. Apache Spark is a fast, general-purpose cluster computing system. In Spark, we can do both batch processing and stream processing. It does near-real-time processing, meaning it processes data in micro-batches. I discussed Spark Streaming in more detail in my previous blog. Now, in this blog, I'll discuss Stateful Streaming in Spark. So let's start! What is Stateful Streaming? … Continue Reading
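For the DStream API this post builds on, here is a minimal sketch of stateful word counting with updateStateByKey; the socket source and checkpoint path are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamStatefulDemo extends App {
  val conf = new SparkConf().setMaster("local[2]").setAppName("dstream-stateful")
  val ssc  = new StreamingContext(conf, Seconds(5))  // 5-second micro-batches
  ssc.checkpoint("/tmp/spark-checkpoint")            // required for stateful ops

  // Merge this batch's counts with the running count from earlier batches.
  def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
    Some(newValues.sum + running.getOrElse(0))

  val counts = ssc.socketTextStream("localhost", 9999)  // placeholder source
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .updateStateByKey(updateCount)

  counts.print()
  ssc.start()
  ssc.awaitTermination()
}
```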

Spark Structured Streaming (Part 2) – The Internals

Reading Time: 2 minutes. Welcome back, folks, to this blog series on Spark Structured Streaming. This blog continues the earlier post “Introduction to Structured Streaming”, so I'll start exactly from the point where I left off. Structure of a Streaming Query: when we call the start() API, Spark internally translates the code into a Logical Plan (an abstract representation of what the code does), then … Continue Reading
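One easy way to see that translation for yourself is Spark's explain() API, shown here on a tiny batch DataFrame for simplicity (a running streaming query exposes the same idea via StreamingQuery.explain()):

```scala
import org.apache.spark.sql.SparkSession

object PlanDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("plans").getOrCreate()
  import spark.implicits._

  val df = Seq(1, 2, 3).toDF("n").filter($"n" > 1)

  // extended = true prints the parsed and analyzed logical plans, the
  // optimized logical plan, and the physical plan Spark would execute.
  df.explain(extended = true)
}
```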