Data Analysis

Exploring OpenCV: Why Do We Need To Know About It?

Reading Time: 4 minutes OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code. OpenCV is a huge open-source library for computer vision, Continue Reading
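The excerpt above only introduces the library; as a quick, hedged taste of what OpenCV usage looks like in Python, here is a minimal sketch that loads an image and converts it to grayscale (the file name sample.jpg is a placeholder, not from the post):

```python
import cv2  # OpenCV's Python bindings (pip install opencv-python)

# Load an image from disk; "sample.jpg" is a placeholder path.
image = cv2.imread("sample.jpg")
if image is None:
    raise FileNotFoundError("sample.jpg not found")

# Convert from OpenCV's default BGR colour order to grayscale.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Save the result and report its dimensions.
cv2.imwrite("sample_gray.jpg", gray)
print("Image size:", gray.shape)
```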

BigQuery: Querying nested arrays

Reading Time: 2 minutes In a previous blog, we saw how BigQuery facilitates efficient data warehouse schema design. BigQuery supports nested and repeated columns: we can use a combination of the ARRAY and STRUCT data types to define our schema, which lets us denormalize data efficiently in a single table. In this blog, for the same sales-data schema, we will execute a few DML operations on nested array fields. Schema In Continue Reading
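As a hedged illustration of the kind of schema the post works with, the sketch below queries a hypothetical sales table whose items column is an ARRAY of STRUCTs, using the google-cloud-bigquery client. The project, dataset, table, and column names are assumptions for illustration, not the post's actual schema:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials/project

# Hypothetical table: each sales row holds an ARRAY<STRUCT<name STRING, qty INT64>>.
# UNNEST flattens the repeated field so it can be filtered and aggregated.
sql = """
    SELECT s.order_id, item.name, item.qty
    FROM `my_project.my_dataset.sales` AS s,
         UNNEST(s.items) AS item
    WHERE item.qty > 1
"""

for row in client.query(sql).result():
    print(row.order_id, row.name, row.qty)
```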

BigQuery: A Rescue from Conventional Data Warehouse Problems

Reading Time: 4 minutes The present and future of every industry sector depend on the ability to use massive amounts of data: to drive better product quality at a lower cost, and to make favourable business decisions. For decades, the primary way to store a wide variety of massive data and perform analysis on it has been data warehouse solutions. Traditional data warehouses designed on-premise specifically Continue Reading

Use the Plotly Library for Visualization

Reading Time: 4 minutes Plotly is an important and beautiful data science library. It is open-source and works with both plain Python and Django. It offers many chart types, such as scatter, bar, pie, bubble, dot, and treemap. What is Plotly? Plotly is an open-source, free-of-charge library. Using Plotly for statistical analysis of data enables Continue Reading
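To make the chart types above concrete, here is a minimal, hedged sketch using Plotly Express and its bundled iris sample dataset (the full post may use different data and chart types):

```python
import plotly.express as px  # pip install plotly

# Load a small sample dataset that ships with Plotly Express.
df = px.data.iris()

# A scatter plot of two measurements, coloured by species.
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 title="Iris sepal dimensions")
fig.show()  # opens an interactive chart in the browser/notebook
```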

Apache Beam: Side Input Pattern

Reading Time: 3 minutes Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. It is a modern way of defining data processing pipelines, with rich APIs and mechanisms for solving complex use cases. In some use cases, while defining our data pipelines, the requirement is that the pipeline should use some additional inputs. For example, in streaming analytics applications, it Continue Reading
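As a hedged sketch of the side input pattern in Beam's Python SDK (the post may use a different SDK; the values here are purely illustrative), a small additional PCollection is materialized and handed to every element of the main input:

```python
import apache_beam as beam  # pip install apache-beam

with beam.Pipeline() as p:
    # Main input: the elements flowing through the pipeline.
    main = p | "Numbers" >> beam.Create([1, 2, 3, 4])

    # Side input: a small additional PCollection, here a single scaling factor.
    factor = p | "Factor" >> beam.Create([10])

    # AsSingleton materializes the side input and passes it to every element.
    scaled = main | "Scale" >> beam.Map(
        lambda x, f: x * f, f=beam.pvalue.AsSingleton(factor))

    scaled | "Print" >> beam.Map(print)
```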

Data Analysis Using Python

Reading Time: 4 minutes In this blog, we will give an overview of Python packages used for data analysis. Finally, we will learn how to import and export data in and from Python, and how to obtain basic insights from the datasets. For understanding the basic concepts of Data Analytics, you can go through this link. Python packages for Data Analysis: In order to do analysis Continue Reading
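A minimal, hedged sketch of the import/export and first-look steps the excerpt mentions, using pandas (the file names are placeholders, not from the post):

```python
import pandas as pd  # pip install pandas

# Import: read a CSV file into a DataFrame ("data.csv" is a placeholder).
df = pd.read_csv("data.csv")

# Basic insights: first rows, column types, and summary statistics.
print(df.head())        # first five rows
print(df.dtypes)        # column data types
print(df.describe())    # count/mean/std/min/max for numeric columns

# Export: write the (possibly cleaned) data back out.
df.to_csv("data_out.csv", index=False)
```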

How To Find Correlation Value Of Categorical Variables

Reading Time: 4 minutes Hey folks, in this blog we are going to find out the correlation of categorical variables. What is a Categorical Variable? In statistics, a categorical variable has two or more categories, but there is no intrinsic ordering to the categories. For example, a binary variable (such as a yes/no question) is a categorical variable with two categories (yes or no), and there is no intrinsic ordering to them. Continue Reading
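The excerpt cuts off before reaching the method itself. One common way to measure association between two categorical variables is Cramér's V, computed from a chi-square test on a contingency table; here is a hedged sketch of that approach, which may differ from what the full post uses:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency  # pip install scipy

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V association between two categorical series (0 to 1)."""
    table = pd.crosstab(x, y)                 # contingency table of counts
    chi2, _, _, _ = chi2_contingency(table)   # chi-square statistic
    n = table.to_numpy().sum()                # total observations
    r, k = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

# Toy example with two yes/no style variables.
df = pd.DataFrame({"smoker": ["yes", "no", "yes", "no", "yes", "yes"],
                   "disease": ["yes", "no", "yes", "no", "no", "yes"]})
print(cramers_v(df["smoker"], df["disease"]))
```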

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra is a distributed, wide-column Continue Reading
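The full post wires these systems together through Flink; purely as an illustrative, Flink-free sketch of the pipeline's two endpoints in Python, here is how consuming from Kafka and writing to Cassandra look with the kafka-python and cassandra-driver packages (the topic, keyspace, and table names are made up for this sketch):

```python
import json
from kafka import KafkaConsumer          # pip install kafka-python
from cassandra.cluster import Cluster    # pip install cassandra-driver

# Consume JSON messages from a hypothetical "events" topic.
consumer = KafkaConsumer("events",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode()))

# Connect to a local Cassandra node and a hypothetical keyspace/table.
session = Cluster(["127.0.0.1"]).connect("demo_ks")
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

for msg in consumer:
    event = msg.value
    # In the blog's pipeline, Flink would transform the stream at this point.
    session.execute(insert, (event["id"], json.dumps(event)))
```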

Creating a Data Pipeline with Spark Streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading
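A hedged PySpark sketch of the Kafka-reading half of such a pipeline (the post may use Scala; writing to Cassandra additionally needs the spark-cassandra-connector package, so this sketch just prints micro-batches to the console):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-demo").getOrCreate()

# Read a stream from a hypothetical "events" topic.
# Requires the spark-sql-kafka package on the classpath.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Kafka keys/values arrive as bytes; cast to strings for inspection.
events = stream.select(col("key").cast("string"), col("value").cast("string"))

# Print micro-batches; a real pipeline would write to Cassandra here instead.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```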

Analysis of campus placement dataset using decision tree

Reading Time: 3 minutes KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag-and-drop graphical interface, without the need for coding. Hello, folks! In this blog, we will analyse the campus placement data Continue Reading

ICC Test Cricket Data Analysis using KNIME

Reading Time: 4 minutes KNIME Analytics Platform is open-source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. With KNIME Analytics Platform, you can create visual workflows with an intuitive, drag-and-drop graphical interface, without the need for coding. Hello, folks! In this blog, we will analyse Continue Reading

KNIME Analytics Platform: A dream for a data scientist

Reading Time: 3 minutes In this blog, we are going to see what the KNIME Analytics Platform is and the important features that make it easy to create an analytics workflow. Introduction to the KNIME Analytics Platform KNIME is a platform built for powerful analytics on a GUI-based workflow. This means you do not have to know how to code to be able to work using KNIME and derive Continue Reading

Apache Spark: Delta Lake as a Solution – Part II

Reading Time: 3 minutes Well, we have already covered the missing features in Apache Spark, and the causes of those issues, in Part 1. Today, however, we will talk about what Delta Lake is and how it provides the solution to all the problems discussed in Delta Lake as a Solution: Part 1. As we all know, Spark is just a processing engine; it doesn't Continue Reading
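As a hedged taste of what the post builds toward, here is a minimal Delta Lake sketch with PySpark. It assumes the delta-spark package is installed and enabled on the session; the table path is a placeholder:

```python
from pyspark.sql import SparkSession

# Assumes delta-spark is installed; these configs enable Delta's SQL extensions.
spark = (SparkSession.builder.appName("delta-demo")
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Write as a Delta table: Parquet files plus an ACID transaction log.
df.write.format("delta").mode("overwrite").save("/tmp/delta/users")

# Reads see a consistent snapshot; the log enables time travel and upserts.
spark.read.format("delta").load("/tmp/delta/users").show()
```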