Analytics

Spark SQL in Delta Lake 0.7.0

Reading Time: 3 minutes Delta Lake is a buzzword in the Big Data world these days, especially among Spark developers, because it addresses many of the issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. It is evolving quickly and adds new features in every release. Continue Reading

Knime: Accessing a REST API with dynamic query param

Reading Time: 3 minutes REST APIs are now one of the most widely used ways to share data, and many of them return a subset of the complete data in the form of pages. Sometimes we need to append multiple query parameters to the URL to get specific, filtered data. In this blog, we will learn how to generate dynamic URLs by adding query parameters and fetch the data. Knime platform supports Continue Reading
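The idea of generating dynamic URLs from query parameters can be sketched outside KNIME as well. The snippet below is a minimal Python illustration, not the KNIME workflow from the post: the base URL `https://api.example.com/items` and the parameter names `category`, `limit`, and `page` are hypothetical and chosen only for the example.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append query parameters to base_url, skipping any whose value is None."""
    filtered = {key: value for key, value in params.items() if value is not None}
    if not filtered:
        return base_url
    # urlencode percent-encodes values and joins the pairs with '&'
    return f"{base_url}?{urlencode(filtered)}"


def page_urls(base_url, filters, pages):
    """Build one URL per page number, reusing the same filter parameters."""
    return [build_url(base_url, {**filters, "page": page}) for page in pages]


# Hypothetical endpoint and parameters, for illustration only
urls = page_urls(
    "https://api.example.com/items",
    {"category": "big data", "limit": 50},
    range(1, 4),
)
for url in urls:
    print(url)
```

Each generated URL can then be passed to whatever component performs the GET request, one request per page.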

Data Visualisation In KNIME

Reading Time: 3 minutes KNIME is definitely a dream for data scientists. It makes the work of a data scientist much easier. If you haven’t heard about KNIME, you can find out all about it in our blog Knime Analytics Platform: A dream for a data scientist. Continuing on, in this blog we will see how easy it is to create visualizations in KNIME. Continue Reading

Knime Analytics Platform: A dream for a data scientist

Reading Time: 3 minutes In this blog, we are going to see what the Knime Analytics Platform is and which of its features make it easy to create an analytics workflow. Introduction to Knime Analytics Platform KNIME is a platform built for powerful analytics on a GUI-based workflow. This means you do not have to know how to code to be able to work using KNIME and derive Continue Reading

Tracking Pixels with Google Tag Manager

Reading Time: 3 minutes Introduction Hi Everyone! This is my first blog, and in it I’ll try to explain how we can use Google Tag Manager to manage and deploy tracking pixels (e.g. Google Analytics) on your website without having to modify the code. Tracking Pixels A tracking pixel is an HTML code snippet that is loaded when a user visits a website. It is useful for tracking user Continue Reading

MachineX: Boosting performance with XGBoost

Reading Time: 5 minutes In this blog, we are going to see how XGBoost works and look at some of its important features with the help of an example. Many of us have heard about tree models and boosting techniques. Let’s put these concepts together and talk about XGBoost, one of the most powerful machine learning algorithms out there. XGBoost stands for eXtreme Gradient Boosted trees. The name XGBoost, though, actually Continue Reading

Top 7 Data Analytics and Management Trends for 2020

Reading Time: 5 minutes We live in an era of data, and data lies at the heart of digital transformation. Datasets are no longer as simple as they used to be. They have grown in volume, velocity, and complexity and, above all, they now come from multiple sources. Tech giants like Google, Netflix, Amazon, and others crunch massive amounts of data on a daily basis to give you a personalized experience. Continue Reading

Data Lake – Build it in Phases

Reading Time: 3 minutes Data Lake – How to build a data lake and which phases are involved.

Apache Spark: Repartitioning v/s Coalesce

Reading Time: 3 minutes Does partitioning help you increase or decrease job performance? Spark splits data into partitions, and computation is done in parallel for each partition. It is very important to understand how data is partitioned and when you need to manually modify the partitioning to run Spark applications efficiently. Now, diving into our main topic, i.e., Repartitioning v/s Coalesce. What is Coalesce? The coalesce method reduces the number Continue Reading

MachineX: Top 10 data Science use cases in Retail

Reading Time: 8 minutes In this blog, we will see some of the data science use cases in the retail industry and how data science is transforming the customer experience. We are all aware of the troves of data that retail businesses generate on a daily basis. However, this repository of critical data is worthless if it cannot be translated into valuable insights into consumers’ minds or market trends. While all Continue Reading

Big Data Evolution: Migrating on-premise database to Hadoop

Reading Time: 4 minutes We are now generating massive volumes of data at an accelerated rate. To meet business needs, address changing market dynamics as well as improve decision-making, sophisticated analysis of this data from disparate sources is required. The challenge is how to capture, store and model these massive pools of data effectively in relational databases. Big data is not a fad. We are just at the beginning Continue Reading

Knoldus Inc. recognized by Clutch as a Top Hadoop Consultant

Reading Time: 2 minutes Here at Knoldus Inc., we pride ourselves on being one of the best developers of Scala, big and fast data, microservices, and Artificial Intelligence, all of which have become increasingly important over the past few years. However large and daunting these tasks may be, our clients are always our biggest priority. This is why we are ecstatic that Clutch has chosen us as one of the Continue Reading