ML, AI and Data Engineering

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced at an enormous scale in today’s world, and it will be generated even faster in the future. 90% of all the data in the world was produced in the last two years alone, and it is estimated that in 2020 the world’s total data would reach 45 ZB, with the data generated each day being enough that if we try to store it Continue Reading

MachineX: AI in Manufacturing controlling Quality

Reading Time: 3 minutes Manufacturing is a very simple business. The owner buys raw material or parts to manufacture a finished product. However, it’s a risky business when it comes to selling the finished products. Supply too much and you flood the market, causing a drop in price and a drop in profits. Fail to meet demand and the customer may go elsewhere, with a drop in sales for Continue Reading

MachineX: Starts With Why ft. Convolutional Neural Network

Reading Time: 4 minutes If you are looking for a short answer, I would say real-life image datasets are not as small as MNIST, so building a model with a fully connected neural network is impractical. But let’s produce some dopamine and explore the convolutional neural network in a bit more depth. Our visual cortex focuses on certain areas to identify an image, and similarly a convolutional neural network also focuses on the Continue Reading

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows or data pipelines. These functions are achieved with Directed Acyclic Graphs (DAGs) of tasks. It is open source and still in the incubator stage. It was started in 2014 under the umbrella of Airbnb, and since then it has earned an excellent reputation, with approximately 800 contributors and 13,000 stars on GitHub. Continue Reading

MachineX: Text Summarization in Python

Reading Time: 5 minutes In this blog, we will learn about the different types of text summarization methods, and at the end we will see a practical example. We all interact with applications that use text summarization. Several of these applications are for platforms that publish articles on daily news, entertainment, and sports. With our busy schedules, we tend to read the summary of an article before we decide to jump in and read the whole thing. Reading a summary helps us to spot the Continue Reading

Using Vertica with Spark-Kafka: Write using Structured Streaming

Reading Time: 3 minutes In the two previous blogs, we explored Vertica and how it can be connected to Apache Spark. The first blog in this mini-series was about reading data from Vertica using Spark and saving that data into Kafka. The next blog explained the reverse flow, i.e. reading data from Kafka and writing data to Vertica, but in batch mode, i.e. reading data from Kafka Continue Reading

Using Vertica with Spark-Kafka: Writing

Reading Time: 4 minutes In the previous blog of this series, we glanced over the basic definitions of Spark and Vertica. We also did a code overview for reading data from Vertica using Spark as a DataFrame and saving the data into Kafka. In this blog we will do the reverse flow, i.e. reading the data from Kafka as a DataFrame and writing that DataFrame into Continue Reading

MachineX: Genetic Algorithm

Reading Time: 2 minutes The genetic algorithm is based on Charles Darwin’s famous principle of survival of the fittest, where the fittest individuals are given higher importance and are chosen for reproduction in order to produce children for the new generation. The process starts by selecting the fittest individuals from a population, who then produce offspring that inherit the characteristics of the parents. Since the parents already Continue Reading
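
The selection and reproduction steps described in this excerpt can be sketched in a few lines of Python. This is only an illustrative toy, not the code from the post: the bit-string genome, the "count the 1-bits" fitness function, the single-point crossover, the mutation step, and all names and parameter values (`evolve`, `pop_size`, `mutation_rate`, and so on) are assumptions made for the example.

```python
import random


def fitness(individual):
    # Toy objective (an assumption for this sketch): the number of 1-bits.
    return sum(individual)


def evolve(pop_size=20, genome_len=16, generations=30, mutation_rate=0.05, seed=1):
    """Run a tiny genetic algorithm; return (initial best fitness, best individual)."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    initial_best = max(fitness(ind) for ind in population)

    for _ in range(generations):
        # Selection: the fittest half of the population become parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]

        # Reproduction: each child inherits a prefix from one parent and a
        # suffix from another (single-point crossover), then may mutate.
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)

        # Parents survive unchanged, so the best fitness never decreases.
        population = parents + children

    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Keeping the parents in the next generation is a design choice of this sketch; it guarantees that the best solution found so far is never lost.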

MachineX: Evaluation Metrics for a Regression ML Model

Reading Time: 3 minutes In this blog post, we will quickly look at the various metrics for evaluating our regression models. But first, let us briefly discuss one of the best-known model evaluation approaches, the Train-Test split, also known as the Train-Validation split. Train-Test Split: In this approach, we split the data into two parts known as the Training set and the Test set. The model is then trained Continue Reading
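
As a minimal sketch of the Train-Test split described in this excerpt, the data can be shuffled and cut into two disjoint parts. The helper name `train_test_split`, the 25% test fraction, the seed, and the toy data are assumptions for illustration only; the post itself may use a library routine instead.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) as two disjoint lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one row for testing.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy regression data (hypothetical): pairs (x, y) with y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

The model would be fitted on `train` only, and the regression metrics would then be computed on `test`, which the model has not seen.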

TensorFlow for deep learning Part 1

Reading Time: 3 minutes TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow was developed by the Google Brain team for internal Google use. Deep learning is a particular kind of Continue Reading

Do you really need Spark? Think Again!

Reading Time: 5 minutes With the massive increase in big data technologies today, it is becoming very important to use the right tool for every process. The process can be anything: data ingestion, data processing, data retrieval, data storage, etc. Today we are going to focus on one of those popular big data technologies, Apache Spark. Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark Continue Reading

MachineX: SVM as Non-Linear Classifiers

Reading Time: 3 minutes In our previous blogs, we looked at SVM, gained a high-level understanding of it, and saw why to choose SVM over other classifiers. In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build non-linear classifiers using SVM. The primary method for doing this is to use kernels. In linear SVM we find Continue Reading

Protein Structure determination aided by Stochastic Search (Replica Exchange Monte-Carlo Method)

Reading Time: 8 minutes Introduction Proteins are large molecules, which occur in abundance in every single living organism. They carry out vital functions such as transporting oxygen, converting the food you eat into energy your body can use, and many more. Proteins are long chains of linked units called amino acids. There are 20 types of amino acids. Proteins fold into different shapes depending upon their sequence of amino Continue Reading
