ML, AI and Data Engineering

Data Lake – Build it in Phases

Reading Time: 3 minutes How to build a data lake, and the phases involved in doing so.

Apache Spark: Read Data from S3 Bucket

Reading Time: 2 minutes Anyone who works with Spark is familiar with reading a file from the local file system, from a table, from HDFS, or from any other file source. But do you know how tricky it can be to read data into Spark from an S3 bucket? This blog gives a step-by-step guide to reading data from an S3 bucket.

Apache Spark: Repartitioning v/s Coalesce

Reading Time: 3 minutes Does partitioning help increase or decrease job performance? Spark splits data into partitions, and computation runs in parallel on each partition. It is important to understand how data is partitioned and when you need to modify the partitioning manually to run Spark applications efficiently. Now, diving into our main topic, i.e., repartitioning vs. coalesce. What is coalesce? The coalesce method reduces the number …
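To make the difference concrete, here is a toy model in plain Python. It is not Spark's API. Partitions are modelled as lists. The hypothetical `coalesce` merges neighbouring partitions whole, with no shuffle, and can only lower the count. The hypothetical `repartition` does a full shuffle: every row is redistributed round-robin, so it can raise or lower the count. In PySpark the real calls are `df.coalesce(n)` and `df.repartition(n)`. Treat this as a sketch of the idea, not Spark's implementation.

```python
from itertools import chain


def coalesce(partitions, n):
    """Toy coalesce: merge existing partitions into n groups.

    Whole partitions are glued together, so no individual row is moved
    between partitions (no shuffle). Like Spark's coalesce without a
    shuffle, asking for more partitions than exist keeps the current count.
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assign consecutive input partitions to the same output group.
        groups[i * n // len(partitions)].extend(part)
    return groups


def repartition(partitions, n):
    """Toy repartition: full shuffle of every row into n new partitions.

    Rows are dealt out round-robin, so the result is evenly balanced and
    n may be larger or smaller than the current number of partitions.
    """
    out = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out


parts = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce(parts, 2))     # [[1, 2, 3], [4, 5, 6, 7]]
print(repartition(parts, 3))  # [[1, 4, 7], [2, 5], [3, 6]]
```

The trade-off is visible even in the toy. `coalesce` is cheap because rows stay together, but it can leave partitions unbalanced. `repartition` touches every row, but it produces even partitions.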

Understanding the working of Spark Driver and Executor

Reading Time: 4 minutes This blog covers Apache Spark, and explains how Spark’s driver and executors communicate with each other to process a given job. Let’s get started. First, what is Apache Spark? The official definition says that “Apache Spark™ is a unified analytics engine for large-scale data processing.” It is an in-memory computation engine where the data is …
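The division of labour between driver and executors can be sketched with a toy model. This sketch assumes threads from Python's `concurrent.futures` stand in for executors. Real Spark executors are separate JVM processes that receive serialized tasks over the network, so this mimics only the pattern. The hypothetical `driver` function splits the data into partitions, with one task per partition. It hands those tasks to a worker pool and combines the partial results, roughly like an action such as `reduce`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(partition):
    """Work one 'executor' does for a single task: process one partition."""
    return sum(x * x for x in partition)


def driver(data, num_partitions, num_executors):
    """Toy 'driver': plan tasks, ship them to workers, combine the results."""
    # One task per partition, as in Spark's stage/task model.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    # The thread pool stands in for the executors that run tasks in parallel.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    # The driver merges the partial results into the final answer.
    return sum(partial_results)


print(driver(list(range(10)), num_partitions=4, num_executors=2))  # 285
```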

Understanding how Spark runs on YARN with HDFS

Reading Time: 6 minutes This blog covers Apache Spark and YARN (Yet Another Resource Negotiator), and explains how Spark runs on YARN with HDFS. Let’s get started. First, what is Apache Spark? The official definition says that “Apache Spark™ is a unified analytics engine for large-scale data processing.” It is an in-memory computation engine where the data is kept …

Yes, you can do it without AI

Reading Time: 4 minutes AI has become a buzzword. No company, whether in the technology space or not, wants to be left behind and suffer from FOMO. Interestingly, on its buzzword status, Mariya suggested that AI is like teenage sex: everyone talks about it, nobody knows how to do it, everyone thinks everyone else is doing it, and so claims to …

MachineX: Heart Diseases detection using Machine Learning

Reading Time: 4 minutes In this blog, we will see how machine learning and data science can be used to detect or predict potential heart diseases. Introduction: Heart disease describes a range of conditions that affect your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease, heart rhythm problems (arrhythmias), and heart defects you’re born with (congenital heart …

Getting started with Amazon SNS

Reading Time: 2 minutes Introduction: Amazon Simple Notification Service (SNS) is a publish/subscribe messaging service. But what does that mean? SNS is centered around topics; you can think of a topic as a group for collecting messages. Users or endpoints subscribe to a topic, and messages or events are then published to that topic. When a message is published, all subscribers to …
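The topic/subscriber fan-out can be illustrated with a toy in-memory model. This is not the AWS API: the `Topic` class is hypothetical, and subscribers are plain Python callables rather than HTTP, email, SQS or Lambda endpoints. With the AWS SDK for Python (boto3), the corresponding SNS client calls are `create_topic`, `subscribe` and `publish`.

```python
class Topic:
    """Toy in-memory stand-in for an SNS topic (hypothetical, not boto3)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, endpoint):
        # An "endpoint" here is any callable that accepts a message.
        self.subscribers.append(endpoint)

    def publish(self, message):
        # Fan-out: every subscriber receives its own copy of the message.
        for deliver in self.subscribers:
            deliver(message)
        return len(self.subscribers)


inbox_email, inbox_sms = [], []
orders = Topic("orders")
orders.subscribe(inbox_email.append)
orders.subscribe(inbox_sms.append)
orders.publish("order #1 shipped")
print(inbox_email, inbox_sms)  # ['order #1 shipped'] ['order #1 shipped']
```

The publisher never needs to know who is listening. Adding a third subscriber requires no change to the code that calls `publish`.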

MachineX: Evaluation Metrics for Classification Models

Reading Time: 5 minutes In our last blog post, we looked at various evaluation metrics for regression models. Continuing from there, we will now look at the evaluation metrics used for classification models. Classification is about predicting class labels given input data. In binary classification there are two possible output classes, whereas in multi-class classification there are more than two. We are going …
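As a quick reference, the standard binary metrics can be computed from the four confusion-matrix counts. Those counts are true positives, false positives, false negatives and true negatives. The helper names below are illustrative. For the multi-class case, this sketch uses one common approach, macro-averaging: it treats each class in turn as "positive" and averages the per-class F1 scores. The labels are made up purely for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 (0.0 when a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred):
    """Multi-class F1: one-vs-rest F1 per class, averaged with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(binary_metrics(y_true, y_pred, c)["f1"] for c in labels) / len(labels)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
print(binary_metrics(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

Here precision is 3/5 and recall is 3/4. Two of the five positive predictions are wrong, while only one true positive is missed.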

MachineX: Diabetic retinopathy detection using AI

Reading Time: 3 minutes In this blog, we are going to discuss diabetic retinopathy and how we can prevent it using artificial intelligence. Diabetic retinopathy is a complication of diabetes that affects the eyes. It is caused by damage to the blood vessels of the light-sensitive tissue of the retina. Diabetic retinopathy (DR) is a leading cause of vision loss globally. Approximately one-third of the 285 million people with diabetes mellitus worldwide …

Boosting medical diagnosis with Klickare

Reading Time: 4 minutes In this blog, we are going to see how KlicKare can boost medical diagnosis using deep learning. Medical diagnostics are a category of medical tests designed to detect infections, conditions, and diseases. These medical diagnostics fall under the category of in-vitro medical diagnostics (IVD), which can be purchased by consumers or used in laboratory settings. Biological samples are isolated from the human body such …

MachineX: Top 10 Data Science use cases in Retail

Reading Time: 8 minutes In this blog, we will look at some data science use cases in the retail industry and how data science is transforming the customer experience. We are all aware of the troves of data that retail businesses generate every day. However, this repository of critical data is worthless if it cannot be translated into valuable insights into consumers’ minds or market trends. While all …

Getting started with Amazon SQS

Reading Time: 4 minutes With the continuing growth of microservices and the cloud best practice of designing decoupled systems, developers need a service that handles the delivery of messages between components, and this is where SQS comes in. Amazon SQS (Simple Queue Service) is a fully managed service offered by AWS that works seamlessly with serverless systems, microservices, or any …
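A key idea behind SQS-style decoupling is at-least-once delivery with a visibility timeout. A received message is hidden from other consumers for a while. If the consumer deletes it in time, it is gone; if the consumer crashes, the message becomes visible again and is redelivered. Below is a toy in-memory model of that behaviour, not the AWS API. `ToyQueue` is hypothetical, and it deletes by message id, whereas real SQS uses a receipt handle. It also returns messages in insertion order, which standard SQS queues do not guarantee. With boto3, the real SQS client calls are `send_message`, `receive_message` and `delete_message`.

```python
import itertools
import time


class ToyQueue:
    """Toy model of SQS-style visibility timeouts (hypothetical, not boto3)."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock                 # injectable so the demo is deterministic
        self._ids = itertools.count()
        self._messages = {}                # message id -> [body, invisible_until]

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, float("-inf")]  # visible immediately
        return msg_id

    def receive(self):
        """Return (id, body) of the first visible message and hide it, else None."""
        now = self.clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        # Deleting acknowledges successful processing; otherwise it reappears.
        self._messages.pop(msg_id, None)


now = [0.0]
q = ToyQueue(visibility_timeout=10, clock=lambda: now[0])
q.send("resize image 42")
msg_id, body = q.receive()  # consumer A takes the message
print(q.receive())          # None: hidden while A is working
now[0] = 11                 # A crashed and the timeout elapsed
print(q.receive())          # (0, 'resize image 42'): redelivered
q.delete(msg_id)            # a consumer that finishes deletes it
print(q.receive())          # None
```

Because a message can be delivered more than once, consumers in this style of system are usually written so that processing the same message twice is harmless.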