Analytics

Amazon EMR

Reading Time: 3 minutes

Businesses worldwide are discovering the power of new big data processing and analytics frameworks like Apache Hadoop and Apache Spark, but they are also discovering some of the challenges of operating these technologies in on-premises data lake environments. They may also have concerns about the future of their current distribution vendor. Common problems of on-premises big data environments include a lack of agility, excessive costs, ...

Apache Spark: Tricks to Increase Job Performance

Reading Time: 2 minutes

Apache Spark is quickly being adopted in the real world, and companies such as Uber are using it in production. Spark is gaining popularity because it also supports developing streaming applications and doing machine learning, which helps companies get better results in production along with proper analysis. Although companies are using Spark in ...

MachineX: Demystifying Market Basket analysis

Reading Time: 7 minutes

In this blog, we are going to see how we can anticipate customer behavior with market basket analysis by using association rules. Introduction to Market Basket Analysis: Market basket analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it ...
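The idea of looking for item combinations that occur together frequently can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method from the post itself: the transactions are invented, the function name `pair_rules` and the `min_support` threshold are hypothetical, and only pairs of items are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (lhs, rhs, support, confidence) for frequent item pairs.

    support    = share of transactions containing both items
    confidence = share of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair does not occur together often enough
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "butter -> bread" in this toy data, suggests that customers who buy the first item usually buy the second as well. Real data sets typically need a dedicated algorithm such as Apriori or FP-Growth rather than this brute-force pair count.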

Time Travel: Data versioning in Delta Lake

Reading Time: 3 minutes

In today’s Big Data world, we process large amounts of data continuously and store the resulting data in a data lake. This keeps changing the state of the data lake. But sometimes we would like to access a historical version of our data. This requires versioning of data. This kind of data management simplifies our data pipeline by making it easy for professionals or organizations to ...

Data Lake – Build it in Phases

Reading Time: 3 minutes

Data Lake – how to build a data lake and which phases are involved.

Apache Spark: Read Data from S3 Bucket

Reading Time: < 1 minute

Amazon S3. Accessing an S3 bucket through Spark: edit the spark-defaults.conf file. You need to add three lines containing your S3 access key, your secret key, and the file system.
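The excerpt does not show the three lines themselves, so the following is a hedged sketch rather than the post's actual configuration. It assumes the Hadoop `s3a` connector; the helper name `render_s3_conf` and the placeholder credentials are hypothetical. The script only prints the text you would append to spark-defaults.conf.

```python
def render_s3_conf(access_key, secret_key):
    """Return three spark-defaults.conf lines for S3 access.

    Assumption: the cluster uses the Hadoop s3a connector. The original
    post may have used a different scheme or different property names.
    """
    settings = [
        ("spark.hadoop.fs.s3a.access.key", access_key),
        ("spark.hadoop.fs.s3a.secret.key", secret_key),
        ("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"),
    ]
    # spark-defaults.conf uses one "key value" pair per line.
    return "\n".join(f"{key} {value}" for key, value in settings)


# Placeholder credentials; never commit real keys to version control.
print(render_s3_conf("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

With settings like these in place, paths of the form `s3a://bucket/key` are resolved through the configured file system, provided the matching `hadoop-aws` library is on the Spark classpath.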

MachineX: Diabetic retinopathy detection using AI

Reading Time: 3 minutes

In this blog, we are going to discuss diabetic retinopathy and how we can prevent it by using artificial intelligence. Diabetic retinopathy is a diabetes complication that affects the eyes. Damage to the blood vessels of the light-sensitive tissue of the retina causes this complication. Diabetic retinopathy (DR) is a leading cause of vision loss globally. Approximately one-third of the 285 million people with diabetes mellitus worldwide ...

MachineX: Top 10 data Science use cases in Retail

Reading Time: 8 minutes

In this blog, we will see some of the data science use cases in the retail industry and how they are transforming the customer experience. We are all aware of the troves of data that retail businesses generate on a daily basis. However, this repository of critical data is worthless if it cannot be translated into valuable insights into consumers’ minds or market trends. While all ...