ML, AI and Data Engineering

Amazon EMR

Reading Time: 3 minutes
Businesses worldwide are discovering the power of new big data processing and analytics frameworks like Apache Hadoop and Apache Spark, but they are also discovering some of the challenges of operating these technologies in on-premises data lake environments. They may also have concerns about the future of their current distribution vendor. Common problems of on-premises big data environments include a lack of agility, excessive costs, …

TensorFlow Quantum: beauty and the beast

Reading Time: 4 minutes
So, after a long wait, we are finally entering the era of quantum computing. TFQ (TensorFlow Quantum) brings together the beauty of TensorFlow and the beast that is quantum computing. Quantum computing is a technology worth watching more closely in 2020. We have seen recent announcements from Honeywell, Google and others, and there are new pieces of hardware to look forward to this year. Now, Google has …

Modernizing Data Storage for Fuelling Digital Transformation

Reading Time: 5 minutes
As companies mature in their digital transformation journey, old technologies and rules of doing business are being redefined. Capturing customers is no longer enough; companies are focusing on how to keep them engaged with hyper-personalized experiences. There is an explosion of data sources as everyone and everything is connected through mobile devices, social media, and IoT. What this means for a business is an exponential …

Apache Spark: Tricks to Increase Job Performance

Reading Time: 2 minutes
Apache Spark is quickly being adopted in the real world, and many companies, such as Uber, are using it in production. Spark is gaining popularity because it also lets you develop streaming applications and do machine learning, which helps companies get better results in production alongside proper analysis. Although companies are using Spark in …

MachineX: Anticipate Customer Behavior for Retailing

Reading Time: 4 minutes
In this blog, we are going to see the power of anticipating customer behavior and how it can drive success in the retail sector. Nowadays, machine learning is playing an important role in the success of many sectors: healthcare, finance, manufacturing, agriculture, and now even education. Retail is one of the sectors that is getting huge benefits from machine learning and …

MachineX: Demystifying Market Basket Analysis

Reading Time: 7 minutes
In this blog, we are going to see how we can anticipate customer behavior with market basket analysis using association rules.
Introduction to Market Basket Analysis
Market basket analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it …
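
As a minimal sketch of the metrics behind association rules (support, confidence and lift), the Python below scores rules between pairs of items. It is not the code from the post: the `transactions` data, the `pair_rules` helper and the `min_support` threshold are hypothetical, and it stops at item pairs, whereas an algorithm such as Apriori would also grow larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def pair_rules(transactions, min_support=0.4):
    """Score rules X -> Y for every item pair whose support meets min_support.

    Returns a list of (antecedent, consequent, support, confidence, lift).
    """
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many transactions contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # P(a and b)
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # consider both rule directions
            confidence = count / item_counts[x]  # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules
```

Sorting the result by lift, for example `sorted(pair_rules(transactions), key=lambda r: r[4], reverse=True)`, puts the strongest associations first; a lift above 1 means two items are bought together more often than independent purchases would predict.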

Spark: ACID Transaction with Delta Lake

Reading Time: 3 minutes
Spark doesn’t provide some of the most essential features of a reliable data processing system, such as atomic APIs and ACID transactions, as discussed in the blog “Spark: ACID compliant or not”. Delta Lake offers Spark a solution to this problem: it acts as an intermediary layer between Apache Spark and the storage system. Instead of directly interacting with the storage layer, …
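
To give a feel for how a transaction log can sit between a compute engine and plain storage, here is a toy Python sketch. It borrows the general idea of numbered JSON commit files, as found in Delta Lake's `_delta_log` directory, but `ToyCommitLog` and its methods are hypothetical and far simpler than Delta Lake's actual protocol. Each commit is written to a temporary file and then published with `os.link`, which refuses to overwrite an existing file, so each version number is claimed by exactly one writer and readers never see a half-written commit.

```python
import json
import os
import tempfile


class ToyCommitLog:
    """Toy transaction log: one numbered JSON file per committed version.

    Hypothetical illustration only; Delta Lake's real protocol is richer.
    """

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _path(self, version):
        # Zero-padded names keep the commit files in version order.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        versions = [
            int(name[: -len(".json")])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        ]
        return max(versions, default=-1)

    def commit(self, version, actions):
        """Try to publish `actions` as `version`; False if it is already taken."""
        fd, tmp_path = tempfile.mkstemp(dir=self.log_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(actions, f)
            # os.link fails if the target exists, so only one writer can
            # create a given version, and the file appears complete or not at all.
            os.link(tmp_path, self._path(version))
            return True
        except FileExistsError:
            return False
        finally:
            os.remove(tmp_path)

    def snapshot(self):
        """Replay commits 0..latest to get the current set of data files."""
        files = set()
        for version in range(self.latest_version() + 1):
            with open(self._path(version)) as f:
                for action in json.load(f):
                    if "add" in action:
                        files.add(action["add"])
                    elif "remove" in action:
                        files.discard(action["remove"])
        return files
```

A writer whose `commit` returns `False` would re-read the latest version, check for conflicting changes and retry with the next version number; that optimistic retry loop is left out of this sketch.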

Time Travel: Data versioning in Delta Lake

Reading Time: 3 minutes
In today’s big data world, we continuously process large amounts of data and store the results in a data lake, which keeps changing the state of the data lake. Sometimes, however, we would like to access a historical version of our data, and this requires data versioning. This kind of data management simplifies our data pipelines by making it easy for professionals or organizations to …
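
The idea behind time travel can be illustrated with a toy, in-memory Python class in which every write produces a new, immutable version. `VersionedTable` is hypothetical and stores full snapshots for simplicity; Delta Lake instead records each version as a commit in its transaction log and reconstructs older versions from that log.

```python
import copy


class VersionedTable:
    """Toy in-memory table where every write creates a new version.

    Hypothetical illustration only; this is not the Delta Lake API.
    """

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def append(self, rows):
        """Add rows on top of the latest version; return the new version number."""
        snapshot = self._versions[-1] + list(rows)
        self._versions.append(copy.deepcopy(snapshot))
        return self.latest_version

    def overwrite(self, rows):
        """Replace the whole table; earlier versions stay readable."""
        self._versions.append(copy.deepcopy(list(rows)))
        return self.latest_version

    def read(self, version=None):
        """Return a copy of the table as of `version` (default: latest)."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"version {version} does not exist")
        return copy.deepcopy(self._versions[version])


table = VersionedTable()
v1 = table.append([{"id": 1, "status": "new"}])
v2 = table.overwrite([{"id": 1, "status": "shipped"}])
print(table.read(v1))  # the row as it was before the overwrite
print(table.read())    # the latest version
```

With Delta Lake itself, an older version is read through the `versionAsOf` (or `timestampAsOf`) read option, for example `spark.read.format("delta").option("versionAsOf", 0).load(path)`.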

MachineX: The Power of Recommendation Engines

Reading Time: 4 minutes
In this blog, we are going to talk about what recommendation engines actually are and their different types. You can see the full webinar related to this blog here: Recommendation engines, or recommender systems, are one of the most mainstream applications of data science today. They are used to predict the “rating” or “preference” that a user would give to an item. Pretty much …
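
Collaborative filtering is one common way of predicting such ratings. The sketch below shows user-based collaborative filtering in plain Python: a user's rating for an item is predicted as a similarity-weighted average of other users' ratings, with cosine similarity computed over the items both users have rated. The `ratings` data and the function names are hypothetical; this illustrates a single technique and is not taken from the post or the webinar.

```python
import math

# Hypothetical user -> {item: rating} data, for illustration only.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob": {"matrix": 4, "inception": 5, "titanic": 2, "avatar": 4},
    "carol": {"matrix": 1, "titanic": 5, "avatar": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


print(predict_rating(ratings, "alice", "avatar"))
```

Here the prediction for Alice and "avatar" lands between Carol's rating of 2 and Bob's rating of 4, closer to Bob's, because Alice's past ratings resemble Bob's far more than Carol's.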

Data Lake – Build it in Phases

Reading Time: 3 minutes
Data Lake – how to build a data lake and what phases are involved.

Apache Spark: Read Data from S3 Bucket

Reading Time: < 1 minute
Amazon S3: Accessing an S3 Bucket through Spark
Edit the spark-defaults.conf file. You need to add the three lines below, consisting of your S3 access key, secret key and file system …