Reading Time: 2 minutes Fast Data is empowering organizations to respond in real time. About 75% of organizations are already using it for at least some of their applications.
Reading Time: 2 minutes Apache Spark is quickly being adopted in the real world, and companies like Uber are using it in production. Spark is gaining popularity because it also supports building streaming applications and doing machine learning, which helps companies get better results in production along with proper analysis. Although companies are using Spark in …
Reading Time: 3 minutes Does partitioning help you increase or decrease job performance? Spark splits data into partitions and runs the computation for each partition in parallel. It is very important to understand how data is partitioned and when you need to manually modify the partitioning to run Spark applications efficiently. Now, diving into our main topic, i.e. repartition vs. coalesce. What is coalesce? The coalesce method reduces the number …
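As a rough illustration of the distinction this excerpt introduces, the sketch below is a toy model in plain Python, not Spark's API. It assumes the commonly described behaviour: a coalesce-style operation merges whole existing partitions to reach a smaller count, while a repartition-style operation redistributes every record. The function names `coalesce` and `repartition` and the sample data are hypothetical stand-ins chosen for this example.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(partitions: Partitions, n: int) -> Partitions:
    """Toy coalesce: merge whole partitions into n groups.

    No partition is split, so records that started together stay together.
    Asking for more partitions than exist leaves the layout unchanged.
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map source partition i onto one of the n target partitions.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: Partitions, n: int) -> Partitions:
    """Toy repartition: a full shuffle that deals every record out round-robin."""
    records = [r for part in partitions for r in part]
    out: Partitions = [[] for _ in range(n)]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


# Hypothetical, deliberately uneven input partitions.
data = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]]

coalesced = coalesce(data, 2)      # fewer partitions, sizes may stay uneven
reshuffled = repartition(data, 2)  # every record moves, sizes are balanced

print(coalesced)
print(reshuffled)
```

In this model the merged result can stay skewed because nothing is split, whereas the shuffled result is balanced at the cost of moving every record, which is the trade-off usually weighed when choosing between the two.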
Reading Time: 6 minutes Data is evolving in both quality and quantity in today's enterprises, and in the past few years changes have occurred at a much faster pace. Not long ago, Big Data was considered the next big thing for digital transformation. Technologies like Hadoop and HBase made sense when batch processing of data was the norm. But things are not the same now. By the …