We all know that Apache Spark is a fast, general-purpose engine for large-scale data processing. It can process data up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
But is MapReduce-style processing the only task Spark can be used for? The answer is no. Spark is not just a Big Data processing engine; it is a framework that provides a distributed environment for processing data. This means we can use it for many other kinds of computation as well.
For example, let's take factorial. Calculating the factorial of a very large number is cumbersome in any programming language, and on top of that the CPU takes a long time to complete the calculation. So, what can the solution be?
Well, Spark can be the solution to this problem. Let's see that in the form of code.
First, we will implement factorial calculation using plain Scala, in a tail-recursive way.
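Since the original listing is not shown here, the following is a minimal sketch of what such a tail-recursive implementation might look like. It uses `BigInt` because the factorial of 200000 far exceeds the range of `Long`; the object and method names are illustrative.

```scala
import scala.annotation.tailrec

object Factorial {

  // Tail-recursive factorial: the accumulator carries the running
  // product, so the compiler turns the recursion into a loop.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * i)
    loop(n, BigInt(1))
  }

  def main(args: Array[String]): Unit = {
    val start = System.nanoTime()
    val result = factorial(200000)
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"factorial(200000) has ${result.toString.length} digits, computed in $elapsedSec%.2fs")
  }
}
```

Note that this version multiplies the numbers strictly in sequence, on a single core, so the running time grows quickly as the intermediate `BigInt` gets larger.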
The time taken by the above code to find the factorial of 200000 on my machine (quad-core Intel i5) was about 20.21s.
Now, let's implement the same computation using Spark.
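Again, the original listing is not shown, so here is a sketch of one way to express this in Spark: distribute the numbers 1 to n across the cores of the machine as an RDD and combine them with a parallel `reduce`. The application name and local-mode setup are assumptions for a standalone run.

```scala
import org.apache.spark.sql.SparkSession

object SparkFactorial {
  def main(args: Array[String]): Unit = {
    // Local mode using all available cores; on a cluster you would
    // pass the master via spark-submit instead.
    val spark = SparkSession.builder()
      .appName("SparkFactorial")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val n = 200000
    val start = System.nanoTime()

    // Each partition computes a partial product in parallel, and
    // reduce then multiplies the partial results together.
    val result = sc.parallelize(1 to n).map(BigInt(_)).reduce(_ * _)

    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"factorial($n) has ${result.toString.length} digits, computed in $elapsedSec%.2fs")

    spark.stop()
  }
}
```

Besides using all the cores, this shape of computation also tends to multiply `BigInt`s of similar sizes together, which is cheaper than repeatedly multiplying one huge accumulator by a small number.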
The time taken by Spark to find the factorial of 200000 on the same machine was only 5.41s, almost 4x faster than using Scala alone.
Of course, the calculation time can vary depending on the hardware being used. But we still have to admit that Spark not only reduced the calculation time, it also gave us a cleaner way to write the code.