What makes Spark so powerful (Part 1)

In this blog, we are going to look at some of the core components of Spark that make it so powerful and easy to use.

This series has three parts, which cover the following topics:

  • What makes Spark so powerful (Part 1) – Storage layer
    • Birth of Spark
    • Storage
  • What makes Spark so powerful (Part 2) – Resource management
    • Cluster management
    • Worker
  • What makes Spark so powerful (Part 3) – Engine, Ecosystem, and APIs
    • Spark SQL
    • MLlib
    • Structured Streaming

Birth of Spark

We have all heard about the popularity of Spark, which has been adopted by major players like Amazon, eBay, and Yahoo!.
But have you ever wondered why we need Spark and why it was created in the first place?

Let me explain.
Spark was created to address the limitations of the Hadoop MapReduce framework. Although Hadoop MapReduce was a groundbreaking framework for big data processing, in practice it was limited in speed, largely because it writes intermediate results to disk between processing steps. Spark introduced in-memory computation, which can make it up to 100 times faster than Hadoop MapReduce for workloads that fit in memory. Since then, the adoption of Spark for big data applications has grown steadily across the globe.

Apache Spark vs Hadoop MapReduce

  • Apache Spark is fast: for in-memory workloads it can run up to 100 times faster than Hadoop MapReduce.
  • If you need near-real-time decisions or business insights, Spark and its in-memory processing are the better choice.
  • Spark ships with many built-in libraries, such as MLlib for machine learning, whereas Hadoop MapReduce relies on third-party tools for this functionality.
  • Because Spark is fast, jobs that have to evaluate many combinations finish sooner.
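To make the in-memory point concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: the `CountingSource` class and both `iterate_*` functions are hypothetical names invented for this illustration. It only shows why an iterative job that reloads its input from disk on every pass does more I/O than one that loads the data once and reuses it from memory.

```python
import os
import tempfile


class CountingSource:
    """A file-backed dataset that counts how often it is read from disk."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def read(self):
        self.disk_reads += 1
        with open(self.path) as handle:
            return [int(line) for line in handle]


def iterate_from_disk(source, passes):
    """Disk-bound style: every pass reloads the input from storage."""
    return [sum(source.read()) for _ in range(passes)]


def iterate_in_memory(source, passes):
    """In-memory style: load once, then reuse the cached records."""
    cached = source.read()
    return [sum(cached) for _ in range(passes)]


def run_demo(passes=5):
    """Run both styles on the same small file and report the disk reads."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        with open(path, "w") as handle:
            handle.write("\n".join(str(n) for n in range(1, 101)))

        disk = CountingSource(path)
        memory = CountingSource(path)
        disk_totals = iterate_from_disk(disk, passes)
        memory_totals = iterate_in_memory(memory, passes)
        return disk_totals, memory_totals, disk.disk_reads, memory.disk_reads


if __name__ == "__main__":
    disk_totals, memory_totals, disk_reads, memory_reads = run_demo()
    print("same results:", disk_totals == memory_totals)
    print("disk reads (reload every pass):", disk_reads)
    print("disk reads (cached in memory):", memory_reads)
```

Both styles produce the same totals, but the cached version touches the disk once instead of once per pass. On a real cluster the gap depends on data size, available memory, and the workload, so treat this as an intuition aid rather than a benchmark.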

Spark Architecture

There are five core components that make Spark so powerful and easy to use. The core architecture of Spark consists of the following layers:

  • Storage
  • Resource Management
  • Engine
  • Ecosystem
  • APIs

In this part of the series, we will only talk about the Storage layer. The next parts will cover the remaining layers of Spark one by one.

Storage

Before Spark can process data, the data must be made available to it. This data can live in almost any kind of data store. Spark offers a range of options for working with different types of data sources when processing data at scale. It lets you use many state-of-the-art systems and traditional relational databases as well as NoSQL stores such as Cassandra and MongoDB.

Additionally, it can read from most popular storage systems, such as HDFS, Cassandra, Hive, HBase, and SQL databases.
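As a rough sketch of what this looks like in practice, the PySpark snippet below reads from a few of these sources through the `spark.read` interface. It assumes `spark` is an existing `SparkSession`. Every host name, path, credential, keyspace, and table name is a placeholder invented for this example, and `load_examples` is a hypothetical helper, not part of Spark. The JDBC and Cassandra reads also assume the matching driver or connector package is available to Spark.

```python
def load_examples(spark):
    """Read DataFrames from several kinds of storage.

    `spark` is assumed to be an existing SparkSession. All locations and
    names below are placeholders for illustration only.
    """
    # Files on HDFS (Parquet format assumed here).
    events = spark.read.format("parquet").load("hdfs://namenode:8020/data/events")

    # A relational database over JDBC (the JDBC driver must be on the classpath).
    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/shop")
        .option("dbtable", "public.orders")
        .option("user", "reader")
        .option("password", "secret")
        .load()
    )

    # Cassandra, assuming the spark-cassandra-connector package is installed.
    users = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="app", table="users")
        .load()
    )

    # A Hive table, assuming the session was built with Hive support enabled.
    sales = spark.read.table("warehouse.sales")

    return {"events": events, "orders": orders, "users": users, "sales": sales}
```

With a real session (for example, one created with `SparkSession.builder.appName("storage-demo").getOrCreate()`), each entry in the returned dictionary is a DataFrame that can be queried in the same way, regardless of where the data came from. Sources such as HBase and MongoDB follow the same pattern but need their own connector packages.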

That is all for this blog. I hope you enjoyed it and found it helpful! Stay connected for more blogs. Thank you!

Stay tuned, happy learning 🙂

Written by 

Shubham Goyal is a Data Scientist at Knoldus Inc. He is also an artificial intelligence researcher, interested in research on problems across different domains, and a regular contributor to the community through blogs and webinars on machine learning and artificial intelligence. He has also written a few research papers on machine learning. In addition, he is a conference speaker and an official author at Towards Data Science.