Apache Spark - An Overview and Best Practices


Apache Spark

  • Apache Spark is a unified analytics engine for processing large volumes of data.
  • It can run some workloads up to 100 times faster than Hadoop MapReduce.
  • It offers over 80 high-level operators that make it easy to build parallel apps.
  • Spark can run on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access data from multiple sources.
  • Spark processes data in batches as well as in real time.
  • Spark keeps working data in RAM, i.e. in-memory, so repeated access is fast.
  • Spark provides explicit caching and in-memory persistence of datasets (see the sketch after this list).
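
As a rough sketch of the last two points, the following PySpark snippet parallelizes a computation and caches the result in memory, so a second action reuses the cached data instead of recomputing it. It assumes pyspark is installed locally (`pip install pyspark`); the app name and dataset are illustrative:

```python
from pyspark.sql import SparkSession

# Start a local Spark session ("overview-demo" is just an illustrative app name)
spark = SparkSession.builder.appName("overview-demo").master("local[*]").getOrCreate()

# Distribute a small dataset across the available workers (here, local cores)
rdd = spark.sparkContext.parallelize(range(1_000_000))

# cache() keeps the transformed RDD in memory after the first action,
# so subsequent actions reuse it rather than recomputing from scratch
squares = rdd.map(lambda x: x * x).cache()

print(squares.count())                                # first action: computes and caches
print(squares.filter(lambda x: x % 2 == 0).count())   # reuses the cached data

spark.stop()
```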

Components of the Spark ecosystem

The Apache Spark ecosystem comprises three main categories:

Language support

Spark integrates with several programming languages, letting you build applications and run analytics in Java, Python, Scala, or R.
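
For illustration, here is a minimal DataFrame example in Python; the same API is exposed with a near-identical shape in Scala, Java, and R (the session name and sample rows below are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lang-demo").master("local[*]").getOrCreate()

# In Scala the equivalent filter would read: df.filter($"age" > 30).show()
df = spark.createDataFrame([("Alice", 34), ("Bob", 28)], ["name", "age"])
df.filter(df.age > 30).show()

spark.stop()
```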

Core Components

Spark has five core components: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and GraphX.
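
As a small example of one of these components, Spark SQL lets you register a DataFrame as a temporary view and query it with plain SQL (the view name and data below are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").master("local[*]").getOrCreate()

# Register a DataFrame as a temporary view, then query it with SQL
df = spark.createDataFrame([("Alice", 34), ("Bob", 28)], ["name", "age"])
df.createOrReplaceTempView("people")

spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```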

Cluster Management

Spark can run under three cluster managers: its built-in Standalone cluster manager, Apache Mesos, and Hadoop YARN (Kubernetes, mentioned above, is also supported).
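
In practice, the cluster manager is selected by the master URL you pass to Spark. A minimal sketch in Python; the host names and ports in the comments are placeholders, not real endpoints:

```python
from pyspark.sql import SparkSession

# The master URL picks the cluster manager:
#   local[*]              - run locally, one worker thread per core
#   spark://host:7077     - Spark's built-in Standalone cluster manager
#   mesos://host:5050     - Apache Mesos
#   yarn                  - Hadoop YARN (cluster details come from the Hadoop config)
#   k8s://https://host:443 - Kubernetes
spark = (SparkSession.builder
         .appName("cluster-demo")
         .master("local[*]")  # swap for one of the URLs above when deploying
         .getOrCreate())
```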

How Spark runs applications