
Spark Components

The Spark project consists of several tightly integrated components. At its core, Spark is a computational engine that schedules, distributes, and monitors applications made up of many computational tasks running across a cluster of worker machines.

Let's understand each Spark component in detail.

Spark Core

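  • Spark Core is the heart of Spark and provides its basic functionality.
  • It contains the components for task scheduling, fault recovery, interaction with storage systems, and memory management.
  • It is also home to the API that defines resilient distributed datasets (RDDs), Spark's main programming abstraction.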
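To make "task scheduling" and "fault recovery" concrete, here is a minimal plain-Python model, not Spark's API. It assumes a hypothetical `Partition` class that remembers its input slice and its lineage (the transformations to apply), a thread pool standing in for cluster workers, and a simulated worker failure. A failed task is rerun and its data rebuilt from lineage, which is the idea Spark Core relies on to recover lost partitions.

```python
from concurrent.futures import ThreadPoolExecutor

class Partition:
    """Hypothetical model: one slice of the input plus its lineage."""

    def __init__(self, index, source, lineage):
        self.index = index
        self.source = source    # input records for this partition
        self.lineage = lineage  # functions applied to every record, in order

    def compute(self):
        records = self.source
        for fn in self.lineage:
            records = [fn(r) for r in records]
        return records

def split(data, num_partitions):
    """Divide the input into contiguous slices of roughly equal size."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_job(partitions, max_workers=4, fail_once=frozenset()):
    """Run one task per partition in a thread pool; if a task fails,
    recompute that partition from its lineage."""
    attempts = {}

    def task(p):
        attempts[p.index] = attempts.get(p.index, 0) + 1
        # Simulated worker failure on the first attempt for selected partitions.
        if p.index in fail_once and attempts[p.index] == 1:
            raise RuntimeError(f"lost partition {p.index}")
        return p.compute()

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p: pool.submit(task, p) for p in partitions}
        for p, fut in futures.items():
            try:
                results[p.index] = fut.result()
            except RuntimeError:
                # Fault recovery: rerun the task, rebuilding the data from lineage.
                results[p.index] = task(p)
    # Concatenate partition outputs in partition order.
    output = [r for i in sorted(results) for r in results[i]]
    return output, attempts

data = list(range(10))
parts = [Partition(i, chunk, [lambda x: x * 2, lambda x: x + 1])
         for i, chunk in enumerate(split(data, 3))]
output, attempts = run_job(parts, fail_once={1})
print(output)       # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(attempts[1])  # 2
```

Recomputing from lineage avoids replicating intermediate data; the trade-off is extra work when a long chain of transformations has to be replayed.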

Spark SQL

  • Spark SQL is built on top of Spark Core and provides support for working with structured data.
  • It lets you query data via SQL (Structured Query Language) as well as the Apache Hive variant of SQL, called HQL (Hive Query Language).
  • It supports JDBC and ODBC connectivity, which links Spark SQL with existing databases, data warehouses, and business intelligence tools.
  • It also supports various data sources, such as Hive tables, Parquet, and JSON.
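The workflow described above (load structured records from a source such as JSON, expose them as a table, query them with SQL) can be sketched with Python's built-in `sqlite3` as a stand-in engine. This is not Spark SQL; in PySpark the analogous steps would be `spark.read.json(...)`, `df.createOrReplaceTempView("employees")`, and `spark.sql(...)`. The records, the `employees` table, and the fixed schema are hypothetical.

```python
import json
import sqlite3

# Hypothetical JSON-lines input: one structured record per line.
raw = """\
{"name": "alice", "dept": "eng", "salary": 120}
{"name": "bob", "dept": "eng", "salary": 100}
{"name": "carol", "dept": "sales", "salary": 90}
"""

def load_json_lines(text):
    """Parse JSON lines into a list of dicts; the keys imply the schema."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def query(records, sql):
    """Register the records as a table named 'employees' and run SQL on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
        conn.executemany(
            "INSERT INTO employees VALUES (:name, :dept, :salary)", records
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

rows = query(
    load_json_lines(raw),
    "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept",
)
print(rows)  # [('eng', 110.0), ('sales', 90.0)]
```

Unlike this single-process sketch, Spark SQL can infer the schema from the JSON itself and executes the query as distributed tasks on Spark Core.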

Spark Streaming

  • Spark Streaming is a Spark component that supports scalable and fault-tolerant processing of streaming data.
  • It uses Spark Core's fast scheduling capability to perform streaming analytics.
  • It receives data in small batches (micro-batches) and performs RDD transformations on each batch.
  • Its design ensures that applications written for streaming data can be reused to analyze batches of historical data with little modification.
  • Log files generated by web servers are a typical real-world example of a data stream.
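The micro-batch idea and the reuse of streaming logic for historical data can be illustrated with a plain-Python sketch (not the Spark Streaming API). It assumes web-server log lines in Common Log Format, where the HTTP status code is the second-to-last field; `status_counts` and `micro_batches` are hypothetical helpers. The same `status_counts` function is applied to each micro-batch of a stream and to a full historical batch, and both give the same totals.

```python
from collections import Counter
from itertools import islice

def status_counts(lines):
    """Count HTTP status codes; identical logic for streaming or batch input."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[-2]] += 1  # assumed format: status is second-to-last
    return counts

def micro_batches(stream, batch_size):
    """Group an incoming iterator of lines into small fixed-size batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

log = [
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 404 128',
    '10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 200 256',
]

# Streaming: process each micro-batch as it arrives and keep running totals.
running = Counter()
for batch in micro_batches(log, batch_size=2):
    running += status_counts(batch)

# Batch: the same function over the historical data.
historical = status_counts(log)
print(running == historical)  # True
```

Real Spark Streaming groups records by time interval rather than by count, and each batch is processed as an RDD by Spark Core's scheduler.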

MLlib

  • MLlib is Spark's machine learning library and contains many common machine learning algorithms.
  • These include correlations and hypothesis testing, classification and regression, clustering, and principal component analysis.
  • According to benchmarks run by the MLlib developers, its in-memory implementation of alternating least squares (ALS) was up to nine times as fast as the disk-based implementation used by Apache Mahout.
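Statistics such as correlation parallelize well because each partition can be summarized independently and the summaries merged. The plain-Python sketch below computes the Pearson correlation this way; it only illustrates the partition-then-merge pattern and is not MLlib's implementation (MLlib offers its own, e.g. `pyspark.ml.stat.Correlation`). The partitioned input data is hypothetical.

```python
import functools
import math

def partial_stats(pairs):
    """Sufficient statistics for one partition: n, sum x, sum y, sum x^2, sum y^2, sum xy."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)

def merge_stats(a, b):
    """Combine two partitions' statistics by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))

def pearson(partitions):
    """Pearson correlation of (x, y) pairs spread across partitions."""
    n, sx, sy, sxx, syy, sxy = functools.reduce(
        merge_stats, (partial_stats(p) for p in partitions)
    )
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)

partitions = [[(1, 2), (2, 4)], [(3, 6), (4, 8)]]
print(pearson(partitions))  # 1.0
```

The raw-sums formula is fine for small illustrative values but can lose precision on large ones; a numerically stable merge of means and variances is preferable in practice.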

GraphX

  • GraphX is a library for manipulating graphs and performing graph-parallel computations.
  • It lets you create a directed graph with arbitrary properties attached to each vertex and edge.
  • To manipulate graphs, it supports fundamental operators such as subgraph, joinVertices, and aggregateMessages.
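A property graph and two of the operators named above can be modeled in a few lines of plain Python. This is a hypothetical sketch, not GraphX: `aggregate_messages` loosely mirrors the send/merge shape of GraphX's aggregateMessages (each edge sends messages to vertices, and messages arriving at the same vertex are merged), and `subgraph` keeps only vertices and edges that pass the given predicates. The vertex and edge data are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    attr: str  # edge property, e.g. the relationship type

def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """Call send_msg on every edge (with both endpoint properties); it returns
    (vertex_id, message) pairs. Messages for the same vertex are merged."""
    result = {}
    for e in edges:
        for vid, msg in send_msg(e, vertices[e.src], vertices[e.dst]):
            result[vid] = merge_msg(result[vid], msg) if vid in result else msg
    return result

def subgraph(vertices, edges, vpred=lambda vid, attr: True, epred=lambda e: True):
    """Keep vertices that satisfy vpred and edges that satisfy epred
    and whose endpoints are both kept."""
    kept = {vid: attr for vid, attr in vertices.items() if vpred(vid, attr)}
    kept_edges = [e for e in edges
                  if e.src in kept and e.dst in kept and epred(e)]
    return kept, kept_edges

# Vertex properties: (name, age); every edge is a directed "follows" relation.
vertices = {1: ("alice", 30), 2: ("bob", 25), 3: ("carol", 40)}
edges = [Edge(1, 2, "follows"), Edge(3, 2, "follows"), Edge(2, 1, "follows")]

# Age of the oldest follower of each vertex.
oldest = aggregate_messages(vertices, edges,
                            lambda e, src, dst: [(e.dst, src[1])], max)
print(oldest)  # {2: 40, 1: 25}

# In-degree: each edge sends 1 to its destination; messages are summed.
in_deg = aggregate_messages(vertices, edges,
                            lambda e, src, dst: [(e.dst, 1)], lambda a, b: a + b)
print(in_deg)  # {2: 2, 1: 1}
```

Because messages are produced per edge and merged per vertex, the same pattern can run in parallel over partitions of the edge list, which is what makes it graph-parallel.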
