Javatpoint Logo

Spark Components

The Spark project consists of several tightly integrated components. At its core, Spark is a computational engine that schedules, distributes, and monitors applications made up of many computational tasks running across a cluster of machines.

Let's understand each Spark component in detail.


Spark Core

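  • Spark Core is the heart of Spark and provides its basic functionality, on which the other components are built.
  • It contains the components for task scheduling, fault recovery, memory management, and interaction with storage systems.
  • It also defines the Resilient Distributed Dataset (RDD), Spark's main programming abstraction: a collection of items distributed across many nodes that can be processed in parallel.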
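Fault recovery in Spark Core is based on lineage: instead of replicating data, Spark records the transformations that produced each partition of an RDD and recomputes a lost partition from its input. The following is a minimal plain-Python sketch of that idea for a single partition. It does not use Spark, and the class LineagePartition and its methods are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional

Step = Callable[[List[int]], List[int]]


class LineagePartition:
    """Hypothetical toy partition that remembers how it was built (its lineage)."""

    def __init__(self, source: List[int], steps: Optional[List[Step]] = None) -> None:
        self.source = source                      # original input data for this partition
        self.steps = steps or []                  # recorded transformations, in order
        self.cached: Optional[List[int]] = None   # in-memory result; may be lost

    def map(self, fn: Callable[[int], int]) -> "LineagePartition":
        # Record the transformation instead of running it now (lazy evaluation).
        return LineagePartition(self.source, self.steps + [lambda xs: [fn(x) for x in xs]])

    def filter(self, pred: Callable[[int], bool]) -> "LineagePartition":
        return LineagePartition(self.source, self.steps + [lambda xs: [x for x in xs if pred(x)]])

    def compute(self) -> List[int]:
        # If there is no in-memory copy, replay the lineage from the source data.
        if self.cached is None:
            data = list(self.source)
            for step in self.steps:
                data = step(data)
            self.cached = data
        return self.cached

    def lose(self) -> None:
        # Simulate losing the in-memory copy, e.g. after a worker failure.
        self.cached = None


part = LineagePartition([1, 2, 3, 4, 5]).map(lambda x: x * 10).filter(lambda x: x > 20)
first = part.compute()      # [30, 40, 50]
part.lose()
recovered = part.compute()  # rebuilt from the recorded lineage
print(first, recovered)
```

Because only the recipe is stored, recovery costs recomputation rather than the memory and network traffic of keeping replicas. In real Spark the lineage is a graph of dependencies between RDDs spread over many partitions, not a single list of steps.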

Spark SQL

  • Spark SQL is built on top of Spark Core and provides support for structured data.
  • It allows querying data via SQL (Structured Query Language) as well as the Apache Hive variant of SQL, called HQL (Hive Query Language).
  • It supports JDBC and ODBC connections, which let it work with existing databases, data warehouses, and business intelligence tools.
  • It also supports various data sources, such as Hive tables, Parquet, and JSON.
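To show what "querying structured data with SQL" looks like without a Spark installation, the sketch below uses Python's built-in json and sqlite3 modules as a stand-in for Spark SQL. It loads JSON records (one of the sources listed above) into a table and runs a SQL query over them. The table name people and the sample records are hypothetical.

```python
import json
import sqlite3

# Hypothetical JSON Lines input: one structured record per line.
raw = """{"name": "Alice", "age": 34}
{"name": "Bob", "age": 19}
{"name": "Carol", "age": 27}"""

records = [json.loads(line) for line in raw.splitlines()]

# Register the records as a table (Spark SQL would register a temporary view).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (:name, :age)", records)

# Query the structured data with plain SQL.
adults = conn.execute(
    "SELECT name FROM people WHERE age >= 21 ORDER BY name"
).fetchall()
print(adults)  # [('Alice',), ('Carol',)]
conn.close()
```

In PySpark, the corresponding steps would be roughly spark.read.json(...) to load the data, df.createOrReplaceTempView("people") to register it, and spark.sql(...) to run the same query. The difference is that Spark executes the query across a cluster.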

Spark Streaming

  • Spark Streaming is a Spark component that supports scalable and fault-tolerant processing of streaming data.
  • It uses Spark Core's fast scheduling capability to perform streaming analytics.
  • It ingests data in mini-batches and performs RDD transformations on those mini-batches.
  • Its design ensures that applications written for streaming data can be reused to analyze batches of historical data with little modification.
  • For example, log files generated by web servers in real time form a data stream.
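The mini-batch model explains why streaming code can be reused on historical data: the same batch function is applied either to each small batch as it arrives or to the whole stored dataset. Below is a minimal plain-Python sketch of that idea, not Spark code. Spark Streaming groups data by a time-based batch interval; this sketch groups by record count for simplicity. The log format (status code as the last field) and the function names are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def count_status_codes(lines: Iterable[str]) -> Counter:
    """Batch logic: count HTTP status codes.

    Assumes a hypothetical log format whose last field is the status code.
    """
    return Counter(line.split()[-1] for line in lines if line.strip())


def mini_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) stream into fixed-size mini-batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


log_stream = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /about 200",
    "GET /old 301",
]

# Streaming: apply the batch function to each mini-batch as it arrives.
per_batch = [count_status_codes(b) for b in mini_batches(log_stream, batch_size=2)]

# Historical: the unchanged function runs over the full stored log.
historical = count_status_codes(log_stream)

# Merging the per-batch results gives the same totals as the historical run.
print(sum(per_batch, Counter()) == historical)  # True
```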

MLlib

  • MLlib is Spark's machine learning library, which contains a variety of machine learning algorithms.
  • These include correlations and hypothesis testing, classification and regression, clustering, and principal component analysis.
  • Largely because of Spark's in-memory architecture, benchmarks by the MLlib developers (using alternating least squares) showed it to be up to nine times faster than the disk-based implementation used by Apache Mahout.
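Statistics such as correlation suit Spark's distributed model because they can be computed from partial sums. Each partition produces its sums independently, and the results are then merged. The sketch below shows this for the Pearson correlation coefficient in plain Python. It illustrates the general technique and is not MLlib's actual implementation; in PySpark, MLlib exposes correlation through pyspark.ml.stat.Correlation. The partition layout and data are hypothetical.

```python
import math
from functools import reduce
from typing import List, Tuple

# Partial summary of one partition: (n, sum_x, sum_y, sum_xx, sum_yy, sum_xy)
Partial = Tuple[float, float, float, float, float, float]


def partial_sums(pairs: List[Tuple[float, float]]) -> Partial:
    """Summarize one partition of (x, y) pairs in a single pass."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    return (n, sx, sy, sxx, syy, sxy)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two summaries element-wise; the order of merging does not matter."""
    return tuple(p + q for p, q in zip(a, b))


def pearson(s: Partial) -> float:
    """Pearson correlation computed from the merged sums."""
    n, sx, sy, sxx, syy, sxy = s
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)


# Data split across three hypothetical partitions.
partitions = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.2)],
    [(5.0, 9.8)],
]

total = reduce(merge, (partial_sums(p) for p in partitions))
r = pearson(total)
print(round(r, 4))
```

The sum-of-squares formula is simple but can lose precision for very large values. Production libraries typically use more numerically stable updates for mean and variance.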

GraphX

  • GraphX is a library for manipulating graphs and performing graph-parallel computations.
  • It makes it possible to create a directed graph with arbitrary properties attached to each vertex and edge.
  • To manipulate graphs, it provides fundamental operators such as subgraph, joinVertices, and aggregateMessages.
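GraphX's aggregateMessages operator is built around two user functions. A send function is called for each edge and may emit messages to the edge's source or destination vertex. A merge function combines all messages addressed to the same vertex. The plain-Python sketch below models that pattern on a small property graph. It is not the GraphX API: the function signature, vertex properties, and edges are hypothetical. It computes, for each vertex, the number of followers and the age of the oldest follower.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical property graph: vertex id -> properties, plus directed edges.
vertices: Dict[int, Dict[str, object]] = {
    1: {"name": "alice", "age": 28},
    2: {"name": "bob", "age": 27},
    3: {"name": "carol", "age": 65},
    4: {"name": "dave", "age": 42},
}
edges: List[Tuple[int, int, str]] = [  # (src, dst, relationship)
    (2, 1, "follows"),
    (3, 1, "follows"),
    (4, 1, "follows"),
    (1, 4, "follows"),
]

Msg = Tuple[int, int]  # (follower count, oldest follower age)


def aggregate_messages(
    send: Callable[[int, dict, int, dict, str], List[Tuple[int, Msg]]],
    merge: Callable[[Msg, Msg], Msg],
) -> Dict[int, Msg]:
    """Toy send/merge pass: each edge triplet may emit (vertex_id, message)
    pairs; messages addressed to the same vertex are combined with merge.
    Vertices that receive no messages do not appear in the result."""
    inbox: Dict[int, Msg] = {}
    for src, dst, rel in edges:
        for target, msg in send(src, vertices[src], dst, vertices[dst], rel):
            inbox[target] = merge(inbox[target], msg) if target in inbox else msg
    return inbox


# Each edge tells its destination "one follower, with this age".
stats = aggregate_messages(
    send=lambda s, s_attr, d, d_attr, rel: [(d, (1, s_attr["age"]))],
    merge=lambda a, b: (a[0] + b[0], max(a[1], b[1])),
)
print(stats)  # {1: (3, 65), 4: (1, 28)}
```

Because the merge function is associative and commutative, messages can be combined in any order. This is what allows such a computation to run in parallel across graph partitions.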
