Elasticsearch Architecture

Elasticsearch is a distributed search engine used for full-text search. This section discusses the physical architecture of Elasticsearch: how documents are distributed across physical or virtual machines, and how those machines work together to form a cluster.

Nodes and clusters are at the center of the Elasticsearch architecture. By default, every node in a cluster can handle HTTP requests from clients that want to send a request to the cluster.

Node and Cluster

Before we begin, we need to understand nodes and clusters, because they are the essential building blocks of Elasticsearch. By default, each node in a cluster handles both transport traffic (communication between nodes) and HTTP traffic (requests from clients). Nodes and clusters are discussed in detail below:

Node

A node is a running instance of Elasticsearch that belongs to a cluster and stores data. A node refers to the Elasticsearch instance, not to the machine it runs on, and that machine can be either physical or virtual. Therefore, any number of nodes can run on the same machine. Whenever an Elasticsearch instance starts, a node starts running.

Cluster

An Elasticsearch cluster is a group of Elasticsearch nodes that are connected to each other and together store all of your data. Each node holds a part of the data that you add to the cluster. You can run any number of clusters, but one cluster is usually sufficient. A cluster is created automatically when the first node starts up.

Every node is part of a cluster. It participates in the cluster's indexing and searching, which means that a node takes part in a search query by searching the data it stores. Let's understand this with the help of an example -

Suppose you have two nodes - Node A and Node B. Both nodes hold some data, and some of the data on each node matches a given search query. Each node searches only its own part of the data, and the partial results are combined into one response. A node supports operations such as indexing data, searching for data, and manipulating existing data.
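The two-node example above can be sketched as a toy simulation. This is not how Elasticsearch is implemented; the `Node` class, the `cluster_search` function, and the extra sample documents are hypothetical names and data invented only to illustrate that each node searches its own part of the data and the results are merged.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for an Elasticsearch node that holds part of the data."""
    name: str
    documents: list = field(default_factory=list)

    def search(self, term: str) -> list:
        # Each node searches only the documents it stores itself.
        term = term.lower()
        return [
            doc for doc in self.documents
            if any(term in str(value).lower() for value in doc.values())
        ]


def cluster_search(nodes: list, term: str) -> list:
    # Fan the query out to every node and merge the partial results.
    hits = []
    for node in nodes:
        for doc in node.search(term):
            hits.append({"node": node.name, "doc": doc})
    return hits


# Hypothetical sample data spread over two nodes.
node_a = Node("Node A", [{"name": "Adward", "country": "California"}])
node_b = Node("Node B", [{"name": "Adward Jr", "country": "Texas"},
                         {"name": "Maria", "country": "Spain"}])

results = cluster_search([node_a, node_b], "adward")
for hit in results:
    print(hit["node"], "->", hit["doc"]["name"])
```

Both nodes contribute one match each, even though neither node holds all of the data.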

Along with this, it is also essential to know that every node within a cluster can handle HTTP requests from clients that want to send a request to the cluster.
Clients do this through the HTTP REST API that the cluster exposes.
The node that receives a client's request coordinates the rest of the work needed to answer it.
It can forward the request to other nodes over the transport layer, which is used for communication between nodes. The HTTP layer, on the other hand, is used to communicate with external clients.
By default, all nodes accept HTTP requests from clients.
In addition, every node within a cluster knows about every other node in the cluster.
Scaling out requires more than one node. With several nodes, the cluster can handle large amounts of data efficiently.
By default, every node is also eligible to become the master node. The master node is a node with additional responsibilities: it coordinates cluster-wide changes, such as adding or removing indices and adding or removing nodes.
Only the elected master node can update the cluster state.
Each cluster and each node has a name that helps to identify it. The default cluster name is "elasticsearch". The default node name depends on the version: older releases derive it from the node's UUID (Universally Unique Identifier), while Elasticsearch 7.0 and later use the machine's hostname.
These names help to identify which virtual or physical machine corresponds to which node.
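To see which nodes belong to a cluster and which one is the elected master, a client can ask any node through the REST API, for example with GET /_cat/nodes?format=json. The sketch below only parses a response of that shape. The sample rows, IP addresses, and node names are made up for illustration, and the helper names `elected_master` and `node_names` are hypothetical.

```python
import json

# Hypothetical sample of what GET /_cat/nodes?format=json might return.
# Real responses contain more columns; the values here are illustrative only.
SAMPLE_CAT_NODES = json.loads("""
[
  {"ip": "10.0.0.1", "node.role": "dim", "master": "*", "name": "node-a"},
  {"ip": "10.0.0.2", "node.role": "dim", "master": "-", "name": "node-b"}
]
""")


def elected_master(cat_nodes: list) -> str:
    # In _cat/nodes output, the elected master is flagged with "*".
    masters = [row["name"] for row in cat_nodes if row.get("master") == "*"]
    if len(masters) != 1:
        raise ValueError(f"expected exactly one elected master, found {len(masters)}")
    return masters[0]


def node_names(cat_nodes: list) -> list:
    # Every node in the cluster appears as one row, identified by its name.
    return sorted(row["name"] for row in cat_nodes)


print("nodes:", node_names(SAMPLE_CAT_NODES))
print("elected master:", elected_master(SAMPLE_CAT_NODES))
```

Because every node knows about every other node, the same request can be sent to any node in the cluster.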

Elasticsearch stores your data as documents. Look at the example below of data stored in Elasticsearch.

For example -

{
	"name": "Adward",
	"country": "California"
}

This data is stored in the _source field of the document's JSON object, alongside metadata fields, as you can see below:

{
	"_index": "people",
	"_type": "_doc",
	"_id": "123",
	"_version": 1,
	"_seq_no": 0,
	"_primary_term": 1,
	"_source": {
		"name": "Adward",
		"country": "California"
	}
}
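A document like the one above could be sent to a node over the HTTP REST API. The following is a minimal sketch using only the Python standard library. It assumes a node listening on http://localhost:9200 (the default HTTP port), and `build_index_request` is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local node reachable on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict) -> urllib.request.Request:
    # PUT /<index>/_doc/<id> stores the JSON body as the document's _source.
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request(
    "people", "123", {"name": "Adward", "country": "California"}
)
print(request.get_method(), request.full_url)

# To actually send it to a running node (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```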

The data is organized within indices, because every document in Elasticsearch is stored inside an index. An index groups documents together logically and also provides configuration options related to scalability and availability. So, whenever we need to search for data, we execute search queries against indices.
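As a rough illustration of searching an index, the sketch below builds the body of a match query and extracts the stored documents from a search response. The body would typically be sent to the people index with GET or POST /people/_search. The sample response is hand-written and abbreviated, and `match_query` and `sources` are hypothetical helper names.

```python
import json


def match_query(field_name: str, text: str) -> dict:
    # Body of a full-text "match" query on a single field.
    return {"query": {"match": {field_name: text}}}


def sources(search_response: dict) -> list:
    # Matching documents are listed under hits.hits, each with its _source.
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Hand-written, abbreviated sample of a search response (illustrative only).
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {
                "_index": "people",
                "_id": "123",
                "_source": {"name": "Adward", "country": "California"},
            }
        ]
    }
}

print(json.dumps(match_query("name", "Adward")))
print(sources(SAMPLE_RESPONSE))
```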

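The Elasticsearch architecture is highly scalable due to sharding, even when you are dealing with a large amount of data. Sharding splits an index into smaller pieces, called shards, that can be distributed across the nodes of a cluster.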
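The idea behind sharding can be sketched with a simplified routing rule: a hash of the document's routing value (by default its ID), taken modulo the number of primary shards, picks the shard. The code below is only an illustration. It uses zlib.crc32 as a stand-in hash, which is not the hash function Elasticsearch uses, and the shard count of 5 and the name `pick_shard` are assumptions made for this example.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    # Simplified rule: shard = hash(routing) % number_of_primary_shards.
    # zlib.crc32 is only a stand-in; it is NOT the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


NUMBER_OF_PRIMARY_SHARDS = 5  # hypothetical index setting

# Route 1000 hypothetical document IDs and count documents per shard.
doc_ids = [str(i) for i in range(1000)]
per_shard = Counter(
    pick_shard(doc_id, NUMBER_OF_PRIMARY_SHARDS) for doc_id in doc_ids
)

for shard in sorted(per_shard):
    print(f"shard {shard}: {per_shard[shard]} documents")
```

Because the same ID always maps to the same shard, a node can work out where a document lives without scanning all of the data.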
