Elasticsearch Aggregation APIs

Elasticsearch provides an aggregation API for summarizing data. The aggregation framework computes aggregated data over the documents selected by a search query. In simple words, it collects the documents that match the query and returns summaries of them to the user. It consists of several building blocks that can be combined to build complex summaries of data, and it is the source of the analytic information available in Elasticsearch.

Below are some important points to note about aggregations:

Aggregations can be composed together to build complex summaries of data.
An aggregation can be considered a unit of work that builds analytic information over a set of documents in Elasticsearch.
Aggregations are built from a small set of building blocks.
Aggregations are similar to SQL aggregate functions such as AVG, and to GROUP BY queries with COUNT.
A GROUP BY-style (terms) aggregation can be performed on any numeric field, but a string field must be of type keyword (indexed as a single exact value) or, for text fields, have fielddata set to true.

Look at the figure below to see what an aggregation looks like:

Aggregation Syntax

Basic structure of an aggregation:

"aggregations" : {
    "<aggregation_name_1>" : {
        "<aggregation_type>" : {
            <aggregation_body>
            "field" : "<document_field_name>"
        }
        [ , "meta" : { [<meta_data_body>] } ]?
        [ , "aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [ , "<aggregation_name_2>" : { ... } ]*
}

More than one aggregation can be defined in a single request.

aggregations - A JSON object that holds the aggregations to compute. You can also use the shorter aggs keyword in place of aggregations.

aggregation_name - A logical name for each aggregation, defined by the user. For example, use avg_price for an aggregation that computes the average price. The same name identifies the aggregation's result in the response.

aggregation_type - The type of the aggregation, such as avg, terms, or cardinality. Each aggregation type has a specific name.

aggregation_body - The body of the aggregation. Each aggregation type has its own body, whose parameters depend on the nature of the aggregation.

field - The keyword that names the document field the aggregation operates on.

document_field_name - The name of the field (column) targeted in the documents.
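To make this structure concrete, the following minimal Python sketch assembles a request body with two named aggregations, one of which carries a sub-aggregation and meta data. It uses only the standard json module. The helper build_agg_request and the names by_year and avg_fees_per_year are hypothetical; the fields fees and Addmission year.keyword are taken from the examples later in this section.

```python
import json


def build_agg_request() -> dict:
    """Hypothetical helper: build a search body that follows the aggregation syntax."""
    return {
        "size": 0,  # skip the search hits; only aggregation results are needed
        "aggs": {  # "aggs" is the short form of "aggregations"
            "avg_fees": {  # <aggregation_name_1>
                "avg": {"field": "fees"}  # <aggregation_type>: {<aggregation_body>}
            },
            "by_year": {  # <aggregation_name_2>
                "terms": {"field": "Addmission year.keyword"},
                # optional sub-aggregation, computed once per bucket
                "aggs": {"avg_fees_per_year": {"avg": {"field": "fees"}}},
                # optional meta data, returned unchanged with the result
                "meta": {"purpose": "example"},
            },
        },
    }


body = build_agg_request()
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a POST student/_search request with any HTTP client; the sketch deliberately avoids depending on a particular Elasticsearch client library.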

Types of Aggregation

Elasticsearch offers several types of aggregations, each with its own purpose and output. For simplicity, they are grouped into four major families:

Metric aggregation
Bucketing aggregation
Matrix aggregation
Pipeline aggregation

Metric Aggregation

Metric aggregations keep track of and compute metrics over a set of documents. The metrics are computed from the field values of the aggregated documents.

Numeric metric aggregations are either:

Single-valued numeric metric aggregations, e.g., the avg aggregation, or
Multi-valued numeric metric aggregations, e.g., the stats aggregation.

Bucketing

Bucketing is a family of aggregations that build buckets. Unlike metric aggregations, they do not calculate metrics over fields. Instead, each bucket is associated with a key and a criterion that decides which documents fall into it. Bucket aggregations are used to group documents into such buckets, which can be based on existing field values, ranges, custom filters, and so on.
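As an illustration of range-based bucketing, the minimal Python sketch below assigns fee values to range buckets and reports a doc_count per bucket, without computing any metric. The fee values come from the student documents used later in this section; the range boundaries and the key format are assumptions chosen for the example. As in Elasticsearch's range aggregation, the lower bound is inclusive and the upper bound is exclusive.

```python
# Fee values from the sample student documents used in this section.
FEES = [24800, 18900, 22900]

# Hypothetical ranges: (from, to); None means unbounded.
RANGES = [(None, 20000), (20000, 23000), (23000, None)]


def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive and 'to' is exclusive."""
    buckets = []
    for lo, hi in ranges:
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        # Key format is this sketch's own choice ("*" marks an open end).
        key = f"{'*' if lo is None else lo}-{'*' if hi is None else hi}"
        buckets.append({"key": key, "doc_count": count})
    return buckets


for bucket in range_buckets(FEES, RANGES):
    print(bucket)
```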

Matrix Aggregation

Matrix aggregations operate on multiple fields. They produce a matrix result from the values extracted from those fields of the requested documents. Matrix aggregations do not support scripting.
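To show what "a matrix result from several fields" means, here is a minimal Python sketch that computes a covariance matrix over two numeric fields, fees and admission year, taken from the sample student documents. It is only an illustration of the idea and not the implementation of Elasticsearch's matrix_stats aggregation; the use of the sample covariance (dividing by n - 1) is an assumption of this sketch.

```python
def covariance_matrix(columns):
    """Sample covariance (n - 1) for every pair of numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    k = len(columns)
    return [
        [
            sum((columns[i][r] - means[i]) * (columns[j][r] - means[j]) for r in range(n))
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


# One column per field, one row per document.
fees = [24800, 18900, 22900]
years = [2018, 2019, 2019]

matrix = covariance_matrix([fees, years])
for row in matrix:
    print(row)
```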

Pipeline

As the name suggests, a pipeline aggregation takes its input from the output of other aggregations rather than from documents. In other words, pipeline aggregations aggregate the results of other aggregations.
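The two-stage flow can be sketched in Python as follows. The first stage buckets the sample documents by admission year and computes an average per bucket; the second stage reads only those bucket results, which is the idea behind Elasticsearch's max_bucket and avg_bucket sibling pipeline aggregations. The function names are hypothetical and this is not their implementation.

```python
from collections import defaultdict

# Sample student documents, reduced to the fields used here (fees as numbers).
DOCS = [
    {"Addmission year": "2018", "fees": 24800},
    {"Addmission year": "2019", "fees": 18900},
    {"Addmission year": "2019", "fees": 22900},
]


def avg_fees_per_year(docs):
    """First stage: bucket docs by year and compute an avg metric per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["Addmission year"]].append(doc["fees"])
    return [{"key": k, "avg_fees": sum(v) / len(v)} for k, v in groups.items()]


def max_bucket(buckets, metric):
    """Pipeline stage: reads bucket results, never the documents themselves."""
    best = max(b[metric] for b in buckets)
    return {"value": best, "keys": [b["key"] for b in buckets if b[metric] == best]}


def avg_bucket(buckets, metric):
    """Pipeline stage: average of a metric across sibling buckets."""
    return {"value": sum(b[metric] for b in buckets) / len(buckets)}


buckets = avg_fees_per_year(DOCS)
print(buckets)
print(max_bucket(buckets, "avg_fees"))
print(avg_bucket(buckets, "avg_fees"))
```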

Each of these families, especially the bucket, pipeline, and metric families, is further divided into many specific aggregation types.

Five important aggregations

Some essential Elasticsearch aggregations are described below with examples:

Avg aggregation
Terms aggregation
Cardinality aggregation
Stats aggregation
Filter aggregation

Avg Aggregation

The avg aggregation calculates the average of a numeric field across the documents matched by the query. Specify avg as the aggregation type in the query. The following example finds the average of the "fees" field:


POST student1/_search
{
  "aggs": {
    "avg_fees": {
      "avg": {
        "field": "fees"
      }
    }
  }
}

Executing the above request returns the average of the fees values in the documents.
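The arithmetic behind this result can be checked with a short Python sketch over the three sample documents shown in the response. The helper avg_agg is hypothetical; it skips documents without the field and converts the string values from _source to numbers, whereas Elasticsearch reads the indexed numeric values (assuming fees is mapped as a numeric field).

```python
def avg_agg(docs, field):
    """Average of `field` over the docs that contain it (None if none do)."""
    values = [float(doc[field]) for doc in docs if field in doc]
    return {"value": sum(values) / len(values) if values else None}


# "fees" values of the three sample documents.
DOCS = [{"fees": "24800"}, {"fees": "18900"}, {"fees": "22900"}]

print(avg_agg(DOCS, "fees"))  # {'value': 22200.0}
```

The result is (24800 + 18900 + 22900) / 3 = 66600 / 3 = 22200.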

Response

You will get output similar to the response below.

{
  "took": 1251,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "student1",
        "_type": "_doc",
        "_id": "01",
        "_score": 1.0,
        "_source": {
          "name": "Denial Parygen",
          "dob": "07/Aug/1998",
          "course": "Mass Communication",
          "Addmission year": "2018",
          "email": "denial@gmail.com",
          "street": "3511 Rodney Street",
          "state": "Missouri",
          "country": "United States",
          "zip": "62208",
          "fees": "24800"
        }
      },
      {
        "_index": "student1",
        "_type": "_doc",
        "_id": "03",
        "_score": 1.0,
        "_source": {
          "name": "Bob Hana",
          "dob": "13/Sep/1998",
          "course": "BFA",
          "Addmission year": "2019",
          "email": "bob@gmail.com",
          "street": "724 Monroe Street",
          "state": "Hauston",
          "country": "United States",
          "zip": "77063",
          "fees": "18900"
        }
      },
      {
        "_index": "student1",
        "_type": "_doc",
        "_id": "02",
        "_score": 1.0,
        "_source": {
          "name": "Jass Fernandiz",
          "dob": "07/Aug/1996",
          "course": "Bcom (H)",
          "Addmission year": "2019",
          "email": "jassf@gmail.com",
          "street": "4225 Ersel Street",
          "state": "Texas",
          "country": "United States",
          "zip": "76011",
          "fees": "22900"
        }
      }
    ]
  },
  "aggregations": {
    "avg_fees": {
      "value": 22200.0
    }
  }
}

If the field is missing

If a document does not contain the field for which you are calculating the average, that document is ignored by default; if no document contains the field, the aggregation returns null. Add the missing parameter (for example, "missing": 0) to the aggregation to use a default value for documents that lack the field. Execute the following request:


POST new_student/_search
{
  "aggs": {
    "avg_fees": {
      "avg": {
        "field": "fees",
        "missing": 0
      }
    }
  }
}
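The effect of the missing parameter can be sketched in Python. The helper avg_with_missing and the fourth document without a fees field are hypothetical; the other three fee values come from the sample student documents. Without a default, the document is skipped; with missing set to 0, it is counted as a 0 and lowers the average.

```python
def avg_with_missing(docs, field, missing=None):
    """Average of `field`; docs lacking it are skipped unless `missing` supplies a value."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(float(doc[field]))
        elif missing is not None:
            values.append(float(missing))
    # No values at all: mirror the null result described above.
    return {"value": sum(values) / len(values) if values else None}


# Three sample documents plus one hypothetical document with no "fees" field.
DOCS = [{"fees": "24800"}, {"fees": "18900"}, {"fees": "22900"}, {"name": "No Fees"}]

print(avg_with_missing(DOCS, "fees"))             # {'value': 22200.0}
print(avg_with_missing(DOCS, "fees", missing=0))  # {'value': 16650.0}
```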

Terms Aggregation

The terms aggregation builds buckets from the values of a field: one bucket per unique value of the selected field (for example, name or admission year), with the number of matching documents reported as doc_count. Specify terms as the aggregation type in the query.

Execute the following request to group the documents by the admission year field. Setting "size" to 0 suppresses the search hits, so only the aggregation results are returned:


POST student/_search
{
  "size": 0,
  "aggs": {
    "group_by_Addmission year": {
      "terms": {
        "field": "Addmission year.keyword"
      }
    }
  }
}

The response groups the documents by admission year.
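The bucketing performed by the terms aggregation can be sketched in Python with collections.Counter. The helper terms_agg is hypothetical; it orders buckets by doc_count descending, breaks ties by key ascending (a choice made for this sketch), and reports the documents left out of the top size buckets as sum_other_doc_count.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Group docs by the exact value of `field`, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Sort by doc_count descending, then key ascending as a tie-breaker.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:size]
    return {
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }


# Admission years of the three sample documents.
DOCS = [
    {"Addmission year": "2018"},
    {"Addmission year": "2019"},
    {"Addmission year": "2019"},
]

print(terms_agg(DOCS, "Addmission year"))
```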

Response

You will get output similar to the response below.

{
  "took": 179,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [ ]
  },
  "aggregations": {
    "group_by_Addmission year": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2019",
          "doc_count": 2
        },
        {
          "key": "2018",
          "doc_count": 1
        }
      ]
    }
  }
}

The screenshot below shows the above query and response in the elasticsearch-head plugin:

Cardinality Aggregation

A common requirement is to find how many unique values a field has. The cardinality aggregation counts the distinct values of a particular field, which tells you the number of unique elements present in an index. Note that Elasticsearch computes this count approximately, using the HyperLogLog++ algorithm.

Specify cardinality as the aggregation type in the query. Execute the following request to find the number of unique values of a field:


POST student/_search
{
  "size": 0,
  "aggs": {
    "unique_fees": {
      "cardinality": {
        "field": "fees"
      }
    }
  }
}

The response contains the total number of unique values of the fees field in the student index.
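For small data the same count can be computed exactly with a Python set, as in the hypothetical helper below. Elasticsearch's cardinality aggregation instead approximates this count so that it scales to large indices; the sketch only shows what is being counted, using the fee values of the sample documents.

```python
def cardinality(docs, field):
    """Exact number of distinct values of `field` (Elasticsearch approximates this)."""
    return {"value": len({doc[field] for doc in docs if field in doc})}


# "fees" values of the three sample documents.
DOCS = [{"fees": "24800"}, {"fees": "18900"}, {"fees": "22900"}]

print(cardinality(DOCS, "fees"))  # {'value': 3}
```

Adding a document that repeats an existing fee leaves the count unchanged, because duplicates collapse in the set.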

Response

You will get output similar to the response below.

{
  "took": 85,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [ ]
  },
  "aggregations": {
    "unique_fees": {
      "value": 3
    }
  }
}

The screenshot below shows how the query runs in the elasticsearch-head plugin and the response it returns:

Stats Aggregation

The stats aggregation (short for statistics) is a multi-value numeric metric aggregation. It computes count, min, max, avg, and sum in a single request, so you do not need a separate aggregation for each statistic of a numeric field, even when many documents are aggregated. The query structure is the same as for the other aggregations. The example below uses extended_stats, a variant of stats that additionally returns sum_of_squares, variance, std_deviation, and std_deviation_bounds.

Execute the following request to compute these statistics for the fees field in a single shot:


POST student/_search
{
  "size": 0,
  "aggs": {
    "stats_fees": {
      "extended_stats": {
        "field": "fees"
      }
    }
  }
}

Response

Executing the above request returns output similar to the response below.

{
  "took": 75,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [ ]
  },
  "aggregations": {
    "stats_fees": {
      "count": 3,
      "min": 18900,
      "max": 24800,
      "avg": 22200,
      "sum": 66600,
      "sum_of_squares": 1496660000,
      "variance": 6046666.667,
      "std_deviation": 2458.997,
      "std_deviation_bounds": {
        "upper": 27117.994,
        "lower": 17282.006
      }
    }
  }
}
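Every extended_stats field can be recomputed from the three fee values with a short Python sketch. The helper extended_stats is hypothetical. It uses the population variance (dividing by count), which is what Elasticsearch reports as variance, and it places std_deviation_bounds at avg plus or minus sigma standard deviations, with sigma defaulting to 2 as in Elasticsearch.

```python
import math


def extended_stats(values, sigma=2.0):
    """Recompute the fields reported by extended_stats for a list of numbers."""
    n = len(values)
    total = sum(values)
    avg = total / n
    # Population variance: divide by the count, not count - 1.
    variance = sum((v - avg) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


stats = extended_stats([24800.0, 18900.0, 22900.0])
for name, value in stats.items():
    print(name, value)
```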

Filter Aggregation

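The filter aggregation defines a single bucket containing only the documents that match a given filter, so that other aggregations can be computed over just those documents. Filtering helps users get the most relevant results. Let's take an example that filters the documents based on "fees" and "Addmission year". Strictly speaking, the request below applies these conditions as a bool query in filter context, which filters the search hits rather than building an aggregation bucket. It returns the documents that match all the conditions specified in the query. You can filter the documents on any field you want.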

Execute the following request to return the documents that match the conditions specified in the query:


POST student/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "fees": "22900" } },
        { "term": { "Addmission year": "2019" } }
      ]
    }
  }
}

Response

Executing the above request returns output similar to the response below.

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "student",
        "_type": "_doc",
        "_id": "02",
        "_score": 0.0,
        "_source": {
          "name": "Jass Fernandiz",
          "dob": "07/Aug/1996",
          "course": "Bcom (H)",
          "Addmission year": "2019",
          "email": "jassf@gmail.com",
          "street": "4225 Ersel Street",
          "state": "Texas",
          "country": "United States",
          "zip": "76011",
          "fees": "22900"
        }
      }
    ]
  }
}

The screenshot below shows the above query and response in the elasticsearch-head plugin:
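The single-bucket idea can be sketched in Python. The helper filter_bucket keeps only the documents whose fields equal every given value (exact matching only, which is a simplification of term queries) and reports the bucket's doc_count. The dictionary FILTER_AGG_BODY, including the names fees_2019 and avg_fees, is a hypothetical request body showing how the same conditions could be written as an actual filter aggregation, with a sub-aggregation computed inside the bucket.

```python
# Sample student documents, reduced to the fields used here.
DOCS = [
    {"_id": "01", "fees": "24800", "Addmission year": "2018"},
    {"_id": "03", "fees": "18900", "Addmission year": "2019"},
    {"_id": "02", "fees": "22900", "Addmission year": "2019"},
]


def filter_bucket(docs, conditions):
    """One bucket holding the docs where every field equals the given value."""
    matched = [d for d in docs if all(d.get(f) == v for f, v in conditions.items())]
    return {"doc_count": len(matched)}, matched


# Hypothetical filter aggregation: the conditions define one bucket,
# and the avg sub-aggregation runs only over the documents in it.
FILTER_AGG_BODY = {
    "size": 0,
    "aggs": {
        "fees_2019": {
            "filter": {
                "bool": {
                    "filter": [
                        {"term": {"fees": "22900"}},
                        {"term": {"Addmission year": "2019"}},
                    ]
                }
            },
            "aggs": {"avg_fees": {"avg": {"field": "fees"}}},
        }
    },
}

bucket, hits = filter_bucket(DOCS, {"fees": "22900", "Addmission year": "2019"})
print(bucket, [d["_id"] for d in hits])
```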