Elasticsearch Rollup data
Elasticsearch provides a rollup feature that summarizes the data from indices and rolls it into a new index. The stored summaries can be used for analysis whenever needed, at a fraction of the storage cost of the raw data. Keeping historical data around for analysis is extremely useful, but it is often avoided because of the financial cost of archiving such a massive amount of data.
For instance, consider a system that generates 47 million documents per day, which requires a lot of disk space to store. At per-second granularity, this data is useful for real-time analysis. For historical analysis, say over 5 years of data, we usually work at a larger interval, such as hourly or daily trends. We can save a large amount of space by compressing each day's 47 million documents into hourly summaries.
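The saving can be made concrete with a little arithmetic. The sketch below uses the figures from the paragraph above and assumes, for simplicity, that each hour collapses into exactly one summary document. In practice a rollup job writes one document per combination of grouping values, so the real count would be higher.

```python
# Back-of-the-envelope comparison of raw documents vs. hourly summaries.
DOCS_PER_DAY = 47_000_000  # figure from the text
YEARS = 5
DAYS = YEARS * 365         # leap days ignored for simplicity

raw_docs = DOCS_PER_DAY * DAYS
# Assumption: one summary document per hour (a single group).
hourly_summaries = DAYS * 24
ratio = raw_docs // hourly_summaries

print(f"raw documents over {YEARS} years: {raw_docs:,}")   # 85,775,000,000
print(f"hourly summaries:               {hourly_summaries:,}")  # 43,800
print(f"raw documents per summary:      {ratio:,}")        # 1,958,333
```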
The rollup functionality was introduced in Elasticsearch 6.3. It is extremely useful for storing historical data in summarized form, so that the data takes far less storage space. Note that a rollup job is a periodic task.
Getting started with rollup
To start with the rollup feature, we have to create one or more "rollup jobs" to roll up the data. These jobs roll up the indices that you specify and place the rolled-up documents in a secondary index of your choice. Once started, rollup jobs run continuously in the background.
First, we will create an index whose documents carry different timestamps. After creating the index, we will create a rollup job that runs periodically on a cron schedule. Your document might look like this -
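As a minimal sketch of such a document, the Python snippet below builds a JSON body with a timestamp and prints the request that would index it. The index name rollupexample is the one used in this tutorial. The field names timestamp, node and temperature are hypothetical, chosen only for illustration, and the typeless `_doc` endpoint assumes a recent Elasticsearch version.

```python
import json
from datetime import datetime, timezone

INDEX = "rollupexample"  # index name used in this tutorial


def make_document(ts, node, temperature):
    """Build one sample document; all field names are hypothetical."""
    return {
        "timestamp": ts.isoformat(),  # ISO 8601 string with UTC offset
        "node": node,
        "temperature": temperature,
    }


doc = make_document(
    datetime(2020, 1, 1, 10, 15, tzinfo=timezone.utc), "node-1", 21.5
)

# Print the request instead of sending it, so no cluster is needed.
print(f"POST /{INDEX}/_doc")
print(json.dumps(doc, indent=2))
```

The printed body can be pasted into the Kibana console or sent with any HTTP client.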
Executing the above index request with a timestamp returns a response like the one given below -
Add more documents to the rollupexample index.
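One way to add several documents at once is the `_bulk` API, which takes newline-delimited JSON: an action line followed by a document line, with a trailing newline at the end. The sketch below only builds that body. The field names and the 20-minute spacing are illustrative assumptions, and the typeless action line assumes Elasticsearch 7.x or later.

```python
import json
from datetime import datetime, timedelta, timezone


def build_bulk_body(start, count, step_minutes=20):
    """Return an NDJSON body for the _bulk API with `count` documents."""
    lines = []
    for i in range(count):
        ts = start + timedelta(minutes=i * step_minutes)
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps({                # document line (hypothetical fields)
            "timestamp": ts.isoformat(),
            "node": "node-1",
            "temperature": 20 + i,
        }))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc), 6)
print("POST /rollupexample/_bulk")
print(body)
```

Six documents spaced 20 minutes apart span two hours, so an hourly rollup has more than one bucket to produce.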
Create a rollup job
Now we will create a rollup job using the _rollup API. The documents are rolled up into hourly summaries. The rollup job definition might look like this -
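A rollup job definition along these lines could be sketched as follows. The job id, the rollup index name rollupexample_rollup and the field names are hypothetical. The path `_rollup/job/<id>` and the `fixed_interval` setting assume Elasticsearch 7.x; 6.x releases used the `_xpack/rollup/job/<id>` path and an `interval` setting instead.

```python
import json

JOB_ID = "rollupexample_hourly"  # hypothetical job id

job = {
    "index_pattern": "rollupexample",         # source index to roll up
    "rollup_index": "rollupexample_rollup",   # hypothetical target index
    "cron": "0 0 * * * ?",                    # run at the top of every hour
    "page_size": 1000,                        # buckets processed per step
    "groups": {
        # Hourly buckets on the (hypothetical) timestamp field.
        "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
        # Keep a per-node breakdown inside each hourly bucket.
        "terms": {"fields": ["node"]},
    },
    # Metrics to precompute for each bucket.
    "metrics": [
        {"field": "temperature", "metrics": ["min", "max", "avg"]},
    ],
}

print(f"PUT /_rollup/job/{JOB_ID}")
print(json.dumps(job, indent=2))
```

A newly created job is in a stopped state, so it has to be started with `POST /_rollup/job/<id>/_start` before it processes any data.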
The cron parameter controls when and how often the job activates. When a rollup job's cron schedule triggers, the job resumes rolling up the data from the point where it left off after the last activation.
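Elasticsearch cron expressions have six fields plus an optional seventh for the year, starting with seconds, which differs from the five-field Unix crontab format. The helper below is a hypothetical illustration that only labels the fields of an expression. It does not validate the values or compute run times.

```python
# Field order of an Elasticsearch cron expression; the year is optional.
CRON_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def describe_cron(expression):
    """Map each part of a cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "*/30 * * * * ?" fires every 30 seconds; "0 0 * * * ?" fires hourly.
for expr in ("*/30 * * * * ?", "0 0 * * * ?"):
    print(expr, "->", describe_cron(expr))
```

A short interval such as every 30 seconds is convenient while experimenting. For hourly summaries, an hourly schedule is usually enough.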
Once the job has started running and has processed some data, we can use the Query DSL to search it. Look at the following query to search the rolled-up data -
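Rolled-up data is queried through the `_rollup_search` endpoint, which accepts a subset of the regular Query DSL and requires `size` to be 0, since only aggregation results are returned. The sketch below builds such a request. The index name rollupexample_rollup and the temperature field are hypothetical and would have to match the rollup job's configuration.

```python
import json

ROLLUP_INDEX = "rollupexample_rollup"  # hypothetical rollup index name

search = {
    "size": 0,  # rollup search returns aggregations only, no hits
    "aggregations": {
        "max_temperature": {
            # Works only for a field and metric the job was configured to roll up.
            "max": {"field": "temperature"},
        },
    },
}

print(f"GET /{ROLLUP_INDEX}/_rollup_search")
print(json.dumps(search, indent=2))
```

The response has the same shape as a regular search response, so existing aggregation-handling code can usually be reused.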