Splunk Search Optimization

Search optimization is a strategy for making searches run as efficiently as possible. In this section, we will learn how to optimize searches on the Splunk platform.

A poorly constructed search runs longer, retrieves far more data from the indexes than is required, and needlessly consumes memory and network resources. Multiply these problems across hundreds or thousands of searches, and the result is slow, sluggish performance.

There is a set of fundamental principles we can follow to optimize our searches:

  • Retrieve only the required data
  • Move as little data as possible
  • Parallelize as much work as possible
  • Set appropriate time windows

We use the following methods to put these search optimization principles into practice.

  • Filter as much as possible in the initial search
  • Perform joins and lookups on only the required data
  • Perform evaluations on the minimum number of events possible
  • Move commands that bring data to the search head as late as possible in our search criteria (see the sketch after this list)
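
For example, the following sketch contrasts filtering after a pipe with filtering in the initial search; the web_access index and the status and clientip fields are hypothetical stand-ins:

    Less efficient - every event is retrieved from the index, then filtered:
    index=web_access | search status=404 | stats count by clientip

    More efficient - the filter is part of the initial search, so fewer events are retrieved:
    index=web_access status=404 | stats count by clientip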

Indexes and lookups

The Splunk platform uses the information in the index files to identify which events need to be retrieved from disk when we run a search. The fewer events that have to be retrieved from disk, the faster the search runs.

How we construct our search can have a significant effect on the number of events retrieved from disk.

When data is indexed, it is divided into events based on time.

The processed data consists of several files:

  • The raw data in compressed form (rawdata)
  • The indexes that point to the raw data (index files, also referred to as tsidx files)
  • Some metadata files

These files are written to the disk and reside in age-organized directory sets called buckets.
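
As a rough illustration (exact paths and file names vary by version and configuration), a bucket for an index typically looks something like this on disk:

    $SPLUNK_HOME/var/lib/splunk/<index_name>/db/
        db_<newest_event_time>_<oldest_event_time>_<bucket_id>/
            rawdata/journal.gz                             the compressed raw data
            *.tsidx                                        the index files that point to the raw data
            Hosts.data, Sources.data, SourceTypes.data     metadata files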

Use indexes effectively

One way of limiting the data extracted from disk is to partition data into separate indexes. If we rarely search across multiple data types at a time, place the different data types in separate indexes and limit our searches to the specific index. Store web access data in one index, for example, and firewall data in another. Separate indexes are also recommended for sparse data, which may otherwise be buried in a large volume of irrelevant data.
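
As a sketch, assuming the data has been split into hypothetical web_access and firewall indexes:

    Reads every index we have access to:
    index=* error

    Reads only the index that holds the relevant data:
    index=web_access error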

An optimized search

We can optimize the search as a whole by moving some of its components to earlier positions in the search pipeline.

Moving the criteria A=25 before the first pipe filters the events earlier and reduces the number of times that the index is accessed. The number of events extracted is 300,000, a reduction of 700,000 compared with the original search. The lookup is then performed on 300,000 events instead of 1 million events.

Moving the criteria L>100 to immediately after the lookup filters the events further, reducing the number of events returned by another 100,000. The eval is performed on 200,000 events instead of 1 million events.

The criteria E>50 depends on the results of the eval command, so it cannot be moved any earlier. The results are the same as for the original search: 50,000 events are returned, but with much less impact on resources.
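
The searches being compared are not shown above, but a reconstruction along the following lines illustrates the idea; the index name, lookup name, and eval expression are hypothetical, while A, L, and E are the fields referenced in the text:

    Original search - 1 million events are retrieved before any filtering:
    index=myindex | lookup mylookup A OUTPUT L | eval E=L/2 | search A=25 L>100 E>50

    Optimized search - each filter is moved as early as its dependencies allow:
    index=myindex A=25 | lookup mylookup A OUTPUT L | search L>100 | eval E=L/2 | search E>50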

Quick tips for optimization

The key to fast searching is to limit the data to the absolute minimum that needs to be pulled from the disk. In the search, filter the data as early as possible, so processing takes place on the minimum amount of data needed.

Limit the data from disk

The techniques for restricting the amount of data retrieved from disk include setting a narrow time window, being as specific as possible in the search criteria, and retrieving only the events that are required.

Narrow the time window

Limiting the time range is one of the most effective ways to restrict the data that is pulled off disk. Use the time range picker or specify time modifiers in our search to identify the smallest time window necessary for our search.

If we need to view data from the last hour only, don't use the Last 24 hours default time range.
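
For example, a minimal sketch using time modifiers in the search string (the index name is hypothetical); time modifiers specified this way take precedence over the time range picker:

    index=web_access earliest=-1h latest=now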

If we must use a broad time range, such as Last week or All-time, then use other techniques to limit the amount of data retrieved from disk.

Specify the index, source, or source type

To optimize our searches, it's necessary to understand how our data is structured. Take the time to learn which indexes contain our data, what our data sources are, and what the source types are. Knowing this information lets us narrow down our searches.

  1. Run the following search.
    search *
    This search is not optimized, but it provides us with an opportunity to learn about the data we have access to.
  2. In the Selected fields list, click on each field and look at the values for host, source, and sourcetype.
  3. In the Interesting fields list, click on index and look at the names of the indexes that we have access to.

In our search, specify the index, source, or source type whenever possible. When the Splunk platform indexes data, it automatically adds a number of fields to each event. The index, source, and sourcetype fields are added to every event as default fields. A default field is an indexed field that the Splunk platform recognizes in our events at search time. The host, source, and sourcetype fields describe where the event originated.
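
A sketch of a search that narrows the retrieval using these default fields (the specific values are hypothetical):

    index=web_access host=www1 sourcetype=access_combined "login failed"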

Write better searches

This topic examines some of the causes of slow searches and includes guidelines to help us write more efficient searches. Several factors can influence the speed of our searches, including:

  • The volume of data that we are searching
  • How our searches are constructed
  • The number of concurrent searches

To optimize the speed at which our search runs, minimize the processing time required for each component of the search.

Know your type of search

Search optimization guidelines depend on the type of search we are running and the characteristics of the data we are searching. Searches fall into two categories, based on the objective we want to accomplish: a search either retrieves events, or it produces a report that summarizes or organizes the data.

Searches that retrieve events

Raw event searches retrieve events from a Splunk index without any further processing of the retrieved events. When retrieving events from the index, be as specific as possible about the events we want to see. This can be done with keywords and field-value pairs that are unique to those events.

If the events in the dataset we want to retrieve occur frequently, the search is called a dense search. If the events in the dataset that we want to retrieve are rare, the search is called a sparse search. Sparse searches that run against large data volumes take longer than dense searches for the same data set.
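
For example, against a hypothetical web-access dataset:

    Dense search - matches a large share of the events in the index:
    index=web_access sourcetype=access_combined

    Sparse search - matches only rare events, so many buckets are read for few results:
    index=web_access sourcetype=access_combined status=503 "payment gateway timeout"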

Searches that generate reports

Report-generating searches, also called transforming searches, perform additional processing on events after they are retrieved from an index. This processing can include filtering, transforming, and applying one or more statistical functions to the result set. Because this processing takes place in memory, the more restrictive and precise the event retrieval, the faster the search runs.
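
A minimal transforming-search sketch, assuming a web-access source type with status and uri_path fields:

    index=web_access status>=500 | stats count AS errors BY uri_path | sort - errors

The initial filter keeps the event retrieval restrictive, and the stats command then summarizes only those events in memory.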

Tips for tuning your searches

In most cases, a search is slow to retrieve events from the index because of the complexity of the query. For instance, if our search contains very large OR lists, complex subsearches (which expand into OR lists), or phrase searches, processing takes longer. This section explores tips to fine-tune our searches and make them more efficient.

Running statistics with a BY clause on a set of field values with high cardinality (many uncommon or unique values) takes a lot of memory. One potential solution is to lower the value of the chunk_size setting used with the tstats command. It can also help to reduce the number of distinct values that the BY clause must process.
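
As a sketch of both suggestions (the index name and the 50000 value are assumptions; an appropriate chunk_size depends on the data):

    | tstats chunk_size=50000 count where index=web_access by host

Grouping by a field such as host, rather than by a high-cardinality field such as a session ID, keeps the number of distinct BY values the command must track small.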

Restrict searches to the specific index

If we rarely search over more than one data type at a time, divide the different data types into separate indexes, and limit our searches to the relevant index. Store web access data, for example, in one index and firewall data in another. This is especially recommended for sparse data, which could otherwise be buried in a large volume of unrelated data.





