Splunk Sub Searching
In this section, we learn about subsearches in the Splunk platform. Subsearches are an important part of Splunk searching because they let us search the data pool effectively. We will work through examples that show how to use a subsearch, how to improve its performance, and how to make it easier to read.
Use a subsearch
A subsearch is a search used to narrow down the range of events we are looking at. The result of the subsearch is then used as an argument to the primary, or outer, search. Subsearches are enclosed in square brackets within the main search and are evaluated first.
Let's find the single most frequent shopper on the Buttercup Games online store, and what that shopper has bought.
The following examples show why a subsearch is useful. Example 1 shows how to find the most frequent shopper without a subsearch. Example 2 shows how to find the most frequent shopper with a subsearch.
Example 1: Search without a subsearch
We want to find the single most frequent shopper on the Buttercup Games online store and what that shopper has purchased. We use the top command to return the most frequent shopper.
sourcetype=access_* status=200 action=purchase | top limit=1 clientip
Here, the limit=1 argument specifies that only one value is returned. The clientip argument specifies the field to return.
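To make the behavior of top concrete, the ranking it performs can be approximated with the stats, sort, and head commands. This is only a sketch of the same idea, not how top is implemented. It omits the percent field, and ties between shoppers may be resolved differently.

```spl
sourcetype=access_* status=200 action=purchase
| stats count by clientip
| sort - count
| head 1
```

Each pipe passes its results to the next command: count the purchases per clientip, sort by count in descending order, and keep the first row.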
In Splunk, this search returns one clientip value, 22.214.171.124, which identifies the VIP shopper. The search also returns a count and a percent, which are the default fields that the top command returns. Next, use the returned clientip value in a second search to find what the VIP shopper has purchased.
sourcetype=access_* status=200 action=purchase clientip=126.96.36.199 | stats count, distinct_count(productId), values(productId) by clientip
This search uses several statistical functions with the stats command. The distinct_count() function has an alias, dc().
The count() function returns the total number of purchases made by the VIP shopper. The distinct_count() function counts the number of different, or unique, products the shopper has purchased. The values() function returns the distinct product IDs as a multivalue field.
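As a sketch of the alias in use, the same search can be written with dc() and with AS clauses that rename the output columns. The column names here are illustrative choices, not part of the original search.

```spl
sourcetype=access_* status=200 action=purchase clientip=126.96.36.199
| stats count AS "Total Purchased", dc(productId) AS "Total Products", values(productId) AS "Product IDs" by clientip
```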
The downside of this method is that we must run two searches each time we want to build this table. The top purchaser is unlikely to be the same person in every time range.
Example 2: Search with a subsearch
We start with the first requirement: identify the single most frequent shopper on the Buttercup Games online store.
sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip
This search returns the clientip value, clientip=188.8.131.52, for the most frequent shopper. It is nearly identical to the first search in Example 1. The difference is the final piped command, table clientip, which displays the clientip information in a table. Because only the clientip field is specified with the table command, that is the only field returned.
The count and percent fields that the top command produces are discarded from the output.
Next, we search the same data to see what that shopper has purchased. We supply the result of the most-frequent-shopper search as one of the criteria for the purchases search.
In Splunk, a subsearch is enclosed in square brackets and is evaluated first when the search criteria are read.
sourcetype=access_* status=200 action=purchase [search sourcetype=access_* status=200 action=purchase | top limit=1 clientip | table clientip] | stats count, distinct_count(productId), values(productId) by clientip
Here, because the top command also returns the count and percent fields, the table command is used to keep only the clientip value.
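To see what the subsearch hands to the outer search, one option is to run the subsearch on its own and append the format command. This is a sketch for inspection only. The format command places the results into a single field named search, holding an expression of the form clientip="<value>" wrapped in parentheses, which is the condition that the outer search is filtered by.

```spl
sourcetype=access_* status=200 action=purchase
| top limit=1 clientip
| table clientip
| format
```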
If we run this search over the same time range, the results match those of the two searches in Example 1. If we change the time range, we may see different results because the top purchasing customer may be different.
Note: The performance of this subsearch depends on how many distinct IP addresses match status=200 AND action=purchase. The top command must keep track of all of those addresses before it returns the top one, so performance suffers if there are thousands of distinct IP addresses. By default, subsearches return a maximum of 10,000 results and have a maximum runtime of 60 seconds. In large production environments, the subsearch in this example may time out before it completes. The best option is to rewrite the query to limit the number of events that the subsearch must process. Alternatively, the maximum results and maximum runtime parameters can be increased.
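One way to limit the events that the subsearch processes is to give it a narrower time range than the outer search. The sketch below assumes that identifying the top shopper over the last 24 hours is acceptable. The earliest=-24h value is an illustrative choice, and it changes the meaning of the search because the shopper is picked from the shorter window only.

```spl
sourcetype=access_* status=200 action=purchase
    [search sourcetype=access_* status=200 action=purchase earliest=-24h
    | top limit=1 clientip
    | table clientip]
| stats count, distinct_count(productId), values(productId) by clientip
```

The outer search still uses the time range selected in the time range picker, while the time modifier inside the brackets applies to the subsearch only.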
Make the search syntax easier to read
Subsearches and long, complex searches can be difficult to read. To make the search syntax easier to read in the Search bar, apply auto-formatting to it. To apply auto-formatting to a search, use the following keyboard shortcuts.