Splunk Monitoring Files

In this section, we will learn how to monitor files and directories in Splunk. We will also look at how the monitor processor works, how Splunk monitors archive files, and related topics.

Monitor files and directories

Splunk Enterprise has three processors for inputting files: monitor, MonitorNoHandle, and upload.

We can use monitor to add nearly all data sources from files and directories. However, we may want to use upload for one-time inputs, such as an archive of historical data.

On hosts running Windows Vista or Windows Server 2008 and later versions, the MonitorNoHandle input can be used to monitor files that the system rotates automatically. The MonitorNoHandle input works only on Windows hosts.

We can add inputs to monitor or upload using any of these methods:

Splunk Web
The CLI
inputs.conf

Using either the CLI or inputs.conf, we can add inputs to MonitorNoHandle.

Use the "Set Sourcetype" page to see how Splunk Enterprise will index the data from a file.

How the monitor processor works

In Splunk, we specify a path to a file or directory, and the monitor processor consumes any new data written to that file or directory. This is how we can monitor live application logs, such as Web access logs or logs from Java 2 Platform Enterprise Edition (J2EE) or .NET applications.

Splunk Enterprise monitors the file or directory and indexes new data as it appears. We can also specify a mounted or shared directory, such as a network file system, as long as Splunk Enterprise can read from the directory. If the specified directory contains subdirectories, the monitor process examines them recursively for new files, as long as those directories are readable.

Using allow lists and exclude lists, we can include or exclude files or directories from being read.
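
As a minimal sketch of the ideas above, the following inputs.conf stanza monitors a directory and filters which files are read. The path, index, and sourcetype are hypothetical placeholders. The settings are named whitelist and blacklist in inputs.conf, and each takes a regular expression that is matched against the file path.

```ini
# Hypothetical example: monitor an application log directory.
# The path, index, and sourcetype values are assumptions for illustration.
[monitor:///var/log/myapp]
index = main
sourcetype = myapp_logs
# Only read files whose path ends in .log
whitelist = \.log$
# Skip anything under a debug subdirectory
blacklist = /debug/
```

Comments in inputs.conf must sit on their own lines, which is why none of the settings above carry a trailing comment.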

If a monitor input is disabled or deleted, Splunk Enterprise does not stop indexing the files that the input references. It only stops checking those files again. To stop all in-process indexing of data, the Splunk server must be stopped and restarted.
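
For illustration, a monitor input can be turned off in inputs.conf with the disabled setting. The path below is a hypothetical placeholder. As described above, this alone is not expected to interrupt files that are already being indexed; a restart is needed for that.

```ini
# Hypothetical example: disable an existing monitor input.
# Files already in progress keep indexing until the server is restarted.
[monitor:///var/log/myapp]
disabled = true
```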

How Splunk Enterprise handles file monitoring during restarts

When the Splunk server is restarted, it continues processing files where it left off. It first checks for the file or directory specified in a monitor configuration. The monitor process continuously scans the subdirectories of monitored directories.

Monitor inputs can overlap. As long as the stanza names are different, Splunk Enterprise treats them as independent stanzas, and files that match the most specific stanza are handled according to its settings.
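
A small sketch of overlapping inputs, using hypothetical paths and sourcetypes: both stanzas below cover files under /var/log/httpd, but the second stanza is the more specific match, so files in that directory would be expected to receive its settings.

```ini
# Hypothetical example: two overlapping monitor stanzas.
# General stanza for everything under /var/log
[monitor:///var/log]
sourcetype = generic_logs

# More specific stanza; files under /var/log/httpd follow these settings
[monitor:///var/log/httpd]
sourcetype = web_access
```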

How Splunk Enterprise monitors archive files

Archive files (such as .tar or .zip files) are decompressed before indexing. The following types of archive files are supported:

The .tar file.
The .gz file.
The .bz2 file.
The .tar.gz and .tgz file.
The .tbz and .tbz2 file.
The .zip file.
The .z file.

When we add new data to an existing archive file, Splunk Enterprise re-indexes the whole file, not just the new data. This can result in duplicate events.

How Splunk Enterprise monitors files that the operating system rotates on a schedule

The monitoring process detects log file rotation and does not process renamed files that it has already indexed (with the exception of .tar and .gz archives).

How Splunk Enterprise monitors Windows files that are not writable

Windows can prevent open files from being read by Splunk Enterprise. The MonitorNoHandle input can be used if we need to read files while they are being written to.

Restrictions on file monitoring

Splunk Enterprise cannot monitor a file whose path exceeds 1024 characters.

Files with a .splunk filename extension are also not monitored, because files with that extension contain Splunk metadata. If we need to index files with a .splunk extension, we can use the add oneshot CLI command.

Why use Batch or Upload?

To index a static file once, select Upload in Splunk Web.

We can also use the CLI add oneshot or spool commands for the same purpose.

If we have Splunk Enterprise, we can use the batch input type in inputs.conf to load files once and destructively.

By default, the Splunk batch processor is located in $SPLUNK_HOME/var/spool/splunk. If we move a file into that directory, the file is indexed and then deleted.
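
As a hedged sketch of a batch input, the stanza below loads files from a hypothetical directory once and then deletes them. The path, index, and sourcetype are assumptions for illustration. The move_policy = sinkhole setting is what marks the input as destructive, so this should only point at copies of files that can safely be removed.

```ini
# Hypothetical example: one-time, destructive load of historical files.
# Files in this directory are deleted after they are indexed.
[batch:///opt/import/historical]
move_policy = sinkhole
index = main
sourcetype = archived_logs
```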

Why use MonitorNoHandle?

This Windows-only input lets us read files on Windows systems as Windows writes to them. It does this by using a kernel-mode filter driver to capture raw data as it is written to the file. We can use this input stanza on a file that the system locks open for writing, such as the log file for Windows DNS servers.

Caveats for using MonitorNoHandle

The following caveats refer to the MonitorNoHandle input:

Only Windows Vista or Windows Server 2008 and later operating systems work with MonitorNoHandle. It doesn't work on an earlier version of Windows, nor does it work on non-Windows operating systems.
MonitorNoHandle can only monitor single files. To monitor more than one file, a MonitorNoHandle input stanza needs to be generated for each file.
MonitorNoHandle cannot monitor directories.
If a file we choose to monitor with MonitorNoHandle already exists, Splunk Enterprise does not index its current contents, only new information that enters the file as processes write to it.
When we monitor a file with MonitorNoHandle, the source field for the file is MonitorNoHandle, not the file name. If we want the source field to be the file name, we need to set it explicitly in inputs.conf. For more detail, see the inputs.conf settings for monitoring files and directories.
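
To illustrate the caveats above, here is a minimal MonitorNoHandle sketch for a single file. The DNS log path and the index are hypothetical placeholders and should be replaced with the actual location on the host. The source line sets the source field explicitly, since it would otherwise be MonitorNoHandle.

```ini
# Hypothetical example (Windows only): read a file that is locked open for writing.
# One stanza per file; directories cannot be monitored this way.
[MonitorNoHandle://C:\Windows\System32\dns\dns.log]
# Set the source field explicitly so it shows the file name
source = C:\Windows\System32\dns\dns.log
index = main
disabled = false
```

Because each stanza covers exactly one file, monitoring a second locked file would require a second stanza with its own path.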