File System Assessment with PowerShell and Splunk

It has been a while since I posted, but here goes. I have been experimenting with Splunk to make one of my old processes better and easier.

In the past I have done file system assessments for customers to provide capacity planning information as well as usage patterns. I have collected the data in a number of ways depending on whether the platform is Windows, Linux, or a NAS device. In this post I will focus on collecting file system metadata from a Windows file server. To do this I use a PowerShell script to collect the data and dump it to a CSV file. Historically, the first step was to load the CSV output into SQL Server, and then I would use the SQL connectivity and pivot charting functionality in Excel for the reporting. As I have been working with Splunk, it occurred to me that I could use it to improve this process.

Another thought also occurred to me: this process could be performed by existing Splunk owners with no impact on licensing costs. Splunk is licensed on daily ingest volume, and that volume can be exceeded three times per month without penalty. File system assessment data is typically only collected periodically, so it can be ingested without increasing the daily license capacity. Using the methods shown below, organizations that own Splunk could easily do periodic file system assessments for free.

The first step is to collect the file system metadata using a PowerShell script.
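A trimmed-down sketch of what such a collection script can look like is below; the parameter name, the XML layout, and the output fields are illustrative rather than the exact production script.

```powershell
# Illustrative sketch only: the parameter name, XML layout, and output fields are
# examples of the approach rather than the exact production script.
param(
    # Path to the XML configuration file that defines what to scan
    [Parameter(Mandatory = $true)]
    [string]$ConfigPath
)

# Load the XML configuration
[xml]$config = Get-Content -Path $ConfigPath -Raw

foreach ($share in $config.Shares.Share) {
    # Optional attributes used to organize and tag the output for reporting
    $shareName = $share.GetAttribute('Name')
    $shareTag  = $share.GetAttribute('Tag')

    # Walk the tree and capture basic metadata for every file found
    Get-ChildItem -Path $share.GetAttribute('Path') -Recurse -File -ErrorAction SilentlyContinue |
        Select-Object @{Name = 'Share'; Expression = { $shareName }},
                      @{Name = 'Tag';   Expression = { $shareTag }},
                      FullName,
                      Extension,
                      Length,
                      CreationTime,
                      LastAccessTime,
                      LastWriteTime |
        Export-Csv -Path $share.GetAttribute('OutputFile') -NoTypeInformation
}
```

Writing one CSV file per configured share keeps the output files a manageable size and makes it easy to load them selectively later.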

The script requires a single argument that specifies the path to an XML file, which defines the configuration for the script. Here is an example of how you would call it.
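Assuming the script is saved as Get-FileMetadata.ps1 (the script and config file names here are just placeholders):

```powershell
.\Get-FileMetadata.ps1 -ConfigPath 'C:\Scripts\filescan-config.xml'
```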

The XML configuration file defines the file system(s) to scan. Here is an example:
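An illustrative configuration matching the sketch above might look like this; the element and attribute names are examples, not a fixed schema.

```xml
<!-- Illustrative configuration; element and attribute names are examples only.
     Name and Tag are optional attributes used to group and label results for reporting. -->
<Shares>
  <Share Name="HomeDirs" Tag="UserData"
         Path="\\fileserver01\users"
         OutputFile="C:\Output\homedirs.csv" />
  <Share Name="Projects" Tag="Engineering"
         Path="\\fileserver01\projects"
         OutputFile="C:\Output\projects.csv" />
</Shares>
```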

The example includes an explanation of the optional attributes for the XML element(s), which control how the data is organized and tagged and provide more useful reporting options. The sample also shows several configuration examples and some example output. Once the metadata has been collected into CSV files, it can easily be loaded into Splunk using the ad-hoc data load feature or a file system input on a forwarder.
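For illustration, a row in one of the resulting CSV files (using the field names from the sketch above, with made-up values) looks roughly like this:

```
"Share","Tag","FullName","Extension","Length","CreationTime","LastAccessTime","LastWriteTime"
"HomeDirs","UserData","\\fileserver01\users\jsmith\budget.xlsx",".xlsx","48231","1/4/2016 9:12:44 AM","6/2/2019 2:03:10 PM","5/28/2019 4:41:07 PM"
```

Here is an example of a file metadata record/event in Splunk.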

Splunk Record

One thing to note here is the event timestamp in Splunk. The Time field is derived from the modified time of the file. This was done on purpose because, in my experience doing file system assessments, it is the only timestamp that is generally accurate. I have found many cases where the last-accessed time is unavailable or incorrect. I have also seen many cases where the create date is incorrect; sometimes it is more recent than the modified date, and occasionally it is even in the future. Here is the Splunk props.conf sourcetype stanza I defined for the data. It includes TIME_FORMAT and TIMESTAMP_FIELDS to use the modified date for the _time field, and it uses MAX_DAYS_AGO since the modified date can go back many years.
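The stanza looks roughly like the following; the sourcetype name and the timestamp field name match the illustrative CSV above, and the exact TIME_FORMAT has to agree with how the dates are written in your output.

```
[filesystem_metadata]
# Parse the CSV structure so the header fields are available for timestamp extraction
INDEXED_EXTRACTIONS = csv
# Use the file's modified time as the event timestamp (_time)
TIMESTAMP_FIELDS = LastWriteTime
TIME_FORMAT = %m/%d/%Y %I:%M:%S %p
# Allow timestamps many years in the past
MAX_DAYS_AGO = 10000
```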

Once the data is loaded into Splunk, here is the type of information we can easily find.

Splunk Dashboard
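For example, searches along these lines (using the illustrative index, sourcetype, and field names from earlier in this post) produce the kind of charts shown in the dashboard:

```
index=file_assessment sourcetype=filesystem_metadata
| stats sum(Length) as Bytes by Extension
| eval GB = round(Bytes / 1024 / 1024 / 1024, 2)
| sort - GB
```

Because _time is the modified date, a simple timechart of sum(Length) shows how much of the capacity is tied up in data that has not been modified in years.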

These are just some simple charts, but the metadata provides many reporting options. There are also benefits to using Splunk beyond the fact that it can be done without additional cost:

  • It eliminates the need to create tables and/or ETL processes for a relational database
  • The data can be loaded much more easily than with a relational database
  • The dashboard can easily be reused for new reports; simply use a dedicated index and clean it, or delete and recreate it, whenever updated reporting is needed (see the example below)
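For instance, with a dedicated index (again using the illustrative name from above), the previous collection can be cleared from the CLI before a new one is loaded; note that splunkd must be stopped before running clean:

```
splunk stop
splunk clean eventdata -index file_assessment -f
splunk start
```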

If I were doing this in an environment I managed on a day-to-day basis, I would send the data directly to Splunk via the HTTP Event Collector. I'll need to modify the collection code a bit to provide a complete example, but I'll try to post a follow-up with that info.
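In the meantime, the basic shape of that call from PowerShell is roughly the following; the collector URL, token, index, sourcetype, and event fields are all placeholders.

```powershell
# Rough sketch of sending a single file metadata record to the HTTP Event Collector.
# The URL, token, index, and sourcetype below are placeholders for illustration.
$uri     = 'https://splunk.example.com:8088/services/collector/event'
$headers = @{ Authorization = 'Splunk 00000000-0000-0000-0000-000000000000' }

$body = @{
    index      = 'file_assessment'
    sourcetype = 'filesystem_metadata'
    event      = @{
        FullName      = '\\fileserver01\users\jsmith\budget.xlsx'
        Extension     = '.xlsx'
        Length        = 48231
        LastWriteTime = '5/28/2019 4:41:07 PM'
    }
} | ConvertTo-Json -Depth 3

Invoke-RestMethod -Method Post -Uri $uri -Headers $headers -Body $body
```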

I hope some folks find this useful.

Regards,

Dave