Splunk – Quantitative Finance Journey – The Beginning

Hello again, it has been quite a while since my last post, as I have been busy with a lot of changes. I have worked for Splunk for over a year now and I am really enjoying it. Great company, awesome technology, and a bunch of smart, energetic people.

As always, I like to post about unique items I feel may be useful to others, and I have been working on something that might fit the bill; I am anxious to see whether it interests others. Over the past 24 months or so I have been studying investing, trading, and quantitative finance. Concurrently, I have also been working to become more proficient with Splunk. I like to combine activities to gain momentum, so I decided stock market and economic data would be the perfect way to dig deeper into Splunk and hopefully improve my investing/trading. In the beginning I only looked at it as a way to learn more about Splunk while using data that was interesting to me. However, as I dug in I found the Splunk ecosystem and the world of quantitative finance have a lot of similarities, the primary ones being lots of data, Python, and machine learning libraries.

In the world of quantitative finance, Python is very widely used. In fact, Pandas, one of the most commonly used Python libraries, was created at a hedge fund. The Python libraries used in quantitative finance are substantially the same libraries provided in the Python for Scientific Computing Splunk app. Additionally, much of the financial and market data provided by free and pay sources is easily accessible via REST APIs. Splunk also provides the HTTP Event Collector (HEC), an easy-to-use REST endpoint for sending data to Splunk. This makes it relatively easy to collect data from web APIs and send it to Splunk.
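Although my data load scripts are written in Python, HEC itself is just an HTTP endpoint, so any language will do. Here is a minimal sketch of sending a single event to HEC from PowerShell (the host name, token, index, and sample values are placeholders, not anything from my environment):

# Send one JSON event to the Splunk HTTP Event Collector (sketch; host/token/index are placeholders)
$hecUrl  = "https://splunk.example.com:8088/services/collector/event"
$headers = @{ Authorization = "Splunk 00000000-0000-0000-0000-000000000000" }
$body = @{
    index      = "quotes"
    sourcetype = "eod_quote"
    event      = @{ symbol = "SPY"; close = 430.25; volume = 75000000 }
} | ConvertTo-Json -Depth 3
Invoke-RestMethod -Uri $hecUrl -Method Post -Headers $headers -Body $body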

I promise I will get to a little meat in this post, but first I would like to provide some background. I am starting the second iteration of a Splunk app and a set of data load/sync scripts. I plan to write about my journey, the code, and the solution along the way. I hope to get some feedback and find out whether this Splunk app would be desirable to others. If so, we'll see where it goes.

When I started doing trading research, I found there were various places to get market and economic data: the Federal Reserve (FRED), the exchanges, the Census Bureau, the Bureau of Economic Analysis, and so on. In the end I found I could get most of the core data I wanted from three places:

  • Federal Reserve Economic Data (https://fred.stlouisfed.org/) – FRED is an economic data repository hosted and managed by the Federal Reserve Bank of St. Louis.
  • Quandl (https://quandl.com) – This is a data service, now owned by NASDAQ, that features many free and pay sources for market and economic data. There are various services like this, but I chose to start here as it fit my needs and budget.
  • Gurufocus (https://www.gurufocus.com) – This is a site with free and pay resources that offers some great fundamental data via REST API to subscribers.

The sources are endless and only limited by your imagination and your wallet, as some data is very expensive. The data most people will start with is end-of-day stock quote data and fundamental financial data. This is exactly what I get from Quandl and Gurufocus, along with macroeconomic data from FRED. There are lots of ways to get data into Splunk, but my preference in this case was to use Python code and interact with the internet REST APIs, the Splunk REST APIs, and HEC. This allows my Python scripts to control all of my data loads and configuration in Splunk. Splunk also provides an extensible app development platform which can be used to build add-ons for data input. I will likely move my data load processes to this model in the future.

The other aspect that Splunk brings is the ability to integrate custom Python code into the Machine Learning Toolkit (MLTK) as custom algorithms. This provides the ability to implement analyses such as concepts from modern portfolio theory for risk optimization and return projection, and it gives us a path to do more advanced things with the MLTK. I have only scratched the surface on this subject and I have lots of ideas to explore and learn in the future. Splunk simplifies operationalizing these processes and, in my opinion, makes the task of getting from raw data to usable information much easier.

Ok, hopefully that provides enough background and context. Now I would like to show an example of the following process.

  • Use Python to download end of day stock quote data from quandl.com using their REST API.
  • Use Python to send the data to Splunk via the HTTP Event Collector.
  • Use Splunk to calculate the daily returns of a set of stocks over a period of time.
  • Utilize the Splunk Machine Learning Toolkit to calculate correlation of the stocks based on daily returns.

The following code sample shows a simplified version of code used to retrieve data from the Quandl Quotemedia end-of-day data source. The returned data is formatted and sent to a Splunk metrics index. Splunk metrics were created to provide a high performance storage mechanism for metrics data. Learn more about Splunk metrics here and here.

Once the quote data is loaded, we can see all of the metrics created by the process. The following screenshot shows our resulting indexed data.

EOD_DATA

Now that we have our data loaded we can do some more advanced processing. A common fundamental calculation in quantitative finance and modern portfolio theory is the daily return: the change in closing price from one trading day to the next, divided by the prior day's close. The following example shows how to use the metrics data loaded into Splunk for this calculation. For this example I have loaded data for various S&P 500 sector ETFs as well as a gold miners ETF. Here is the calculation and results.

Daily_Return

The next step in our process is to use the Splunk Machine Learning Toolkit to calculate the correlation of our equities. The Python Pandas library has a function that makes this process very easy, and we can access that functionality and operationalize it in Splunk. It just so happens there is a Correlation Matrix algorithm on the Splunk MLTK algorithm contribution site on GitHub, available here. The documentation to add a custom algorithm can be found here, and you will notice this Correlation Matrix example is highlighted. Here is an example of using this algorithm and the corresponding output.

Correlation_Matrix

The example above shows the correlation of all of the examined ETFs over a period of 60 days. A value of 1 is perfectly correlated and a value of -1 is perfectly inversely correlated. As noted previously, this calculation is the basis for more advanced operations to determine theoretical portfolio risk and return. I hope to visit these in future posts.

Regards,

Dave

EMC VNXe Performance PowerShell Module

vnxePoSH

I thought I would revisit the VNXe performance analysis topic. Back in 2012 I published some posts on performance analysis of an EMC VNXe storage array. This information applies only to the 1st generation VNXe arrays, not to the newer 2nd generation arrays.

In my previous articles I posted a PowerShell module for accessing the VNXe performance database. Since that time I have fixed a bug around rollover of timestamps and made a couple of other small improvements. The module has also now been published to GitHub.

Here is some background; see my previous posts for additional information.

http://muegge.com/blog/emc-vnxe-performance-analysis-with-powershell/

http://muegge.com/blog/emc-vnxe-performance-analysis-with-powershell-part-ii/

The VNXe collects performance statistics in a SQLite database, which can be accessed via SCP on the array controllers. It is also available via the diagnostic data package, which can be retrieved via the system menu in Unisphere. There are a few database files which hold different pieces of performance data about the array. The MTSVNXePerformance PowerShell module provides cmdlets to query the database files and retrieve additional performance information beyond what is provided in the Unisphere GUI.

I will show some examples of using the module to get performance information. The first one is a simple table with pool capacity information.

The first step is to load the modules, set the file path variables, and set the SQLite location. This script also uses a module that provides charting from the .NET MSChart controls.

The next step is to get some data from the VNXe sqlite tables.
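The query code itself is shown in the screenshots, but the general pattern is simply running SQL against the copied database files. Here is a rough sketch of that pattern using the System.Data.SQLite provider (the assembly path, database file name, and table name are illustrative, not the module's actual internals):

# Query a VNXe performance SQLite file directly (sketch; paths and names are illustrative)
Add-Type -Path "C:\Program Files\System.Data.SQLite\2010\bin\System.Data.SQLite.dll"
$conn = New-Object System.Data.SQLite.SQLiteConnection("Data Source=C:\VNXe\stats_basic_summary.db")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT * FROM capacity_pool ORDER BY timestamp"
$adapter = New-Object System.Data.SQLite.SQLiteDataAdapter($cmd)
$table = New-Object System.Data.DataTable
[void]$adapter.Fill($table)
$conn.Close()
$table | Select-Object -First 5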

This gives us the information we need about pools. Now we can look at rollup information by using a command like so.

Next we will look at IOPS from the dart summary data. Data is stored in different tables based on the type of information and the time period. As data is collected it is summarized and moved to historical tables that store longer time periods at lower data resolution. Here we are going to get the dart store stats, which give us all IO information for each of the data movers.

This produces the following charts using the MSChart .Net charting controls.

The module can be used to produce complete VNXe performance reports like the one below.

The script that produces the report above is included in the examples folder in the GitHub project.
 

I received a fair amount of interest in the first posts on this topic. I hope this update and refresher is still useful to some folks.

Regards,

Dave

PowerShell Meets Xplorer2 for an ESXTOP Relog

This is a topic I have been meaning to write about for a long time. I was recently working with this scenario and thought it would make a good example. First, I am going to get a little nostalgic to provide some context. Back in the late 80s and early 90s, in my first days of computing in the DOS world, a favorite utility of mine was a file manager called Norton Commander. It was a very feature-rich, text-based, dual-pane file manager. Here is a screenshot; I hope it brings back some good memories. If it does not look familiar, then hopefully there is some historic or comic value.

NC

When Windows 3.0 was introduced and the primary interface became Program Manager, I could not believe it. Who wanted to run a computer using pictures? How absurd. :-) So I surrendered my beloved Norton Commander and was forced to use the wonderful Windows File Manager. Here is a screenshot so you too can relive the feature deficits.

WinFile

Of course this became Windows Explorer, which we all know and settle on using. I always wanted a file manager with the familiar feel of the Norton Commander dual-pane interface, but I always settled for Windows Explorer. When I started working with PowerShell several years ago, my need for a better file manager became apparent. I searched and found an application called xplorer2, which had the dual-pane look and feel I was looking for along with a lot of customizability. It turned out to be an excellent complement to PowerShell. OK, so there's the point of the nostalgia.

I am going to talk about a few different topics in this post, but my goal is to provide a real-world example of using PowerShell with xplorer2. Here is the xplorer2 interface in the dual-pane configuration I use. It can be customized extensively, and I will not go into many of the features and options. This is not meant to be an xplorer2 advertisement; I am just a satisfied customer. Check it out at http://zabkat.com.

X2-A

This application can be used to enhance the navigation and launching of scripts, and PowerShell is a great example. The application has the ability to create bookmarks with keyboard shortcuts, custom columns, folder groupings, and various other helpful file and folder features. IMHO, the best features of the application that complement PowerShell are user commands coupled with keyboard shortcuts and $-tokens. Together they provide a powerful way to launch PowerShell scripts and feed data into them.

Here is an example. I have some ESXTOP CSV performance files that I need to merge. There are certainly several ways to do this, and it could be done by manipulating the text files directly. However, the ESXTOP file is a standard PDH-format .csv file and can be read and manipulated by many tools, including a Windows command-line tool called relog.exe. This tool is found on Windows XP systems and above and is used to manipulate any standard PDH-format performance log. It can perform a variety of tasks on these files. Here is the help text.

This command will be used in PowerShell scripts to create an easy tool for converting and merging performance logs. The first step to make this work in the xplorer2 environment is to set up the user commands. The screenshot below shows the user commands menu and functionality of the application.

X2-B

The organize dialog lets you create and customize the commands and define keyboard shortcuts.

X2-C X2-D

Here are some examples of commands I use all the time.

C:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -noexit $F

The above command runs the currently selected PowerShell script. The $F is a token in the xplorer2 environment which represents the currently selected file on the left pane. Simply select a PowerShell script and use the alt-0 keyboard shortcut.

C:\Elevation\elevate.cmd C:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -noexit $F $R

The above command runs the currently selected PowerShell script with the right-side visible directory path as an argument. The $R is a token in the xplorer2 environment which represents the right-side visible directory path. The command also uses the old elevate VBScript to get an admin window. I welcome someone to clue me in on a better way to do this.

C:\Elevation\elevate.cmd C:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -noexit $F $G

The above command runs the currently selected PowerShell script on the left with the inactive highlighted file on the right as the argument.

C:\Elevation\elevate.cmd C:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -noexit $G $A

The above command runs the inactive highlighted PowerShell script on the left with the currently selected files on the right as an argument.

I will go back to our ESXTOP example to help make things clearer. In the example below I have multiple ESXTOP files from a host that I would like to merge. The relog application merges binary logs very easily, so our first step is to convert to binary.

RL-A

The screenshot above shows the PowerShell script that does the conversion highlighted on the left and the files to be converted selected on the right. We just press the alt-4 keyboard shortcut, which launches the script, and it converts our files for us using relog.

Here is an example of the script and the output.

RL-B
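If the screenshot is hard to read, the conversion script boils down to looping over the selected files and calling relog with the binary output format. A minimal sketch (not the exact script shown above; the output naming is my own choice):

# Convert each selected ESXTOP CSV (passed in by xplorer2 via the $-tokens) to binary PDH format
foreach ($file in $Args) {
    $outFile = [System.IO.Path]::ChangeExtension($file, ".blg")
    & relog.exe $file -f BIN -o $outFile   # -f selects the output format, -o the output file
}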

Here are all of our converted files ready to be merged. The files are sorted by extension and the binary files are selected to be run against the merge script highlighted on the left.

X2-E

The alt-4 keyboard shortcut is selected to run the script; relog merges the files and outputs in CSV format ready for further analysis.
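The merge itself is essentially a one-liner, since relog accepts multiple input files. A sketch of the idea (the output file name is arbitrary):

# Merge all selected binary logs into a single CSV ready for analysis
& relog.exe $Args -f CSV -o merged.csv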

RL-C

X2-F

We now have a merged file ready for further analysis in Windows perfmon or other tools.

One item worth mentioning is the use of the $Args variable. In most cases it is recommended to use named PowerShell parameters rather than $Args; in this case, though, $Args provides a simple way to consume the $-token functionality of xplorer2.

I have found PowerShell and xplorer2 used together to be a very useful combination. I hope others will find this concept useful.

Regards,

Dave

Using the Isilon 7.0 ReST API with PowerShell

EMC recently released the Isilon 7.0 “Mavericks” version of the OneFS operating system. This release has many great new features, which you can read all about here and here. One of these new Isilon features is the ReST API, which allows programmatic access to the platform. If you are not familiar with ReST, it stands for Representational State Transfer, a lightweight, platform-independent, and stateless method of programming web services.

PowerShell provides an easy way to access the Isilon ReST API. Working with ReST is new for me, but I thought it might be useful for some to follow along while I am learning. Also, if anyone has tips for me on this process, I welcome the knowledge.

The Isilon ReST API is not enabled by default. Enabling the functionality requires changing options on the HTTP settings page in the protocols section; see below.

Isilon_HTTP_Settings

The HTTP interface can use Active Directory authentication, but in this post I will use basic authentication and show examples of reading data from the cluster. I hope to show more advanced examples as I learn.

PowerShell v3 has some great built-in functionality for working with ReST APIs. The Invoke-RestMethod cmdlet is exactly the functionality required to leverage the Isilon ReST API. The first challenges when working with the API are related to authentication and certificates. The Isilon cluster uses a self-signed certificate by default. This results in a certificate error when connecting via HTTPS, which you can see when connecting to the Isilon cluster with a browser. The following code works around the problem by ignoring the error.
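The original snippet was posted as an image; the usual way to ignore certificate validation errors in Windows PowerShell, and most likely what the snippet did, is to override the ServicePointManager callback:

# Ignore SSL certificate validation errors for this PowerShell session (lab use only)
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }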

In a production environment the correct way to handle this would be to install a certificate issued by a trusted certificate authority. The next step is to set up a proper HTTP header for basic authentication.

Once this is complete all we have to do is build the proper URL and issue the request. The code below will retrieve and display the SMB and NFS settings of the cluster.
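Here is a rough sketch of those two steps together: build a basic authentication header, then call Invoke-RestMethod against the cluster. The credentials, cluster name, and exact resource URLs are assumptions (the settings endpoints vary by OneFS version), so check the Platform API documentation for your release:

# Basic authentication header (user name and password are placeholders)
$user = "root"
$pass = "Password1"
$auth = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes("${user}:${pass}"))
$headers = @{ Authorization = "Basic $auth" }

# Query SMB and NFS settings via the Platform API (URLs are assumptions for illustration)
$baseUrl = "https://isilon01:8080/platform/1"
$smb = Invoke-RestMethod -Uri "$baseUrl/protocols/smb/settings/global" -Headers $headers -Method Get
$nfs = Invoke-RestMethod -Uri "$baseUrl/protocols/nfs/settings/global" -Headers $headers -Method Get
$smb
$nfs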

The output from the above examples is shown below. As you can see this gives a quick concise view of the protocol settings.

While this is only a simple example of retrieving data from the cluster, the possibilities are endless when you consider where we are in the transformation to cloud and automation. This type of enabling technology will be the foundation of great things to come.

Stay tuned…

Regards,

Dave

Merge Multiple EMC NAR files with PowerShell

While working on a project the other day I found the need to merge multiple NAR files. NaviSecCLI provides a method to merge two NAR files but does not offer an option to merge more than two at once. I searched the web for methods to do this and ran across a couple of scripts.

The first script I found was done in VBScript http://blog.edgoad.com/2011/03/merging-multiple-emc-nar-files.html

The second script I found was bash for linux http://jslabonte.wordpress.com/2012/02/01/how-to-merge-nar-files/

I thought this was something PowerShell could do much more easily, so here is a script to merge multiple NAR files. The script requires NaviSecCLI to be installed to work properly.
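The full script was embedded in the original post; the core idea is to fold the file list into a running merge, two files at a time. Here is a sketch of that approach (the analyzer -archivemerge syntax is from memory, so verify the switches against your NaviSecCLI version):

# Merge every NAR file in the current folder into one archive, two at a time (sketch)
$narFiles = Get-ChildItem -Filter *.nar | Sort-Object Name
$merged = $narFiles[0].FullName
for ($i = 1; $i -lt $narFiles.Count; $i++) {
    $output = Join-Path (Get-Location) ("merged_{0}.nar" -f $i)
    & naviseccli analyzer -archivemerge -data $merged $narFiles[$i].FullName -o $output
    $merged = $output
}
Write-Host "Final merged archive: $merged"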

I hope someone finds this useful.

Regards,

Dave

EMC VNXe Performance Analysis with PowerShell Part II

I appreciate the positive feedback I have received on the VNXePerformance module so far. I thought I would add to it and provide a script to generate a basic report. The script can be downloaded here.

The script will produce an HTML report and associated graphics with the following information.

  • Capacity information for the system and pools (total and allocated)
    • Maximum, Minimum, Average, Median
    • Historical graphs for system and each pool
  • Bandwidth usage per protocol
    • Maximum, Minimum, Average, Median
    • Historical graphs
  • IOPS usage per protocol
    • Maximum, Minimum, Average, Median
    • Historical graphs

Here is a sample:

The previous post used PowerGadgets for the charting functionality. That tool is not free and is also not yet supported with PowerShell 3.0. To address this, the reporting script includes a function which uses the charting functionality in the .NET 4.0 Framework. While this fixes the two issues mentioned, it does require more work to use; still, it works well for our purposes here. This script uses the VNXePerformance.ps1 module from my previous post plus a few new functions to produce an HTML report and associated graphic files. A command-line example to run the script is shown below.

The script uses data provided by the VNXePerformance module and the functions in the script to format and write the report data. Here is a brief description of the functions used.

Out-DataTable – This function converts the PSObject data output by the module functions to the System.Data.DataTable type. This is required for data binding to produce charts.

Out-LineChart – This function provides the chart-generating functionality to produce a line chart from the provided DataTable and generate a .png graphic file.
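For anyone who has not used the .NET charting controls from PowerShell before, here is a stripped-down sketch of the pattern Out-LineChart is built around (this is the general technique, not the actual function; the data points and output path are sample values):

# Build a simple line chart with the .NET 4.0 MSChart controls and save it as a PNG
Add-Type -AssemblyName System.Windows.Forms.DataVisualization
$chart = New-Object System.Windows.Forms.DataVisualization.Charting.Chart
$chart.Width  = 800
$chart.Height = 300
$chartArea = New-Object System.Windows.Forms.DataVisualization.Charting.ChartArea("MainArea")
$chart.ChartAreas.Add($chartArea)
$series = New-Object System.Windows.Forms.DataVisualization.Charting.Series("Data")
$series.ChartArea = "MainArea"
$series.ChartType = [System.Windows.Forms.DataVisualization.Charting.SeriesChartType]::Line
1..10 | ForEach-Object { [void]$series.Points.AddXY($_, $_ * $_) }   # sample data points
$chart.Series.Add($series)
$chart.SaveImage("C:\Temp\sample_chart.png", "Png")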

Get-SeriesRollup – This function creates summary data (maximum, minimum, average, median) for series data.
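The rollup math itself is straightforward; something along these lines would do it (a simplified sketch with a hypothetical function name, not the function from the script):

# Summarize a series of numeric values (simplified rollup sketch)
function Get-SeriesRollupSketch ($values) {
    $stats  = $values | Measure-Object -Minimum -Maximum -Average
    $sorted = $values | Sort-Object
    $middle = [int][math]::Floor($sorted.Count / 2)
    $median = if ($sorted.Count % 2) { $sorted[$middle] }
              else { ($sorted[$middle - 1] + $sorted[$middle]) / 2 }
    [PSCustomObject]@{
        Minimum = $stats.Minimum
        Maximum = $stats.Maximum
        Average = $stats.Average
        Median  = $median
    }
}
Get-SeriesRollupSketch (1, 5, 2, 9, 4)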

The following functions create HTML report output

  • ConvertTo-SeriesRollupHTML
  • Write-ChartHTML
  • Write-BlankHTMLTable
  • Write-HeaderHTMLTable

The first part of the script defines the parameters, loads the charting assembly, and contains the function declarations and module import.

The next portion of the script sets the location of the SQLite database and begins the HTML report string.

The next portion of the script completes the report by using the VNXePerformance module to retrieve the object data and then outputting HTML using the script functions.

The final portion of the script closes out the html file and writes it to disk.

This should provide a good starting point for reporting. There is still much room for improvement. Please comment with anything you discover about the SQLite data or any information you add to the report.

Start-VNXeHTMLPerformanceReport.zip
Regards,

Dave

HDS AMS 2000 Storage Resource Reporting with PowerShell

Welcome!

I have been creating a few PowerShell scripts for use with the HDS AMS 2000 array. One thing I found I needed was a quick way to look at DP (Dynamic Provisioning) pools, RAID groups, and LUNs. I also wanted to be able to see the associations and filter easily. Since I am working on a new deployment, I have been creating RAID groups and LUNs often, and I needed a quick way to see what I currently had while creating new resources.

I created a PowerShell script that quickly shows existing resources by RAID group or DP pool. It also uses nickname info for the devices, which is maintained in three CSV files (LU_Nicknames.csv, RG_Nicknames.csv, DP_Nicknames.csv). These are simple comma-delimited text files which contain the ID and nickname of each resource. The files are updated as storage resources are added. This allows me to easily identify the resources and filter for specific devices.

The script executes three HSNM2 CLI commands and reads the information into object form. The LUN information is then shown grouped by RAID group or DP pool.

Here is the output with the nickname search parameter set to “DB”. This returns all database resources based on the naming standard. If the parameter is left null, it returns all resources.

Here is the script:

The script uses the start-session.ps1 file to establish connectivity with the HDS array. Additional information regarding the use of this include file can be found in this post. The script then executes HSNM2 CLI commands to return information on DP pools, RAID groups, and LUNs. It uses regular expressions to parse the output and convert it into objects, and it also reads in the nickname files and adds that data to the custom objects.
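The parsing pattern is the interesting part: capture the columns of each CLI output line with a regular expression, turn them into objects, and tack on the nickname from the CSV file. A generic sketch of that pattern (the CLI command name, regex, and column layout below are placeholders, not the actual HSNM2 output format):

# Parse CLI output lines into objects and add nicknames from a CSV (illustrative pattern only)
$arrayName = "AMS2000_01"                               # placeholder array name
$nicknames = Import-Csv .\LU_Nicknames.csv              # columns assumed: ID, Nickname
$cliOutput = & SomeHsnm2Command.exe -unit $arrayName    # placeholder for the real HSNM2 CLI call
$luns = foreach ($line in $cliOutput) {
    if ($line -match '^\s*(\d+)\s+(\d+)\s+(\S+)') {     # placeholder regex: LUN, RAID group, capacity
        [PSCustomObject]@{
            LUN       = $matches[1]
            RaidGroup = $matches[2]
            Capacity  = $matches[3]
            Nickname  = ($nicknames | Where-Object { $_.ID -eq $matches[1] }).Nickname
        }
    }
}
$luns | Sort-Object RaidGroup | Format-Table -AutoSize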

The objects are then output using the built-in PowerShell formatting engine, with a little custom formatting thrown in for the group headers.

Here is an example nickname file:
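The actual file from the post was shown as an image; in spirit it is just a two-column CSV along these lines (the column names and entries here are made up for illustration):

ID,Nickname
0001,DB_Data01
0002,DB_Log01
0003,VM_Prod01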

I suppose this may not be necessary with the use of Device Manager, but I am still learning that tool and I could not quite get this view with it. Besides, I am more of a scripting kind of guy. I also really like the output of this script, as it gives me just the view of the array I need when I am allocating new storage and setting up new resources. I use this script in conjunction with two other scripts for creating LUNs and RAID groups. I plan to post those scripts soon.

Hope this helps,

Dave

Exchange DB Reporting with PowerShell and Log Parser

I ran across a useful post today as I was roaming through Google Analytics.

Using PowerShell, LogParser and PowerGadgets to get Exchange 2003 storage information – Part 1

Wes Stahler uses Log Parser and PowerShell to report on the free space in an Exchange Database.

This is a task I have done in the past. I will add this script to my toolkit.

Regards,

Dave

ADAM Administration with SharePoint and PowerShell

Welcome!

Recently I worked on finding a simple way to create a web-based administrative interface for an ADAM directory. The requirements were to create a simple web-based interface allowing business personnel to manage users and groups for an application directory. It was also desirable for the solution to integrate easily with SharePoint.

After doing a little searching on the web, I found a combination that fit the bill.

The Quest AD Management Shell CmdLets – This is a PowerShell snap-in that allows administration of AD and ADAM. The cmdlets are from Quest Software; you can find more info here. I have used them in other scripts and they have come in very handy. To make these work from SharePoint in this solution, the Quest snap-in .dll and its dependencies need to be copied to the global assembly cache and registered as a SafeControl in the SharePoint web.config.

The iLoveSharePoint PowerWebPart 3.0 – This is a web part which allows the execution of PowerShell code from within the web part. It comes from the CodePlex project iLoveSharePoint by Christian Glessner. I was impressed with this web part; it is easy to install and configure and relatively simple to use.

The PowerWebPart allows you to execute scripts that render ASP.NET web controls in the web part. This allows you to retrieve user input from the controls to use as script inputs. The possibilities are endless. For my purposes I only needed a very simple user interface.

I wanted a way to use this for different ADAM partitions, so I tried to allow for different configuration scripts. The design I decided on consisted of three levels of scripts: one for configuration, one for data access, and one UI script for each web part.

The code sample below is the configuration and connection script. This script defines the user and group containers and the directory connect and disconnect functions.
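The actual script was embedded in the post; conceptually it looks something like the following sketch using the Quest snap-in (the server name, port, and container DNs are placeholders for whatever your ADAM instance uses):

# Configuration and connection script (sketch; server, port, and DNs are placeholders)
Add-PSSnapin Quest.ActiveRoles.ADManagement -ErrorAction SilentlyContinue

$adamServer     = "adamserver01:389"
$userContainer  = "OU=Users,O=AppDirectory"
$groupContainer = "OU=Groups,O=AppDirectory"

function Connect-AppDirectory {
    # Connect-QADService targets the ADAM instance instead of the default domain
    Connect-QADService -Service $adamServer
}

function Disconnect-AppDirectory {
    Disconnect-QADService
}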

The next script is the function library for data access to the ADAM directory.
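Again, the real library was shown in the post; a few representative functions built on the Quest cmdlets might look like this (the function names and parameters are my own illustration, and they assume an active Connect-QADService session):

# Data access function library (sketch built on the Quest AD cmdlets)
function Get-AppUsers {
    Get-QADUser -SearchRoot $userContainer
}

function New-AppUser ($name, $password) {
    New-QADUser -ParentContainer $userContainer -Name $name -UserPassword $password
}

function Add-AppUserToGroup ($userName, $groupName) {
    Add-QADGroupMember -Identity "CN=$groupName,$groupContainer" -Member "CN=$userName,$userContainer"
}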

The next script is an example of a UI script for the web part. When a new PowerWebPart is created, a template script is added by default; this script provides a framework and some sample code. Christian also has an add-on which allows you to edit your script from SharePoint with PowerGUI. The entire solution contains one script similar to this for each web part.

The screenshot below shows the complete solution. This method was simple, effective, and easy to create. I dot-sourced the corresponding web part script and the connection script in each web part.

This is a pretty quick and easy way to expose some simple administrative or user functionality on a SharePoint Intranet.

I hope this helps.

Regards,

Dave

Connection History with PowerShell and NetStat

Welcome!

This is a little trick some might find useful. I was working on decommissioning some servers and needed a way to find out what was connecting to those machines, so I decided to create a script to log connections. I have done this in the past in various ways, which usually involved logging a bunch of data and then querying against it to find the unique connections.

This time it finally occurred to me to just filter the data as it is being collected. So I set out to write a PowerShell script that would keep a running list of client TCP connections to a given machine, stored in a text file.

The first step was to collect the information and put it into a PowerShell object.
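The original code is in the screenshots; the collection step amounts to parsing netstat output into objects, roughly like this (the filter and property names are my own sketch, not the exact script):

# Collect current established TCP connections as objects (sketch)
$current = netstat -an |
    Select-String 'TCP\s+\S+\s+\S+\s+ESTABLISHED' |
    ForEach-Object {
        $fields = ($_.Line.Trim()) -split '\s+'
        [PSCustomObject]@{
            LocalAddress  = $fields[1]
            RemoteAddress = $fields[2]
        }
    }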

Then the next step was to read the file with the previous information and add it to the PowerShell object.

We can now remove the duplicates from the combined information and save the updated file.
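Reading in the previous results, merging them with the current snapshot, de-duplicating, and saving can then be done in a few lines (continuing from the sketch above; the file path and CSV format are my own choices):

# Merge with previously logged connections, keep only the unique entries, and save (sketch)
$logFile  = "C:\Logs\TcpConnections.csv"
$previous = if (Test-Path $logFile) { Import-Csv $logFile } else { @() }
$combined = @($previous) + @($current) | Sort-Object LocalAddress, RemoteAddress -Unique
$combined | Export-Csv $logFile -NoTypeInformation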

We can run this script in a scheduled task at whatever interval is required. Now we have a log of unique inbound TCP connections.

Best Regards,

Dave