Splunk and Eureqa; IOT Manufacturing Demo

Introduction

I have recently been working with the Splunk universal machine data platform. It is a very interesting and useful technology with a wide range of application. I have also been working more with Eureqa and decided to explore how the two technologies could be used together. I decided to use a hypothetical Internet of Things manufacturing scenario to setup a demo using Splunk and Eureqa.

This is a simplified version of a scenario becoming more common in Manufacturing and other industries today. The idea is data is collected about various aspects of the manufacturing process as well as completing quality testing at various stages of the process. The results of the quality testing is used in conjunction with the data collected during the manufacturing process as a training data set to build predictive algorithms/models. These models are then used in real time during the manufacturing process to measure quality and identify defects. This allows for the real time actions to correct quality issues or to stop the manufacturing process to reduce costs incurred by defective product.

Splunk and Eureqa fit in this process like peanut butter and jelly. Splunk is a universal machine data platform that will literally ingest any type of data. It indexes the data and provides a search language to produce valuable information in real time. Eureqa is a robotic data scientist and can find meaningful predictive models automatically from your data. These tools can be used to easily solve the real time IOT manufacturing quality use case. Here is the demonstration scenario:

  • Chemical Manufacturing – (PolyVinylHydroSilica)
  • Problem: Manufacturing data is not timely enough
    • Decisions cannot be made during manufacturing process
    • Data reporting is only looking historically and lags 24 to 48 hours
  • Objective: Improve quality and meet the following goals
    • Improve data collection on manufacturing process
    • Utilize data to accurately predict and identify failures
    • Identify failures as early as possible to reduce costs
  • Identified high level requirements
    • Track manufacturing process data
    • Identify quality issues quickly to reduce costs
    • Provide near real time reporting

 

Legacy Process Flow

Legacy Process

 

Updated Process Flow with Splunk and Eureqa

Updated Process

Output Details and Operation

The initial benefit of this process and the low hanging fruit that will allow for a positive return on investment is the savings from halting production early on lots identified as inferior product. The second valuable benefit is the real time reporting of manufacturing process metrics. Splunk has real time reporting capabilities which are used to drive dashboards with valuable information. Here is an example below.

Splunk Dashboard

The above dashboard shows key performance indicators on the manufacturing process such as; speed, volume, and cost of manufacturing as well as the cost saved by stopping failed lots. This information is driven by the manufacturing data from each phase as well as the model results from Eureqa. The models in Eureqa are generated from the training data using the search functionality. The screenshot below shows the candidate models produced from the search.

Eureqa Models

We select models for each phase based on variables included and the performance. Also note Eureqa suggests the best model based on best tradeoff of error vs complexity. I chose to use model 2 for phase I, model 5 for phase II and model 11 for phase III. Phase I is more tolerant and only requires a value of 10 or above to be moved to the next phase. Phase II requires a value of 20 or above to be along to phase III. Finally phase III requires a value of 30 or above to be a passing lot. To demonstrate this process I wrote an application to simulate the manufacturing process. This simulation environment is shown in the diagram below.

Demo Environment

The demo for this process is shown below. I also talk about the various integration points between Splunk and Eureqa and show a couple of examples.

Regards,

Dave

Nutonian Eureqa

Eureqa API and SQL Server Data Load

RoundTower Technologies offers an analytics solution called FastAnswers powered by Eureqa. It is an amazing piece of software created by the brilliant people at Nutonian. Eureqa can use sample data to automate the creation of predictive models. If you are interested, the Data Scientists at Nutonian and RoundTower can explain the mathematics and science behind the technology. Please visit the RoundTower and Nutonian links above to learn more about the solution.

One of the main goals of Eureqa is to provide data science skills to non-PHD data analysts. This is why Eureqa is a very intriguing technology to me. I have an interest in analytics and machine learning as well as some BI background, but I do not have a deep understanding of statistics or machine learning. Eureqa is great because it helps bridge the skills gap.

The primary user interface of the Eureqa application is web based, very intuitive, and well documented. Nutonian also provides a Python API for programmatic access to Eureqa functionality. I have been experimenting with Eureqa and the Python API so I thought I would share some things I learned. Hopefully it will be helpful to someone else using Eureqa. Also, if you are interested, the current API documentation is located at http://eureqa-api.nutonian.com/en/stable/ . This contains basic information on using the API and a few helpful examples.

A question that always comes up when talking about Eureqa is “how does it connect to database sources”. So one of the first things I decided to learn about Eureqa was how to load data from SQL Server. The goal of this post is to show the basics of using the API as well as getting data from SQL.

To use the Eureqa Python API you will at least need to have Python 2.7 installed. I prefer to use a distribution that includes Python and many common libraries including the popular machine learning libraries. I also like to use a Python IDE for development, so my preferred environment is Anaconda2 and JetBrains PyCharm. There is also a good Python IDE called Spyder included with Anaconda2. If you need more assistance getting started there is a plethora tutorials on Python, Anaconda2, and various Python IDE’s on the web.

Once the development environment is setup the next step is installing and enabling the API. To start using the API your Eureqa account or local installation must be licensed and given access to use the API. If this has been done you will see the API options shown below on the settings page after logging in to Eureqa.

Eureqa_Settings_API
Eureqa_Settings_API

This page provides the ability to download the Python API and an access key, which are the two things required to use the API. The Eureqa API installation is easy, just use pip from your default Python directory using the command shown on the settings page. The next step is the API key, which is also easy, just click the get access key button on the settings page and the following dialog box is shown, which allows us to give the key a logical name.


Eureqa API KeyGen

After assigning a name click generate key. Then a dialog like the one below is shown with the API key.

Eureqa API KeyGen Result

Now we can use this key to interact with the Eureqa API. The following code example will show how to connect to SQL server retrieve data and load into a Eureqa data set. In this example we will be using data from concrete strength testing, which is publicly available from the University of California Irvine (UCI) machine learning department. The first step in our Python code will be to define our connection variables and load the required libraries, which is shown below.

Once that is done we will define two functions; one to retrieve data from SQL Server and write to a temporary .csv file and one to load data from the temporary .csv file into a Eureqa data source.

Both of these functions are fairly straightforward and self-explanatory. Notice the first function uses the pyodbc and csv Python libraries we loaded in the first step. These libraries provide database access and csv text processing functionality.

The next piece of code is the main part of the application. The first step is to connect to Eureqa, which is done using the Eureqa interface. This is the entrance point for all programmatic operations in Eureqa. After we have a connection to Eureqa we then define our SQL query and execute the two functions to retrieve and load the data.

The screenshots below show what we see in the Eureqa interface before we execute the code. A Eureqa installation with no data sets.

Eureqa Data Set

We execute the script and see a very uneventful output message that tells us the script ran successfully.

Eureqa Script Result

Now after refreshing the Eureqa data sets window there is a new data set called Concrete_Strength_Data.

Eureqa Data Set Result

Here is a subset of the imported data, which strangely enough looks like the data returned from the SQL query in our code.

Eureqa Data Subset

Now that we have a data set loaded it can be used to run searches and build models. So if you happen to be interested in a predictive model to estimate concrete strength. Here is the model Eureqa built based on the UCI concrete strength data, which it solved in minutes. Eureqa!!

Eureqa Model Result

I’ll expand on this next time.

Regards,

Dave