RoundTower Technologies offers an analytics solution called FastAnswers, powered by Eureqa. Eureqa is an amazing piece of software created by the brilliant people at Nutonian. It uses sample data to automate the creation of predictive models. If you are interested, the data scientists at Nutonian and RoundTower can explain the mathematics and science behind the technology. Please visit the RoundTower and Nutonian links above to learn more about the solution.
One of the main goals of Eureqa is to provide data science skills to non-PhD data analysts, which is what makes it such an intriguing technology to me. I have an interest in analytics and machine learning as well as some BI background, but I do not have a deep understanding of statistics or machine learning. Eureqa is great because it helps bridge that skills gap.
The primary user interface of the Eureqa application is web-based, very intuitive, and well documented. Nutonian also provides a Python API for programmatic access to Eureqa functionality. I have been experimenting with Eureqa and the Python API, so I thought I would share some things I learned; hopefully they will help someone else using Eureqa. If you are interested, the current API documentation is located at http://eureqa-api.nutonian.com/en/stable/ . It covers the basics of using the API and includes a few helpful examples.
A question that always comes up when talking about Eureqa is “How does it connect to database sources?” So one of the first things I set out to learn was how to load data into Eureqa from SQL Server. The goal of this post is to show the basics of using the API as well as getting data from SQL Server.
To use the Eureqa Python API you will need, at a minimum, Python 2.7 installed. I prefer a distribution that bundles Python with many common libraries, including the popular machine learning libraries. I also like to develop in a Python IDE, so my preferred environment is Anaconda2 with JetBrains PyCharm. Anaconda2 also includes a good Python IDE called Spyder. If you need more help getting started, there is a plethora of tutorials on Python, Anaconda2, and various Python IDEs on the web.
Once the development environment is set up, the next step is installing and enabling the API. Before you can use the API, your Eureqa account or local installation must be licensed and granted API access. If this has been done, you will see the API options shown below on the settings page after logging in to Eureqa.
This page lets you download the Python API and an access key, the two things required to use the API. Installing the Eureqa API is easy: run pip from your default Python directory using the command shown on the settings page. Getting the API key is just as easy: click the Get Access Key button on the settings page, and the following dialog box appears, which lets us give the key a logical name.
After assigning a name, click Generate Key. A dialog like the one below then displays the API key.
Now we can use this key to interact with the Eureqa API. The following code example shows how to connect to SQL Server, retrieve data, and load it into a Eureqa data set. In this example we will use data from concrete strength testing, which is publicly available from the University of California, Irvine (UCI) Machine Learning Repository. The first step in our Python code is to define our connection variables and load the required libraries, as shown below.
```python
# connection and configuration values
api_key = 'YOUR_API_KEY'
user_name = 'user@domain.com'
eureqa_url = 'http://localhost:10202'
sql_server = 'localhost'
sql_database = 'Concrete_DB'
sql_user = 'cdb_user'
sql_password = 'cdb_user_pwd'
data_file = 'C:\\temp\\temp.csv'
data_source_name = 'Concrete_Strength_Data'

# library imports
from eureqa import *
import csv
import pyodbc
```
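Hard-coding the API key and database password is fine for a quick experiment, but it is easy to commit them by accident. A minimal alternative, sketched below, reads the settings from environment variables. The helper `load_settings` and the variable names (`EUREQA_API_KEY`, `CDB_PASSWORD`, and so on) are hypothetical; pick whatever fits your setup. The secrets have no default, while everything else falls back to the demo values above.

```python
import os

# Hypothetical environment variable names for the secrets (no defaults)
REQUIRED = {'api_key': 'EUREQA_API_KEY', 'sql_password': 'CDB_PASSWORD'}
# Hypothetical names for the non-secret settings, with the demo values as defaults
OPTIONAL = {
    'user_name': ('EUREQA_USER', 'user@domain.com'),
    'eureqa_url': ('EUREQA_URL', 'http://localhost:10202'),
    'sql_server': ('CDB_SERVER', 'localhost'),
    'sql_database': ('CDB_DATABASE', 'Concrete_DB'),
    'sql_user': ('CDB_USER', 'cdb_user'),
}


def load_settings(env=None):
    """Return a dict of connection settings read from environment variables.

    Raises KeyError if any required (secret) variable is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    settings = dict((key, env[var]) for key, var in REQUIRED.items())
    for key, (var, default) in OPTIONAL.items():
        settings[key] = env.get(var, default)
    return settings
```

With this in place, the hard-coded assignments become `settings = load_settings()` followed by `api_key = settings['api_key']`, `sql_password = settings['sql_password']`, and so on.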
Next we define two functions: one to retrieve data from SQL Server and write it to a temporary .csv file, and one to load the temporary .csv file into a new Eureqa data source.
```python
# FUNCTION - Get data from SQL Server and save it as a csv file
def extract_sql_data(sql_server, sql_db, user, pwd, sql_query, out_file):
    conn_str = ("DRIVER={SQL Server Native Client 11.0};SERVER=" + sql_server +
                ";DATABASE=" + sql_db + ";UID=" + user + ";PWD=" + pwd)
    cnxn = pyodbc.connect(conn_str)
    try:
        cursor = cnxn.cursor()
        cursor.execute(sql_query)
        # column names come from the cursor metadata
        columns = [column[0] for column in cursor.description]
        result = cursor.fetchall()
    finally:
        # always release the database connection, even if the query fails
        cnxn.close()
    # Python 2.7's csv module expects a file opened in binary mode
    with open(out_file, "wb") as fso:
        ofile = csv.writer(fso)
        ofile.writerow(columns)
        ofile.writerows(result)

# FUNCTION - Load csv file as new data source in Eureqa
def create_new_data_source(eureqa, ds_name, input_file):
    data_source = eureqa.create_data_source(ds_name, input_file)
    return data_source
```
Both of these functions are fairly straightforward and self-explanatory. Notice that the first function uses the pyodbc and csv Python libraries we imported in the first step: pyodbc provides ODBC database access, and csv handles writing the results out as CSV text.
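pyodbc follows the Python DB-API 2.0 specification (PEP 249), so the `cursor.description` pattern in `extract_sql_data` is not specific to SQL Server. As a sketch you can run without a database server, here is the same query-to-CSV logic against an in-memory SQLite database from the standard library, standing in for SQL Server. The helper `write_query_to_csv` is hypothetical, the table holds toy values rather than the UCI data, and the code is written for Python 3, where CSV output goes to a text stream with `newline=''`; under Python 2.7, as in the post, you would keep opening the output file in `'wb'` mode.

```python
import csv
import io
import sqlite3


def write_query_to_csv(cnxn, sql_query, out_stream):
    """Run sql_query on any DB-API 2.0 connection and write a header row
    followed by all result rows to out_stream as CSV.
    Returns the number of data rows written."""
    cursor = cnxn.cursor()
    cursor.execute(sql_query)
    # Each entry of cursor.description is a sequence whose first item
    # is the column name (DB-API 2.0)
    columns = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    writer = csv.writer(out_stream)
    writer.writerow(columns)
    writer.writerows(rows)
    return len(rows)


# In-memory SQLite stand-in for SQL Server; toy values, not the UCI data
cnxn = sqlite3.connect(':memory:')
cnxn.execute('CREATE TABLE Concrete_Strength_Data '
             '(Cement REAL, Water REAL, Age INTEGER)')
cnxn.executemany('INSERT INTO Concrete_Strength_Data VALUES (?, ?, ?)',
                 [(100.0, 50.0, 7), (200.0, 75.5, 28)])

# Python 3 text stream with no newline translation, as the csv docs recommend
buffer = io.StringIO(newline='')
count = write_query_to_csv(
    cnxn, 'SELECT Cement, Water, Age FROM Concrete_Strength_Data ORDER BY Age',
    buffer)
cnxn.close()
print(buffer.getvalue())
```

Swapping the SQLite connection for the `pyodbc.connect(...)` connection from `extract_sql_data` should leave the rest of the logic unchanged, since both follow the same DB-API cursor interface.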
The next piece of code is the main part of the application. The first step is to connect to Eureqa by creating an instance of the Eureqa interface, which is the entry point for all programmatic operations in Eureqa. Once we are connected, we define our SQL query and call the two functions to retrieve and load the data.
```python
# -- MAIN ------------------------------------------------------------------
eureqa = Eureqa(url=eureqa_url, user_name=user_name, key=api_key, organization="local")

sql_query = "SELECT [Cement]" \
            ", [BlastFurnaceSlag]" \
            ", [FlyAsh]" \
            ", [Water]" \
            ", [Superplasticizer]" \
            ", [CoarseAggregate]" \
            ", [FineAggregate]" \
            ", [Age]" \
            ", [ConcreteStrength]" \
            " FROM [" + sql_database + "].[dbo].[Concrete_Strength_Data]"

extract_sql_data(sql_server, sql_database, sql_user, sql_password, sql_query, data_file)
data_source = create_new_data_source(eureqa, data_source_name, data_file)
```
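The backslash-continued query string works, but it is easy to drop a comma or a space when editing it. A minimal alternative keeps the column names in a list and joins them. The helpers `quote_name` and `build_select` are hypothetical; they produce the same statement as the query above (whitespace aside), bracket-quoting each identifier the way SQL Server does, with any `]` doubled.

```python
def quote_name(name):
    """Wrap a SQL Server identifier in square brackets, doubling any ']'."""
    return '[' + name.replace(']', ']]') + ']'


def build_select(columns, database, table, schema='dbo'):
    """Build 'SELECT [c1], [c2], ... FROM [db].[schema].[table]'.
    Hypothetical helper: it quotes identifiers only, never data values."""
    column_list = ', '.join(quote_name(c) for c in columns)
    source = '.'.join(quote_name(part) for part in (database, schema, table))
    return 'SELECT ' + column_list + ' FROM ' + source


concrete_columns = ['Cement', 'BlastFurnaceSlag', 'FlyAsh', 'Water',
                    'Superplasticizer', 'CoarseAggregate', 'FineAggregate',
                    'Age', 'ConcreteStrength']
# In the main script, pass sql_database instead of the literal 'Concrete_DB'
sql_query = build_select(concrete_columns, 'Concrete_DB', 'Concrete_Strength_Data')
print(sql_query)
```

This only helps with identifiers. If the query ever needs a filter value, such as a minimum age, it is safer to pass it as a query parameter; pyodbc uses `?` placeholders, e.g. `cursor.execute(sql_query + " WHERE [Age] >= ?", 28)`.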
The screenshots below show what we see in the Eureqa interface before we execute the code: a Eureqa installation with no data sets.
We execute the script and see a very uneventful output message that tells us the script ran successfully.
After refreshing the Eureqa data sets window, there is now a new data set called Concrete_Strength_Data.
Here is a subset of the imported data, which strangely enough looks like the data returned from the SQL query in our code.
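To double-check the load from code rather than the web interface, one option is to read back the temporary CSV written by `extract_sql_data` and compare its header, first rows, and row count with what the new data set shows in Eureqa. The helper `preview_csv` below is a hypothetical sketch that uses only the standard csv module; it is written for Python 3 (on Python 2.7 you would open the file in `'rb'` mode instead).

```python
import csv
from itertools import islice


def preview_csv(path, n=5):
    """Return (header, first n data rows, total number of data rows)
    for a CSV file such as the temporary file written by extract_sql_data."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        first_rows = list(islice(reader, n))
        # Count whatever rows remain after the preview
        total = len(first_rows) + sum(1 for _ in reader)
    return header, first_rows, total
```

For example, calling `preview_csv(data_file)` right after `extract_sql_data(...)` returns the column names, the first five rows as strings, and the total number of rows that were sent to Eureqa.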
Now that we have a data set loaded, it can be used to run searches and build models. So, if you happen to be interested in a predictive model to estimate concrete strength, here is the model Eureqa built from the UCI concrete strength data, which it solved in minutes. Eureqa!!
I’ll expand on this next time.
Regards,
Dave