Nutonian Eureqa

Eureqa API and SQL Server Data Load

RoundTower Technologies offers an analytics solution called FastAnswers powered by Eureqa. It is an amazing piece of software created by the brilliant people at Nutonian. Eureqa can use sample data to automate the creation of predictive models. If you are interested, the Data Scientists at Nutonian and RoundTower can explain the mathematics and science behind the technology. Please visit the RoundTower and Nutonian links above to learn more about the solution.

One of the main goals of Eureqa is to put data science capabilities in the hands of non-PhD data analysts, which is why Eureqa is such an intriguing technology to me. I have an interest in analytics and machine learning as well as some BI background, but I do not have a deep understanding of statistics or machine learning. Eureqa is great because it helps bridge that skills gap.

The primary user interface of the Eureqa application is web-based, very intuitive, and well documented. Nutonian also provides a Python API for programmatic access to Eureqa functionality. I have been experimenting with Eureqa and the Python API, so I thought I would share some things I learned. Hopefully it will be helpful to someone else using Eureqa. Also, if you are interested, the current API documentation is located at http://eureqa-api.nutonian.com/en/stable/. It contains basic information on using the API and a few helpful examples.

A question that always comes up when talking about Eureqa is “how does it connect to database sources?” So one of the first things I decided to learn about Eureqa was how to load data from SQL Server. The goal of this post is to show the basics of using the API as well as getting data from SQL.

To use the Eureqa Python API you will at least need to have Python 2.7 installed. I prefer to use a distribution that includes Python and many common libraries, including the popular machine learning libraries. I also like to use a Python IDE for development, so my preferred environment is Anaconda2 and JetBrains PyCharm. There is also a good Python IDE called Spyder included with Anaconda2. If you need more assistance getting started, there is a plethora of tutorials on Python, Anaconda2, and various Python IDEs on the web.

Once the development environment is set up, the next step is installing and enabling the API. To start using the API your Eureqa account or local installation must be licensed and given access to use the API. If this has been done you will see the API options shown below on the settings page after logging in to Eureqa.

Eureqa_Settings_API

This page provides the ability to download the Python API and an access key, which are the two things required to use the API. Installing the Eureqa API is easy: just run pip from your default Python directory using the command shown on the settings page. Getting the API key is also easy: click the Get Access Key button on the settings page and the following dialog box is shown, which allows us to give the key a logical name.


Eureqa API KeyGen

After assigning a name, click Generate Key. A dialog like the one below is then shown with the API key.

Eureqa API KeyGen Result

Now we can use this key to interact with the Eureqa API. The following code example shows how to connect to SQL Server, retrieve data, and load it into a Eureqa data set. In this example we will be using data from concrete strength testing, which is publicly available from the University of California Irvine (UCI) Machine Learning Repository. The first step in our Python code is to define our connection variables and load the required libraries, as shown below.
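Here is a sketch of that setup code; the server names, database, and file path are placeholders for my lab environment.

```python
# Load the required libraries: pyodbc for SQL Server access, csv for the
# temporary file, and the Eureqa API package.
import pyodbc
import csv
from eureqa import Eureqa

# Connection variables - all values below are placeholders for my lab
sql_server = 'SQL2012-01.vlab.local'      # SQL Server instance
sql_database = 'ConcreteTest'             # database holding the UCI sample data
eureqa_url = 'http://eureqa-01:10002'     # Eureqa server URL
eureqa_user = 'dave'                      # Eureqa account name
eureqa_api_key = '<access key from the settings page>'
temp_csv = 'C:\\Temp\\Concrete_Strength_Data.csv'
```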

Once that is done we will define two functions: one to retrieve data from SQL Server and write it to a temporary .csv file, and one to load data from the temporary .csv file into a Eureqa data source.
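A minimal sketch of the two functions is below. The pyodbc and csv portions are standard; the create_data_source call follows the pattern in the API documentation examples, so verify the exact name and argument order for your version.

```python
def sql_to_csv(server, database, query, csv_path):
    """Run a query against SQL Server and write the results,
    header row included, to a temporary .csv file."""
    conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server +
                          ';DATABASE=' + database + ';Trusted_Connection=yes')
    cursor = conn.cursor()
    cursor.execute(query)
    with open(csv_path, 'wb') as f:   # 'wb' avoids blank rows on Python 2.7
        writer = csv.writer(f)
        writer.writerow([column[0] for column in cursor.description])
        for row in cursor.fetchall():
            writer.writerow(list(row))
    conn.close()


def csv_to_eureqa(eureqa_conn, csv_path, data_set_name):
    """Load the temporary .csv file into a Eureqa data source."""
    # create_data_source(name, path) per the API docs - verify for your version
    return eureqa_conn.create_data_source(data_set_name, csv_path)
```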

Both of these functions are fairly straightforward and self-explanatory. Notice the first function uses the pyodbc and csv Python libraries we loaded in the first step. These libraries provide database access and csv text processing functionality.

The next piece of code is the main part of the application. The first step is to connect to Eureqa, which is done using the Eureqa interface. This is the entry point for all programmatic operations in Eureqa. After we have a connection to Eureqa, we define our SQL query and execute the two functions to retrieve and load the data.
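Here is a sketch of the main section. The table and column names reflect my copy of the UCI data, and the keyword arguments to the Eureqa constructor are my best reading of the API docs, so double-check them.

```python
# Connect to Eureqa - the Eureqa object is the entry point for all API
# operations (keyword argument names may differ slightly; see the API docs)
e = Eureqa(url=eureqa_url, user_name=eureqa_user, key=eureqa_api_key)

# Define the SQL query against the concrete strength table (my lab schema)
query = ('SELECT Cement, Slag, FlyAsh, Water, Superplasticizer, '
         'CoarseAggregate, FineAggregate, Age, Strength '
         'FROM dbo.ConcreteStrength')

# Retrieve the data and load it into Eureqa
sql_to_csv(sql_server, sql_database, query, temp_csv)
csv_to_eureqa(e, temp_csv, 'Concrete_Strength_Data')
print 'Concrete_Strength_Data loaded successfully.'
```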

The screenshot below shows what we see in the Eureqa interface before we execute the code: a Eureqa installation with no data sets.

Eureqa Data Set

We execute the script and see a very uneventful output message that tells us the script ran successfully.

Eureqa Script Result

Now after refreshing the Eureqa data sets window there is a new data set called Concrete_Strength_Data.

Eureqa Data Set Result

Here is a subset of the imported data, which strangely enough looks like the data returned from the SQL query in our code.

Eureqa Data Subset

Now that we have a data set loaded, it can be used to run searches and build models. So if you happen to be interested in a predictive model to estimate concrete strength, here is the model Eureqa built based on the UCI concrete strength data, which it solved in minutes. Eureqa!!

Eureqa Model Result

I’ll expand on this next time.

Regards,

Dave

XtremIO Snapshots SQL Server and PowerShell Part IV

I have continued to work on the MTSXtremIO module, adding functionality for XtremIO 4.0. One interesting feature in 4.0 that provides even more benefit with snapshots is the ability to refresh a snapshot. With SQL Server this saves some steps with LUN mapping and mounting volumes. I have done some testing using two methods to control the snapshot. I will talk about both methods and show a scripting example using each method.

The first method uses a PowerShell cmdlet included in the PowerShell Toolkit in EMC Storage Integrator for Windows. The XtremIO 4.0 features are found in the latest version of the software, version 3.8. To use this method the EMC Storage Integrator for Windows must be installed on the machine to be used for scripting. This could potentially be each of the participating SQL Servers.

Here is an example script refreshing a database using the PowerShell Toolkit included with version 3.8 of EMC Storage Integrator for Windows.
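The sketch below shows the flow under some loud assumptions: the Emc* snapshot cmdlet names follow the toolkit's naming convention but should be verified against the ESI 3.8 documentation, and the volume, file, and server names are placeholders from my lab.

```powershell
# Refresh a database copy with the ESI PowerShell Toolkit (sketch - verify
# the Emc* cmdlet names against the ESI 3.8 documentation)
Import-Module ESIPSToolkit
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$Srv = New-Object Microsoft.SqlServer.Management.Smo.Server 'SQL2012-01'
$Srv.DetachDatabase('SnapTest01_QA', $false)       # detach the QA copy first

# Connect to the XtremIO system and refresh the snapshot in place - no LUN
# unmapping or volume mounting required (assumed cmdlet names)
$Xio = Connect-EmcSystem -CreationValues @{SystemType='XtremIO'; ManagementIP='xms-01.vlab.local'}
Update-EmcSnapshot -System $Xio -SourceVolume 'SnapTest01' -TargetVolume 'SnapTest01_QA'

Update-HostStorageCache                             # rescan so Windows sees the refreshed volume

$Files = New-Object System.Collections.Specialized.StringCollection
$Files.Add('E:\SQLData\SnapTest01.mdf') | Out-Null
$Files.Add('E:\SQLData\SnapTest01_log.ldf') | Out-Null
$Srv.AttachDatabase('SnapTest01_QA', $Files)        # re-attach from the fresh snapshot
```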

The second method uses an open source PowerShell module I developed to manage XtremIO. It is a PowerShell interface to the XtremIO REST API, and it provides a comprehensive management interface, including the snapshot refresh functionality in 4.0.

Here is an example refreshing a database using the Update-XIOSnapshot cmdlet found in the MTSXtremIO module.
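Again as a sketch: Update-XIOSnapshot does the refresh, while the connection cmdlet name and lab values should be checked against the module help.

```powershell
# Refresh a database copy with the MTSXtremIO module (sketch)
Import-Module MTSXtremIO
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$Srv = New-Object Microsoft.SqlServer.Management.Smo.Server 'SQL2012-01'
$Srv.DetachDatabase('SnapTest01_QA', $false)       # detach the QA copy first

# Connect to the XMS REST API (assumed cmdlet name - see the module help)
Connect-XIOAPI -XmsName 'xms-01.vlab.local' -Credential (Get-Credential)

# Refresh the existing snapshot from the source volume in one step
Update-XIOSnapshot -SourceVol 'SnapTest01' -DestVol 'SnapTest01_QA'

Update-HostStorageCache                             # rescan the refreshed volume

$Files = New-Object System.Collections.Specialized.StringCollection
$Files.Add('E:\SQLData\SnapTest01.mdf') | Out-Null
$Files.Add('E:\SQLData\SnapTest01_log.ldf') | Out-Null
$Srv.AttachDatabase('SnapTest01_QA', $Files)        # re-attach the database
```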

Both of these scripts rely on some initial setup: a snapshot copy already mounted to a server, which could be the same server or a different one. The diagram below shows the test scenario used in the example scripts above.

In either scenario above the copy process happens very quickly; in my tests it took only a few seconds. The only difference between the scripts is the command used to connect to the resources and the command used to refresh the snapshot. Either method is simple, and the scripting is straightforward, which really makes database snapshots easy.

The main benefit of the ESI Toolkit is that it is officially supported by EMC; it does require an install and provides only core provisioning functionality. The MTSXtremIO PowerShell module does not require installation and can be used for many other XtremIO management and reporting tasks. It is an open source project located here.

Regards,

Dave

SQL Server, PowerShell, and XtremIO Snapshots Part 3 (VSS Native)

During the writing of this post, EMC announced the GA release of XtremIO 4.0 on July 2nd. The new documentation and the native VSS provider are now available on EMC support. This will provide the ability to script application consistent snapshots without using AppSync. Unfortunately our lab is not upgraded to 4.0 yet, but that is coming soon and I will test the script and process then. In the meantime I will talk about how this will provide another way to do application consistent snapshots for XtremIO, show the architecture, and walk through a mock script of how I think it will work at this point.

In order to use the XtremIO VSS provider, it must be installed on the server where we want to take an application consistent snapshot. The install is downloaded from EMC support; at the time of this post the file name for the latest version is XtremIOVSSProvider-1.0.8.msi. After installation, the connection to XtremIO is configured through the control panel applet.

The control panel applet is used only to configure the connection to the XtremIO XMS.

The VSS Provider installation can be verified by opening a command line as administrator and typing vssadmin list providers. This shows us the XtremIO VSS Provider.

The process to use an XtremIO snapshot for a database copy using the VSS provider is very similar to the process used in part 1 of this blog series. The primary difference is the VSS Provider is called to create the XtremIO snapshot. The following image shows the basic VSS architecture.

The test environment is a SQL Server virtual machine on vSphere. The SnapTest01 volume is on a 50GB RDM on XtremIO and the SnapTest01_QA volume is a snapshot of the SnapTest01 volume.


The example script will show the process to refresh the QA volume with a new snapshot copy. The script is almost identical to the part 1 script. The first step is to load a few PowerShell modules, define some constants, and connect to vCenter and XtremIO. This is done using PowerCLI and functions from my MTSXtremIO module, which you can read about here and which use the XtremIO REST API. I also use a couple of other modules with some of my common functions and an NTFS Security module, which I did not write. I will put links to those at the end of the post.
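A sketch of that first step; as in part 1, the Connect-XIOAPI cmdlet name and the lab values are placeholders.

```powershell
# Step 1 sketch - load modules, define constants, connect to vCenter and XtremIO
Add-PSSnapin VMware.VimAutomation.Core              # PowerCLI
Import-Module MTSXtremIO                             # XtremIO REST API functions
Import-Module NTFSSecurity                           # NTFS permissions module
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$VCServer  = 'vcenter-01.vlab.local'
$SqlServer = 'SQL2012-01'
$DbName    = 'SnapTest01_QA'

Connect-VIServer -Server $VCServer | Out-Null
Connect-XIOAPI -XmsName 'xms-01.vlab.local' -Credential (Get-Credential)  # assumed cmdlet name
$Srv = New-Object Microsoft.SqlServer.Management.Smo.Server $SqlServer
```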

The example above loads the module dependencies and connects to vCenter and XtremIO. The SQL Management Objects are loaded to provide SQL Server functionality.

The next step is to detach the current QA database copy, remove the virtual hard disk, and remove snapshots.
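A sketch of that step; the Remove-XIOSnapshot cmdlet name is my assumption, while the SMO and PowerCLI calls are standard.

```powershell
# Step 2 sketch - detach the QA copy, remove the RDM, delete the old snapshot
$Srv.DetachDatabase($DbName, $false)

$VM = Get-VM -Name $SqlServer
Get-HardDisk -VM $VM -DiskType RawPhysical |
    Where-Object {$_.ScsiCanonicalName -like 'naa.514f0c5*'} |
    Remove-HardDisk -DeletePermanently -Confirm:$false

Get-VMHost -Name 'esx-01*' | Get-VMHostStorage -RescanAllHba | Out-Null
Remove-XIOSnapshot -Name 'SnapTest01.snap'           # assumed MTSXtremIO cmdlet
```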

The example above uses SQL Management Objects to access SQL and detach the database. It then uses the VMware PowerCLI to remove the RDM from the virtual machine. Then it connects to the XtremIO via the REST API and deletes the snapshots.

Now we are ready to create a new snapshot, add it to the LUN map, add the disk to the VM, and attach the database.
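Here is my mock-up of that step. DiskShadow drives the XtremIO VSS hardware provider to take the snapshot, and the New-XIOLunMap name is an assumption; none of this is tested until our lab reaches 4.0.

```powershell
# Step 3 mock-up - VSS snapshot, LUN map, RDM, and database attach (untested)
@"
SET CONTEXT PERSISTENT
ADD VOLUME E: ALIAS SnapTest01
CREATE
"@ | Out-File C:\Scripts\snap.dsh -Encoding ASCII
diskshadow /s C:\Scripts\snap.dsh                    # VSS provider takes the snapshot

New-XIOLunMap -Name 'SnapTest01.snap' -InitiatorGroup 'ESX-Cluster01' -HostLunID 21  # assumed cmdlet

Get-VMHost -Name 'esx-01*' | Get-VMHostStorage -RescanAllHba | Out-Null
$Disk = Get-ScsiLun -VmHost (Get-VMHost -Name 'esx-01*') -CanonicalName 'naa.514f0c5*'
New-HardDisk -VM $VM -DiskType RawPhysical -DeviceName $Disk.ConsoleDeviceName | Out-Null

$Files = New-Object System.Collections.Specialized.StringCollection
$Files.Add('E:\SQLData\SnapTest01.mdf') | Out-Null
$Files.Add('E:\SQLData\SnapTest01_log.ldf') | Out-Null
$Srv.AttachDatabase($DbName, $Files)
```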

The above example creates a snapshot via the VSS provider and maps the volume to the host using the XtremIO REST API. It also rescans the disks and then adds the RDM to the virtual machine. Then the database is attached using SQL SMO.

I have not been able to test this code yet, as we need to upgrade XtremIO first. I had hoped the VSS provider would work with 3.0, but unfortunately I received the following message when I tried it.

“The registered provider does not support shadow copy of this volume”

Hopefully this example is pretty close.

Regards,

Dave

MTSXtremIO Module
NTFSSecurity Module
MTSAuthentication Module

SQL Server, PowerShell, and XtremIO Snapshots Part 2 (EMC AppSync)

In my last post I talked about using PowerShell to do crash consistent snapshots on XtremIO, then mounting the snapshot for use as a secondary SQL Server database copy.

In this post I will talk about scripting application consistent XtremIO snapshots using EMC AppSync and EMC Storage Integrator for Windows (ESI). I will also talk about how the process and architecture differ from part 1, where we just did crash consistent snapshots. EMC AppSync is a server application that can create and manage snapshots for application and virtualization environments using EMC storage arrays. It uses a host plug-in which allows the AppSync server to control host operations. ESI is a Windows application and set of tools which provides EMC management capabilities from a Windows machine. One of the features of ESI is a PowerShell toolkit that allows controlling EMC arrays and EMC AppSync using PowerShell cmdlets.

We’ll use the same basic scenario as last time, but this time utilizing EMC AppSync. The test environment is a SQL Server virtual machine on vSphere. The SnapTest01 volume is on a 50GB RDM on XtremIO and the SnapTest01_QA volume is a snapshot of the SnapTest01 volume.

With our previous test the script would generally run on the SQL Server to perform the snapshot operations. With AppSync, the script is not required to run locally because the agent performs the local operations. Here is a diagram of the architecture.

AppSync Architecture Diagram


In this scenario the script would run on the ESI host using a Windows account that has SQL admin rights on the SQL Server. This is required to attach the database. The latest version of AppSync as of this writing (2.2) does not have the ability to attach a copy from an XtremIO volume. Here is the statement from the admin guide on page 105:

“This version of AppSync does not support automated restore of copies on XtremIO. Learn to use AppSync and XtremIO to restore SQL Server databases to a point in time with a semi-manual restore process.”

However, this is easily worked around using PowerShell and SQL SMO to fill the gap. The end script is still simpler than in part 1 because AppSync does a lot of the tedious work for us, and we get application consistent snapshots to boot. The other benefit of this architecture is that it allows you to use a central host to control all of your database refresh operations. Here is the first part of the script, which loads the ESI module and SQL SMO and defines some constants.
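Something like the following; the module name and all lab values are placeholders to adjust for your install.

```powershell
# Sketch - load the ESI PowerShell Toolkit and SQL SMO, define constants.
# The module name and all lab values are assumptions; adjust for your install.
Import-Module ESIPSToolkit
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$AppSyncServer = 'appsync-01.vlab.local'             # AppSync server
$SqlServer     = 'SQL2012-01'                        # SQL Server with the copies
$DbName        = 'SnapTest01_QA'                     # QA database name
$ServicePlan   = 'SnapTest01-Bronze'                 # AppSync service plan (assumed)
$Srv = New-Object Microsoft.SqlServer.Management.Smo.Server $SqlServer
```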


The next section detaches the database, initializes the AppSync objects, and gets and dismounts the current snapshot.
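A sketch of that section; the Emc* AppSync cmdlet names follow the toolkit convention but are assumptions to verify against the ESI documentation.

```powershell
# Sketch - detach the QA copy, then find and dismount the current AppSync copy
# (Emc* AppSync cmdlet names are assumptions; check the ESI Toolkit docs)
$Srv.DetachDatabase($DbName, $false)

$AppSync = Connect-EmcSystem -CreationValues @{SystemType='AppSync'; ManagementIP=$AppSyncServer}
$Copy = Get-EmcAppSyncCopy -System $AppSync -ServicePlan $ServicePlan |
        Sort-Object CreationTime -Descending | Select-Object -First 1
Dismount-EmcAppSyncCopy -System $AppSync -Copy $Copy -MountHost $SqlServer
```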


The detach database operation uses SQL Server Management Objects, and the AppSync objects are all created using the ESI PowerShell Toolkit cmdlets.

The next section creates a new snapshot, mounts it to the host, and attaches the database.
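A sketch of that section, with the same caveat on the AppSync cmdlet names.

```powershell
# Sketch - run the service plan for a fresh application consistent snapshot,
# mount it back to the host, and attach the database (cmdlet names assumed)
$NewCopy = Invoke-EmcAppSyncServicePlan -System $AppSync -ServicePlan $ServicePlan
Mount-EmcAppSyncCopy -System $AppSync -Copy $NewCopy -MountHost $SqlServer -MountPath 'E:\'

$Files = New-Object System.Collections.Specialized.StringCollection
$Files.Add('E:\SQLData\SnapTest01.mdf') | Out-Null
$Files.Add('E:\SQLData\SnapTest01_log.ldf') | Out-Null
$Srv.AttachDatabase($DbName, $Files)
```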


This also uses ESI for the AppSync operations and SQL SMO for the database attach.

An important point to note is that we did not have to create steps for all of the individual disk operations and LUN mapping. The AppSync server and host agent took care of the details for us.

That is it for application consistent snapshots with AppSync. Next time I will talk about the method we will be able to use when XtremIO 4.0 gets here and the native VSS Provider is available.

Regards,

Dave

SQL Server, PowerShell, and XtremIO Snapshots Part 1

XtremIO is becoming a popular platform for SQL Server. It performs excellently, and XtremIO snapshots provide a great space benefit for database copies. I have had a few customers asking questions about scripting snapshots for SQL Server on XtremIO using PowerShell.

Most Windows administrators these days are using PowerShell, and many SQL DBAs are also starting to use it. I decided to set up EMC AppSync with XtremIO in our lab, do a little testing, and create some PowerShell scripts to help our customers get started. My plan was to test two different scenarios: one for crash consistent snapshots and one for application consistent snapshots with EMC AppSync. But EMC announced a new native VSS provider with XtremIO 4.0, so there will also be another way to do application consistent snapshots when 4.0 is released. This post will cover creating a crash consistent copy of a SQL database volume using XtremIO snapshots and mounting the snapshot to create a secondary QA database. This method will use the XtremIO REST API to create a snapshot of the source volume. PowerShell will be used to execute the required steps on XtremIO, VMware, and Windows.

The test environment is a SQL Server virtual machine on vSphere. The SnapTest01 volume is on a 50GB RDM on XtremIO and the SnapTest01_QA volume is a snapshot of the SnapTest01 volume.


SQL_Server

The example script will show the process to refresh the QA volume with a new snapshot copy. Here is the basic logical flow of the process.

Snapshot Process Flow

The first step is to load a few PowerShell modules, define some constants, and connect to vCenter and XtremIO. This is done using PowerCLI and a function from my MTSXtremIO module, which you can read about here. This function uses the XtremIO REST API to create the snapshot. I also use a couple of other modules with some of my common functions and an NTFS Security module, which I did not write. I will put links to those at the end of the post.
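A sketch of that first step; the Connect-XIOAPI cmdlet name and the lab values are my placeholders.

```powershell
# Step 1 sketch - load modules, define constants, connect to vCenter and XtremIO
Add-PSSnapin VMware.VimAutomation.Core              # PowerCLI
Import-Module MTSXtremIO                             # XtremIO REST API functions
Import-Module NTFSSecurity                           # NTFS permissions module
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$VCServer  = 'vcenter-01.vlab.local'
$SqlServer = 'SQL2012-01'
$DbName    = 'SnapTest01_QA'

Connect-VIServer -Server $VCServer | Out-Null
Connect-XIOAPI -XmsName 'xms-01.vlab.local' -Credential (Get-Credential)  # assumed cmdlet name
$Srv = New-Object Microsoft.SqlServer.Management.Smo.Server $SqlServer
```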

The example above loads the module dependencies and connects to vCenter and XtremIO. The SQL Management Objects are loaded to provide SQL Server functionality.

The next step is to detach the current QA database copy, remove the virtual hard disk, and remove the old snapshots, doing a couple of rescans in the process.
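A sketch of that step; the Remove-XIOSnapshot name is my assumption, while the SMO and PowerCLI calls are standard.

```powershell
# Step 2 sketch - detach the QA copy, remove the RDM, delete the old snapshot
$Srv.DetachDatabase($DbName, $false)

$VM = Get-VM -Name $SqlServer
Get-HardDisk -VM $VM -DiskType RawPhysical |
    Where-Object {$_.ScsiCanonicalName -like 'naa.514f0c5*'} |
    Remove-HardDisk -DeletePermanently -Confirm:$false

Get-VMHost -Name 'esx-01*' | Get-VMHostStorage -RescanAllHba | Out-Null
Remove-XIOSnapshot -Name 'SnapTest01.snap'           # assumed MTSXtremIO cmdlet
```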

The example above uses SQL Management Objects to access SQL and detach the database. It then uses the VMware PowerCLI to remove the RDM from the virtual machine. Then it connects to the XtremIO via the REST API and deletes the snapshots.

Now we are ready to create a new snapshot, add it to the LUN map, add the disk to the VM, and attach the database.
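A sketch of that final step; the New-XIOSnapshot and New-XIOLunMap names are assumptions based on my module's conventions.

```powershell
# Step 3 sketch - create the snapshot via the REST API, map it, add the RDM,
# and attach the database (New-XIOSnapshot/New-XIOLunMap names are assumptions)
New-XIOSnapshot -VolName 'SnapTest01' -SnapName 'SnapTest01.snap'
New-XIOLunMap -Name 'SnapTest01.snap' -InitiatorGroup 'ESX-Cluster01' -HostLunID 21

Get-VMHost -Name 'esx-01*' | Get-VMHostStorage -RescanAllHba | Out-Null
$Disk = Get-ScsiLun -VmHost (Get-VMHost -Name 'esx-01*') -CanonicalName 'naa.514f0c5*'
New-HardDisk -VM $VM -DiskType RawPhysical -DeviceName $Disk.ConsoleDeviceName | Out-Null

$Files = New-Object System.Collections.Specialized.StringCollection
$Files.Add('E:\SQLData\SnapTest01.mdf') | Out-Null
$Files.Add('E:\SQLData\SnapTest01_log.ldf') | Out-Null
$Srv.AttachDatabase($DbName, $Files)
```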

The above example creates a snapshot and maps the volume to the host using the XtremIO REST API. It also rescans the disks and then adds the RDM to the virtual machine. Then the database is attached using SQL SMO.

This completes the database refresh. I hope someone finds this helpful. Next post will introduce AppSync into the mix for application consistency and some additional benefits.

Regards,

Dave

MTSXtremIO Module
NTFSSecurity Module
MTSAuthentication Module

EMC Knowledge Sharing Competition – Article Selected for Publishing

In the fall of last year I learned about the 2015 EMC Knowledge Sharing Competition. This is a writing competition hosted by EMC to promote sharing of information about technology topics. All EMC certified professionals are welcome to submit abstracts for consideration in the contest. I thought the work I had done with Data Domain Boost (DD Boost) for SQL Server would be a good topic and decided to give the contest a shot.

The previous work I have done on this topic can be found at the links below.

Data Domain Boost for SQL – Deployment Planning Considerations for DBAs

SQL Server Data Domain Boost Scripting Toolkit

Toolkit Project on GitHub

In November I was notified my abstract was accepted and it was time to get started writing the article. It was a lot of work, especially because I was in the middle of writing and publishing the SQL Server Data Domain Boost Scripting Toolkit. It turned out to be a great experience, and last week I was notified my article, How to Boost SQL Server Backups with Data Domain, was selected for publishing. It will be included in the 2015 book of abstracts and published at some point during the year.

Find out more about the EMC Knowledge Sharing Competition here.

While I was not selected as a finalist, the experience was enjoyable and interesting. I think this is a very cool event EMC does, I appreciate being part of it, and I’m sure it takes a lot of work internally. I will be looking forward to the awards session at EMC World 2015 to hear the winners announced and congratulate them.

Regards,

Dave

SQL Server Data Domain Boost Scripting Toolkit

Over the past several months I have worked with customers who have been implementing EMC Data Domain Boost for SQL Server. A common challenge for these customers is automating backups and restores using SQL Server tools such as T-SQL, maintenance plans, and agent jobs. I have posted about this previously here but found some organizations still needed additional help getting started. This led me to create the SQL Server Data Domain Boost Scripting Toolkit to provide DBAs with a head start. Currently the toolkit provides T-SQL stored procedures and a table definition to provide additional functionality on top of EMC’s Data Domain Boost for SQL Server.

The toolkit is published on GitHub at this address https://github.com/dmuegge/ddb-sql-toolkit. The toolkit is licensed under the MIT open source license agreement.

With this post I felt a demo video would provide the most value. You can also find documentation on using the toolkit in the scripts and files included with it. The video is approximately 24 minutes. Enjoy!

The goal of the SQL Server Data Domain Scripting Toolkit is to provide DBAs with a head start to T-SQL scripting and automation using DD Boost for SQL Server which is part of Data Domain Boost for Microsoft Applications. The initial toolkit utilizes the xp_cmdshell system stored procedure in a set of T-SQL stored procedures to execute the DD Boost for SQL executables. I also hope to extend this toolkit in the future with additional options to utilize SQL DD Boost.

See the previous post I wrote which led me to create the SQL Server Data Domain Boost Scripting Toolkit. I hope someone finds this toolkit useful. Please provide any feedback and updates to the toolkit.

Regards,

Dave


Data Domain Boost for SQL – Deployment Planning Considerations for DBA’s

Update: This work has been expanded upon in the following post – DD Boost T-SQL Scripting Toolkit

EMC recently released Data Domain Boost for Microsoft Applications. This tool includes a SQL Management Studio plugin that allows SQL backups via a familiar SQL GUI or a CLI application.

EMC whitepaper – The business value of Data Domain Boost

This tool does a good job of keeping control of backups in the hands of a DBA and taking advantage of the benefits of Data Domain.

While helping a client recently, I had a chance to dig into DD Boost for SQL and do some testing. This post outlines some key things I learned about the architectural aspects of the software, as well as some potential operational impacts organizations should be aware of when planning an implementation. The amount of impact on your particular organization will vary depending on how SQL backups are managed today. In most cases backups are currently managed by the SQL DBAs, which is a common reason organizations choose DD Boost for SQL.

In this scenario there are some important questions to ask.

  1. Are backups managed by T-SQL in agent jobs?
  2. Are backups managed with maintenance plans?
  3. What automation is done in jobs? Particularly automation in restores.
  4. What logging and reporting is done today?
  5. Can T-SQL backup routines be modified?
  6. Can xp_cmdshell be enabled on servers?


Many or most DBAs have T-SQL procedures running the backups. These procedures commonly have additional logic for other operations, logging, and reporting. The Data Domain Boost plugin for SQL Server runs as a command line executable. It has a GUI front end to help create the commands or run the process directly.

The backup program is ddbmsqlsv.exe and the restore program is ddbmsqlrc.exe. The DD Boost administrator guide recommends using Windows Task Scheduler to execute these commands and manage backup and restore jobs. This presents an issue for many DBAs because they have a much greater set of functionality in the SQL Agent job engine. Additionally, agent jobs and T-SQL procedures have easy access to all relevant data in the SQL environment needed to drive the logic of the DBAs' operational processes.

This hurdle can be overcome, but there are trade-offs and compromises involved. The DD Boost plugin can be run from T-SQL scripts using the xp_cmdshell extended stored procedure. This allows DBAs to keep the backup jobs and processes running in T-SQL and the agent. The trade-off is an increased attack surface for SQL Server. The following command examples show the use of xp_cmdshell with the DD Boost applications.
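Something like the following, using the same lab command lines shown later in this post.

```sql
-- Backup via xp_cmdshell (command line mirrors the ddbmsqlsv.exe example
-- later in this post; server and Data Domain values are from my lab)
EXEC master..xp_cmdshell
 'ddbmsqlsv.exe -c SQL2012-01.vlab.local -l full -a "NSR_DFA_SI=TRUE" -a "NSR_DFA_SI_USE_DD=TRUE" -a "NSR_DFA_SI_DD_HOST=dd-01.vlab.local" -a "NSR_DFA_SI_DD_USER=ddboost" -a "NSR_DFA_SI_DEVICE_PATH=/SQL" "MSSQL$INST1:DB1"';

-- Restore of the most recent backup via xp_cmdshell (ddbmsqlrc.exe)
EXEC master..xp_cmdshell
 'ddbmsqlrc.exe -c SQL2012-01.vlab.local -f -S normal -a "NSR_DFA_SI=TRUE" -a "NSR_DFA_SI_USE_DD=TRUE" -a "NSR_DFA_SI_DD_HOST=dd-01.vlab.local" -a "NSR_DFA_SI_DD_USER=ddboost" -a "NSR_DFA_SI_DEVICE_PATH=/SQL" "MSSQL$INST1:DB1"';
```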

This method should work for most organizations that are OK with the trade-offs. One of the trade-offs will likely be some logic changes in the SQL Agent jobs for backup. One question a DBA posed to me was how to run the backup from a SQL Agent job and be able to report on success while keeping a log. Here is the quick solution I came up with. It should be a good start for most DBAs, and I am sure many can improve on it greatly.

The first step is to enable xp_cmdshell on the server; there are many places on the internet that show how to do this, but the short version is below. Then we need to create a job for the backup.
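```sql
-- Enable xp_cmdshell (requires sysadmin; weigh the security trade-off first)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```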

The job has a single step to run the T-SQL.

The job step runs the following T-SQL code.
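Here is a minimal version of the script; the success-string pattern is an assumption, so adjust it to match the output of your ddbmsqlsv.exe version.

```sql
-- Capture the DD Boost output in a table variable, loop with a cursor, and
-- raise an error to the job engine if no success line is found
DECLARE @Output TABLE (LineID INT IDENTITY(1,1), LineText NVARCHAR(2000));
DECLARE @Line NVARCHAR(2000), @Success BIT = 0;

INSERT INTO @Output (LineText)
EXEC master..xp_cmdshell
 'ddbmsqlsv.exe -c SQL2012-01.vlab.local -l full -a "NSR_DFA_SI=TRUE" -a "NSR_DFA_SI_USE_DD=TRUE" -a "NSR_DFA_SI_DD_HOST=dd-01.vlab.local" -a "NSR_DFA_SI_DD_USER=ddboost" -a "NSR_DFA_SI_DEVICE_PATH=/SQL" "MSSQL$INST1:DB1"';

DECLARE OutputCursor CURSOR FAST_FORWARD FOR
    SELECT LineText FROM @Output WHERE LineText IS NOT NULL ORDER BY LineID;
OPEN OutputCursor;
FETCH NEXT FROM OutputCursor INTO @Line;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Line;                                    -- captured in job step history
    IF @Line LIKE '%Stop time:%' SET @Success = 1;  -- assumed success indicator
    FETCH NEXT FROM OutputCursor INTO @Line;
END
CLOSE OutputCursor;
DEALLOCATE OutputCursor;

IF @Success = 0
    RAISERROR('DD Boost backup did not complete successfully.', 16, 1);
```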

This code uses an in-memory table and a cursor to loop through the output of the DD Boost command. This allows the detection of success strings; if they are not found, an error is raised to the job engine. The script also prints the command output, which can be captured in the history logs. I am sure there are many SQL gurus out there who can improve this code. Please post your improvements; they are welcome.

The screen shot below shows the job step advanced properties page. The “include step output in history” option will allow the output to be seen in the job activity history.

The screenshot below shows the job activity monitor log viewer, which has both successful and failed jobs.

The output of the backup command is stored in the message field in the history. This provides a way to go back and find out what happened to the backup.

I believe this method should be sufficient for most organizations except those with strict security requirements or a high level of automation in database restores. The security concern is due to the use of xp_cmdshell, which some organizations may not want to enable. The level of automation in restores could pose an issue because of how specific backup sets are accessed via the DD Boost application.

There is at least one situation I know of that could present issues. If automated database restores to particular points in time are required, this may be very difficult to script reliably. With a normal SQL backup, the information about that backup is recorded in tables in the MSDB database. When DD Boost for SQL does a backup, it does invoke a SQL backup, which logs the job information to the MSDB database. However, the DD Boost plugin GUI appears to retrieve the backup image data from the Data Domain file system.

This causes an issue in scripting a DD Boost restore of a specific image. The command line tool allows two ways to identify the backup image used for restore: it restores the last backup by default, or if the -t switch is supplied with a timestamp of a backup image, it uses that particular image. The problem is the timestamp comes from the Data Domain file system timestamp for the backup image (I think; someone from EMC correct me if I am wrong). There may also be a way to query this, but it would be via SSH and I have not gone down that road.

In many environments DBAs will use backup set or file naming schemes in their processes. Since the DD Boost app only allows a timestamp to identify the backup, the lack of timestamp correlation between Data Domain and SQL causes an issue. The command and output below illustrate the issue.

ddbmsqlsv.exe -c SQL2012-01.vlab.local -l full -a “NSR_DFA_SI=TRUE” -a “NSR_DFA_SI_USE_DD=TRUE” -a “NSR_DFA_SI_DD_HOST=dd-01.vlab.local” -a “NSR_DFA_SI_DD_USER=ddboost” -a “NSR_DFA_SI_DEVICE_PATH=/SQL” “MSSQL$INST1:DB1”

Abbreviated Log Output

43708:(pid 3208):Start time: Wed Aug 06 10:55:44 2014
43621:(pid 3208):Computer Name: SQL2012-01 User Name: administrator
….
43709:(pid 3208):Stop time: Wed Aug 06 10:56:05 2014

This job does get logged in the MSDB database like other SQL jobs, but unfortunately the timestamps do not match up to the DD Boost data.

When looking at the DD Boost restore for the above backup, the time shown on this screen does not show the same backup start and end times as the msdb.backupset table. This is an issue because a backup could be found by description in a query, but DD Boost save sets cannot be referenced properly from DBAs' T-SQL maintenance scripts, as the timestamps do not correlate.



ddbmsqlrc.exe -c sql2012-01.vlab.local -f -t “08/06/2014 10:55:45” -S normal -a “NSR_DFA_SI=TRUE” -a “NSR_DFA_SI_USE_DD=TRUE” -a “NSR_DFA_SI_DD_HOST=dd-01.vlab.local” -a “NSR_DFA_SI_DD_USER=ddboost” -a “NSR_DFA_SI_DEVICE_PATH=/SQL” “MSSQL$INST1:DB1”

It is not likely that many organizations have a level of automation in their SQL database restores that would be caught by this, but it is something to be aware of when considering a DD Boost for SQL deployment. I am sure features will be added in future releases that correct or eliminate this issue. In the meantime, EMC, could you please provide a way for DBAs to reference the DD Boost jobs/image files from T-SQL?

Regards,

Dave