Demo Transcript: A First Look at SQL Server 2012 (Part 2 - Data Quality Services)

SQL11UPD00-TSCRIPT-04

This wiki is a transcript of a previously recorded video.

Related content assets:


Improved Data Accuracy with SQL Server 2012 Data Quality Services

View Previous Demo Transcript: A First Look at SQL Server 2012 (Part 1 - AlwaysOn) (SQL11UPD00-TSCRIPT-03)

Hi, welcome back. This is Roger Doherty, and this is part two of a four-part demo of some of the great new capabilities in SQL Server 2012.  In this part of the demo we're going to focus on an interesting new capability we're introducing with SQL Server 2012 called Data Quality Services.  With Data Quality Services you can empower the business users in your organization, who are really the domain experts for your data, with a productive toolset that makes it possible for them to ensure that the data in your organization is accurate and compliant.  So let's jump right in!

So we left off here in our Reporting Services report.  We've just shown how easily we can route reporting requests to readable secondary replicas using AlwaysOn.  We have a second report here in Reporting Services, a nice little report that lets us take a look at the various resellers we have in our system.  It shows the reseller address, and then we use Bing Maps and SQL Server spatial types to get a nice visualization of where that reseller is.  But if we scroll down here you'll notice there's some weirdness going on.  We've got this reseller here in Texas whose map isn't coming up correctly, and what we notice is that their address is incorrect: there's a bad state code, and something wrong with the street address.  Now chances are that as a web developer I'm not the guru of scrubbing addresses.  I probably have somebody in the Finance department who's a lot more capable of doing that than I am.  So rather than chasing this down myself, I'm going to export this data into a spreadsheet so we can hand it off to the person in the Finance department who is the domain expert for our reseller data.  What we've done here is highlight some of the problems with this data: we've got some inconsistent state codes, some missing zip codes, and some improper street addresses.  Fixing all of this by hand is an extensive manual task, so it would be really nice if there were an automated way of doing it.
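As an aside on how a report like this plots each reseller: the usual pattern is for the report query to expose a geography point per address, which the map control then overlays on Bing Maps tiles.  Here is a minimal sketch of such a query; the table and column names are hypothetical, not taken from the demo:

    -- Expose one geography point per reseller for the report's map layer.
    -- geography::Point(latitude, longitude, SRID) builds the spatial value;
    -- 4326 is the standard WGS 84 coordinate system.
    SELECT  r.ResellerName,
            r.AddressLine1,
            r.City,
            r.StateProvinceCode,
            r.PostalCode,
            geography::Point(r.Latitude, r.Longitude, 4326) AS ResellerLocation
    FROM    dbo.ResellerAddress AS r;   -- hypothetical table

A bad state code or street address geocodes to the wrong coordinates (or none at all), which is why the map for that Texas reseller renders incorrectly.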

What we're going to show you is how we can use Data Quality Services and the Windows Azure Data Market to make this process really easy for a business user.  So I'm going to pop open the Data Quality Services client, and the first thing we'll take a look at is this address validation domain.  When we pop it open, we see it's essentially a definition of what we consider an address: an address line, city, state and zip.  We then bundle those together into something called a composite domain that includes all of those things, which allows us to work with several different attributes at the same time and scrub them together.

To do this scrubbing we'll leverage a service called Melissa Data.  Melissa Data is a company that has developed an address scrubbing capability that they make available as a cloud service, and we make that service discoverable and usable through the Windows Azure Data Market.  So all you really need is an Azure subscription and a Data Market subscription; you sign up to use Melissa Data, and you can use their service to scrub your data.  Let's see how easy it is to actually go and do this.

We're going to create a new Data Quality Project here, and we'll call it "scrub addresses".  We're going to choose that address validation domain we just looked at, this will be a Data Quality Project, and we'll proceed through this wizard.  The next thing we're going to choose is the data source that includes the addresses that need to be scrubbed.  I could get that out of SQL, but let's just use this Excel spreadsheet that we handed off to our finance user, the one with the address data that needs scrubbing.  To do that, all we need to do is map the columns in our spreadsheet to the composite domain we're using to scrub the data up in Melissa Data.  So I just do a quick column-to-column mapping here in the tool.  We've mapped all the columns now, and when I click Next you'll see we're ready to start scrubbing.

When I click Start here it's going to send all this data up to Melissa Data, scrub these addresses, and come back with some nice results for us to check out.  As it goes through the data scrubbing process we get some preliminary results on the cleanliness of our data, and it's already finished.  We sent up 42 addresses: 69% of our data was correct, and 24% needed corrections from Melissa Data; those were automatically corrected because it was pretty obvious what needed to be done.  The remaining 7% of our data has suggested changes that require human intervention to fix.  You don't want to apply a fix incorrectly, so sometimes you have to put human eyeballs on it to make sure.  So here are those three records that require some human intervention.  It looks like this one has a suggested zip code, this one has a new street address that needs to be fixed, and this one also has a new street address that needs to be fixed.  I'm just going to go ahead and approve all of these changes; we'll assume Melissa Data did a good job there.

Next, we've finished scrubbing our data and we get some output.  The output that comes back from the service includes the input data, the output data, and comments describing what was actually changed.  Now we can push this data back into SQL Server so that our DBA can pick it up and reintroduce it into the system.  So we'll go ahead and dump this into a table in SQL Server, we'll call it "scrubbed addresses", and we'll include both the data and the cleansing info, which is additional descriptive information about the kinds of data cleansing operations that occurred.  We'll click Finish and it pushes it back into the system.
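To make the shape of that output concrete, here is a minimal sketch of what the "scrubbed addresses" table might look like.  Cleansing output generally carries the source value, the corrected value, and descriptive cleansing information per domain; the exact columns vary, and every name below is hypothetical:

    -- Hypothetical landing table for the cleansing output.
    CREATE TABLE dbo.ScrubbedAddresses
    (
        AddressID           int,             -- hypothetical key for matching back
        AddressLine_Source  nvarchar(100),   -- value sent to Melissa Data
        AddressLine_Output  nvarchar(100),   -- corrected value that came back
        City_Source         nvarchar(50),
        City_Output         nvarchar(50),
        State_Source        nvarchar(50),
        State_Output        nvarchar(50),
        Zip_Source          nvarchar(20),
        Zip_Output          nvarchar(20),
        CleansingInfo       nvarchar(256)    -- what was changed, and why
    );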

Now all we have to do as a DBA is incorporate that data back into our system.  So if we go back into our database here, we've got a stored procedure that's going to make that easy for us; all I do is run this MergeCleanAddress stored procedure.  It takes that data from Melissa Data, pushes it back into our address table, and we've updated 11 addresses here.  Pretty cool.  So end to end, we've scrubbed the data.
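The demo doesn't show the body of MergeCleanAddress, but conceptually a procedure like it just matches the scrubbed rows back to the live address table and applies the corrected values.  A minimal sketch, reusing the hypothetical names from the table above:

    -- Hypothetical procedure: apply corrected addresses back to the live table.
    CREATE PROCEDURE dbo.MergeCleanAddress
    AS
    BEGIN
        UPDATE  a
        SET     a.AddressLine1      = s.AddressLine_Output,
                a.City              = s.City_Output,
                a.StateProvinceCode = s.State_Output,
                a.PostalCode        = s.Zip_Output
        FROM    dbo.Address AS a
        JOIN    dbo.ScrubbedAddresses AS s
                  ON a.AddressID = s.AddressID
        WHERE   s.AddressLine_Output <> s.AddressLine_Source
             OR s.City_Output        <> s.City_Source
             OR s.State_Output       <> s.State_Source
             OR s.Zip_Output         <> s.Zip_Source;  -- only rows the service changed
    END;

Updating only the rows the service changed is what yields a count like the 11 updated addresses seen in the demo.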

So let's go see how our report looks now that we've actually gone and scrubbed this data.  If I pull my report back up and refresh it, the hope is that we've got these corrected addresses pointing to real locations out there, so our maps will no longer be messed up.  And there it is.  Our addresses are all fixed, and we've got good visualizations for all of our reseller addresses coming from Bing Maps.

So we've shown you how Data Quality Services makes it possible to empower the business users in your organization who are the domain experts for your data, so they can be really productive in making sure that your data is accurate and compliant.  We've shown you how to use cloud services to assist in that data quality process through the Windows Azure Data Market and Melissa Data, and we've even shown how you can use web services like Bing Maps to do data visualization within your on-premise SQL Server Reporting Services reports, demonstrating how you can start leveraging cloud services now to improve your on-premise environment.  Thanks a lot.

View Next Demo Transcript: A First Look at SQL Server 2012 (Part 3 - SQL Server Data Tools) (SQL11UPD00-TSCRIPT-05)


Return to SQL Server 2012 Developer Training Kit BOM (en-US)
