Using DQS: How to Find Blank/Empty Values in Your Data

Using DQS: How to Find Blank/Empty Values in Your Data

Data Quality Services (DQS) enables you to easily find blank or empty values in your data source. Further, you can also use DQS to find other values in your data source that are equivalent to blank/empty values (called null equivalent). For example, “NA”, “Not Available”, or “Missing” can be present in the fields where the actual value is not available.
By default, every domain in a DQS knowledge base has a DQS_NULL value, which is associated with null or empty values.

You cannot delete the DQS_NULL value. By default, the DQS_NULL value is set as Correct in the Domain Values tab, which implies that DQS will not flag the blank/empty values as invalid in your data source. However, the missing values are still displayed in the profiler when you run a knowledge discovery, matching policy, cleansing project, or matching project in DQS. The Completeness field in the Profiler tab shows the percentage value of missing data in your source data:

Finding Blank/Empty Values in your Data Source

For the sake of brevity, let’s use the following sample data of company names and their Websites for this article. In the sample data, 5 out of 25 (20%) company name values are missing, and 8 out of 25 (32%) company Website values are missing:

Let’s see how DQS can easily help us find the blank values.
  1. Create a DQS knowledge base with the following 2 domains: CompanyName and CompanyURL.
  2. In both the domains, set the DQS_NULL value in the Domain Values tab to Invalid:
  3. Publish the knowledge base.
  4. Create a cleansing project and select the knowledge base that you just created to run it against.
  5. Select the sample Excel sheet as the data source, and then map the columns with domains as follows:
  6. Click Next. On the Cleanse page, click Start to run the cleansing process.
  7. After the cleansing process completes, expand the Profiler tab to view the profiling information. You will notice that DQS accurately displays the missing values in the profiler for each domain.

    You will also notice that all these 13 blank values (5 from the CompanyName domain and 8 from CompanyURL domain) are also flagged as invalid records in the left pane of the profiler.

    NOTE: The Invalid Records field in the profiler can also display other values that are flagged as invalid by DQS as a result of other domain rules/conditions. But, in this article, we haven’t specified any other rule that renders a value as invalid, except for changing the default value of DQS_NULL.

  8. Click Next. On the Manage and View Results page, click the Invalid tab for a domain, and then click the DQS_NULL row to see the records containing the empty/blank values in the lower pane.

Go ahead and add the required values in the blank/empty fields!

Finding NULL Equivalents

Sometimes, your source data might contain dummy value to denote a null or empty value. For example, “NA”, “Not Available”, or “Missing” for string values, “000” for numerical values, or 1/1/2000 for date values.

To handle this scenario, you can add all these null-equivalent values in the Domain Values tab of a domain, and set them as Invalid.

See Also

Leave a Comment
  • Please add 4 and 3 and type the answer here:
  • Post
Wiki - Revision Comment List(Revision Comment)
Wikis - Comment List
Sort by: Published Date | Most Recent | Most Useful
Posting comments is temporarily disabled until 10:00am PST on Saturday, December 14th. Thank you for your patience.
  • Richard Mueller edited Revision 1. Comment: Modified title casing, added tags

  • Carsten Siemens edited Revision 5. Comment: Added tag: has comment

Page 1 of 1 (2 items)