This page outlines the not_accepted_values test from the dbt-utils package. This test is designed to ensure that specific, defined values do not appear in a column within your dataset. It's particularly useful for maintaining data integrity by preventing the inclusion of known incorrect, irrelevant, or otherwise unacceptable values.
How it Works
The not_accepted_values test checks a specified column for the presence of any values from a defined list of unacceptable values. If any of these values are found, the test fails, indicating that the dataset contains data that violates the predefined constraints.
Steps and Conditions:
- Column Selection: Identify the column you wish to test for unacceptable values.
- Define Unacceptable Values: Specify a list of values that should not appear in the selected column.
- Execution: The test scans the column for any occurrences of the defined unacceptable values.
- Outcome:
- Pass: If none of the unacceptable values are found in the column, the test passes, indicating that the column complies with the defined data quality standards.
- Fail: If any of the unacceptable values are present in the column, the test fails. This failure signals a need to investigate and rectify the source of these values.
Example Usage: E-commerce Company
For an E-commerce company, ensuring accurate and consistent geographical data is crucial for logistics, shipping calculations, and regional compliance. The not_accepted_values test can be applied to the addresses table to validate the city column, ensuring it does not contain known incorrect city names that could disrupt shipping processes.
Consider a scenario where the addresses table stores user shipping information, including a city column that should not include fictional or deprecated city names.
In this example, the not_accepted_values test ensures that the city column in the addresses table does not contain the values 'Gotham' or 'Metropolis', which are known to be fictional and therefore unacceptable for real shipping addresses.