This page describes the expect_column_distinct_values_to_be_in_set
test from the dbt-expectations package. This test checks that all distinct values in a selected column are contained within a predefined set. It’s useful for ensuring that column data conforms to expected categorical outcomes without unexpected or erroneous values.
How it Works
The expect_column_distinct_values_to_be_in_set
test verifies that every distinct value in a specified column is part of a given set. If any distinct value is not in this set, the test will fail, signaling discrepancies that require attention.
Steps and Conditions:
- Column Selection: Choose the column to analyze for distinct values.
- Define Value Set: List all acceptable distinct values for that column, using the
value_set
argument. - Optional Configuration:
- Quote Values: Optionally, specify whether values should be quoted with
quote_values
. The default setting istrue
. - Row Condition: Use
row_condition
to include only certain rows based on a condition. This can focus the test on relevant data subsets.
- Quote Values: Optionally, specify whether values should be quoted with
- Evaluation: The test gathers all distinct values from the column, considering any row conditions, and checks if each is included in the specified set.
- Outcome:
- Pass: If all distinct column values are found within the specified set, the test passes.
- Fail: If any distinct value in the column is outside the set, the test fails.
Example Usage: B2B SaaS
In a B2B SaaS context, ensuring consistent data labeling in a product's feature usage can be pivotal. Consider a scenario with a feature_access_log
table that records each time a customer accesses a feature, noted in a column named feature_name
.
In this example, the test ensures that only the features 'dashboard', 'report', 'upload', and 'download' are logged in the feature_name
column for entries since January 1, 2023. This helps the B2B SaaS company maintain precise tracking of feature usage, which is crucial for product management and customer success tasks.