This page describes the expect_column_distinct_count_to_equal_other_table
test from the dbt-expectations package. This test verifies that the number of distinct values in a column of one model matches the number of distinct values in a column of another model. It is particularly useful for ensuring consistency across related datasets.
How it Works
The test compares the count of distinct values in one specified column against the count of distinct values in another column from a different model. It is designed to assure that the relationships between datasets are maintained and consistent.
Steps and Conditions:
- Column and Model Selection: Choose the column and the model you want to test. Also, determine the target column and model against which the distinct count will be compared.
- Configure Comparison: Define the comparison model and the column within that model. You can also specify conditions to filter which rows should be considered for both the source and comparison columns.
- Row and Compare Row Conditions _(Optional)_: Apply any specified row conditions to filter the datasets before performing the count.
- Execution: The test counts the distinct values in both the source and comparison columns, taking into account any row conditions.
- Outcome:
- Pass: The test passes if the number of distinct values in both columns is equal.
- Fail: The test fails if the number of distinct values differs between the two columns.
Example Usage: Fintech Company
In a Fintech company, ensuring that transaction types recorded in separate systems or tables are consistent can be critical for accurate reporting and compliance.
Consider a scenario where a Fintech company maintains two models: transactions
and audited_transactions
. Both models have a column named transaction_type
. The transactions
table contains raw transaction data, and audited_transactions
contains verified transaction records for compliance purposes.
This setup ensures that the number of distinct transaction types in the transactions
table matches the number of distinct transaction types in the audited_transactions
table. This alignment is crucial for validating the integrity and consistency of transaction data across different data recordings in the Fintech operational workflows.