dbt hub

dbt test: expect_column_proportion_of_unique_values_to_be_between

USE CASE

Values distribution
Uniqueness

APPLIES TO

Column

This page explains the expect_column_proportion_of_unique_values_to_be_between test from the dbt-expectations package. This test verifies that the proportion of unique values within a specified column falls between a given minimum and maximum value, ensuring the appropriate level of uniqueness within the data. This test is useful for assessing data consistency and uniqueness requirements.

How it Works

The expect_column_proportion_of_unique_values_to_be_between test calculates the proportion of unique values in the chosen column and checks whether this proportion lies between the specified minimum and maximum boundaries.

Steps and Conditions:

  1. Column Selection: Choose the column to evaluate for unique value proportion.
  2. Define Limits: Set the minimum (min_value) and maximum (max_value) values for the acceptable range of the proportion of unique values.
  3. Optional Configurations:
    • Group By: This parameter (group_by) allows the test to be applied to distinct groups within the data, aiding in granular analysis across different segments.
    • Row Condition: Use row_condition to filter which rows are considered in the calculation, based on specified conditions.
    • Strict Comparison: Specify if the bounds should be strictly enforced (strictly: true) or if they should include equality (strictly: false).
  4. Execution: Calculate the unique value proportion in the column, considering any groupings or conditions defined. Compare this proportion to the set range.
  5. Outcome:
    • Pass: If the unique value proportion is within the specified range, the test passes.
    • Fail: If the proportion is outside the specified range, the test fails.

Example Usage: E-commerce Company

In an E-commerce platform, managing the variety of products offered can be crucial for customer satisfaction. Consider a scenario where the product_listing table records details about products, including a category_id column that signifies different product categories.


models:
  - name: product_listing
    columns:
        - name: category_id
          tests:
            - dbt_expectations.expect_column_proportion_of_unique_values_to_be_between:
                min_value: 0.1
                max_value: 0.5
                group_by: ['store_id']
                row_condition: "status = 'active'"

In this example, the test checks that the proportion of unique category_id values per store_id for active listings falls between 10% and 50%. This range ensures that each store offers a healthy variety of product categories, not too limited (less than 10%) and not overly diversified to the point of dilution (>50%). This balance is critical for optimizing inventory and satisfying diverse customer preferences.

The only data observability platform built into your dbt code

  • Get monitors on your production tables out-of-the-box with zero configuration
  • Add tests to your code in bulk with a simple UI
  • Track test results over time
  • Set owners and create meaningful alerts
  • Triage incidents faster using our end-to-end column-level lineage graph