This page describes the expect_column_values_to_not_match_regex_list
test from the dbt-expectations package. The test verifies that strings in a specified column do not match any patterns from a list of regular expressions. This is particularly useful when you need to enforce certain formatting rules or prevent specific patterns in string data.
How it Works
The test checks each entry in a selected column against a list of regular expressions, ensuring that none of the entries match any of the provided patterns. This validation is crucial to avoid unwanted or malformed data entering your system.
Steps and Conditions:
- Column Selection: Choose the column whose values you want to validate against the regex list.
- Regex List Configuration: Provide a list of regular expressions that should not match any part of the column's entries.
- Optional Configuration:
- is_raw: Set to
True
if the provided regex patterns are raw strings that need escaping. - flags: Specify any regex flags like
i
for case-insensitive matching. - match_on: Define whether to apply 'OR' (default) or 'AND' logic across the regex patterns.
- Row Condition: Apply conditions to select specific rows for testing.
- is_raw: Set to
- Execution: The test scans each value in the column and checks it against the regex conditions using the specified logic.
- Outcome:
- Pass: If no values in the column match any regex patterns from the list, the test passes.
- Fail: If any value matches a regex pattern, the test fails.
Example Usage: E-commerce
In an E-commerce platform, ensuring that product descriptions are free from unwanted characters or patterns is crucial for maintaining content quality.
Consider a scenario where the product_descriptions
table has a description_text
column that should not contain reference codes or any HTML-like tags.
In this example, the test ensures that no product description includes patterns like <tag>
or [[link]]
, which are typical in HTML or markup languages. By preventing these patterns, the E-commerce site maintains a clear and professionally formatted product catalogue.