When Aviv Netel joined Elementor’s data team, he walked into a scene familiar to many modern data leaders: chaos.
Alerts were everywhere - dozens a day. Some warned of schema changes, others about failing dbt tests, and many just… existed. The noise was so constant that no one responded to it. Not because the team didn’t care. But because they simply couldn’t tell what mattered anymore.
“There were 30 alerts a day. Every day. No one knew what was important, so we stopped reacting altogether.”
This wasn’t just a productivity problem; it was a trust problem. Engineers were burning out. On-call rotations meant 2 AM false alarms. And worse, real issues were going unnoticed.
So Aviv and the team made a bold decision:
They shut the whole thing down and started from scratch.
Rebuilding, Thoughtfully
Rather than patch a broken system, they treated the migration to a new data platform as a chance to rethink everything, starting with a deceptively simple question:
"What’s worth alerting on?"
That one question became the cornerstone of a company-wide rethink. Instead of carrying over old tests and rules, they rebuilt with intention. They asked stakeholders what they actually cared about. They worked backwards from business KPIs. And they made sure every test and alert had a clear purpose and a clear audience.
Business-Aware, Engineer-Friendly
The team began separating alerts into two tracks:
- Technical alerts for engineering teams focused on upstream failures, data ingestion issues, or logic bugs.
- Business alerts for downstream impact tied directly to critical metrics like “active installations” or failed payments.
They also introduced structured tagging for each test, by type (technical vs. business), data layer (bronze, silver, gold), and severity. Only failures in gold-layer models tied to critical data assets could trigger high-priority alerts or page the on-call engineer. And gold tests had to meet a higher standard; no flaky warnings allowed.
“If something in gold triggered a warning all the time, it got removed. We only alert on what really matters.”
With that, alert fatigue didn’t just drop. It disappeared.
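The tagging rule described above can be sketched in a few lines of Python. This is an illustration only, not Elementor's actual configuration: the `TestTags` fields, the severity values ("warn" / "error"), and the split between "page" and "high" are assumptions made for the example. The one rule taken from the text is that only gold-layer tests tied to critical assets can escalate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestTags:
    """Hypothetical tag set attached to each test."""
    kind: str                     # "technical" or "business"
    layer: str                    # "bronze", "silver", or "gold"
    severity: str                 # "warn" or "error" (assumed values)
    critical_asset: bool = False  # is the model tied to a critical data asset?


def alert_priority(tags: TestTags) -> str:
    """Return "page", "high", or "normal" for a failed test.

    Only gold-layer tests on critical assets may escalate.
    Splitting "page" from "high" by severity is an assumption.
    """
    if tags.layer == "gold" and tags.critical_asset:
        return "page" if tags.severity == "error" else "high"
    return "normal"


if __name__ == "__main__":
    print(alert_priority(TestTags("business", "gold", "error", True)))
    print(alert_priority(TestTags("technical", "bronze", "error")))
```

Keeping the escalation decision in one small function makes it easy to review: anything that is not gold and critical can never wake someone up.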
Shift Left, Pass Context Right
One of the most transformative changes was how the team approached ownership.
They didn’t try to centralize everything. Instead, they built systems to route alerts directly to the people best equipped to handle them:
- Broken logic? The data engineering team got it.
- Malformed events from upstream systems? Routed directly to the data producers via Slack - with CSVs / dashboards attached, highlighting the problem.
Even analysts and BI users only received alerts when their data was genuinely at risk, not for every minor upstream hiccup.
“Now, when there’s an alert, people jump on it. Not because I told them to - because they know it matters.”
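As a rough sketch of that routing idea, assuming hypothetical failure types and channel names (none of these identifiers come from Elementor's setup), the owner of a failure is looked up by type, and data consumers are added only when their data is actually affected:

```python
from typing import Dict, List

# Hypothetical mapping from failure type to the team that owns the fix.
OWNER_BY_FAILURE: Dict[str, str] = {
    "logic": "data-engineering",
    "malformed_event": "data-producers",
}


def route_alert(
    failure_type: str,
    affects_consumer_data: bool,
    consumer_channel: str = "analytics",
) -> List[str]:
    """Return the channels that should receive this alert."""
    recipients: List[str] = []
    owner = OWNER_BY_FAILURE.get(failure_type)
    if owner is not None:
        recipients.append(owner)
    # Analysts and BI users hear about it only if their data is at risk.
    if affects_consumer_data:
        recipients.append(consumer_channel)
    return recipients


if __name__ == "__main__":
    print(route_alert("malformed_event", affects_consumer_data=False))
```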
Intentional Testing
Another cultural shift was how the team thought about tests.
With tools like dbt and Cursor, it’s easy to write tests for everything. But that was exactly the problem. Without guardrails, the team was drowning in copy-pasted tests that created noise instead of signal.
So they introduced a new philosophy:
- Every new domain or model starts with hypothesis-driven testing.
- Each test must answer a clear question or validate a business expectation.
- Tests that aren’t useful? Removed.
Guidelines were documented in the team’s README. Test types were mapped to layers. Tags were standardized. Reviewers started asking, “Why does this test exist?”
Each layer had its own testing rules:
- Bronze / Silver: Only technical issues and pipeline errors from the producer or external APIs.
- Gold: Only business validations and business implications. Within gold, tests were split by domain, e.g., anomalies in sales were routed to the checkout and BI teams, while anomalies in active installations went to the growth team.
And slowly, things started to click.
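The layer rules lend themselves to a simple review-time check. The sketch below is an assumption about how such a check might look, not the team's actual tooling: the function names and the string labels are hypothetical, while the layer-to-type mapping and the domain owners follow the rules listed above.

```python
from typing import Dict, List, Set

# Which test types are allowed in each layer, per the rules above.
ALLOWED_KINDS: Dict[str, Set[str]] = {
    "bronze": {"technical"},
    "silver": {"technical"},
    "gold": {"business"},
}

# Gold-layer domains and the teams that receive their anomalies.
DOMAIN_OWNERS: Dict[str, List[str]] = {
    "sales": ["checkout", "bi"],
    "active_installations": ["growth"],
}


def check_test(layer: str, kind: str) -> List[str]:
    """Return a list of rule violations (empty if the test is fine)."""
    allowed = ALLOWED_KINDS.get(layer)
    if allowed is None:
        return [f"unknown layer: {layer}"]
    if kind not in allowed:
        return [f"{kind} tests are not allowed in the {layer} layer"]
    return []


def owners_for(domain: str) -> List[str]:
    """Return the teams to notify for a gold-layer domain."""
    return DOMAIN_OWNERS.get(domain, [])


if __name__ == "__main__":
    print(check_test("gold", "technical"))
    print(owners_for("sales"))
```

A check like this could run in code review or CI, so the question "Why does this test exist?" gets a first automated pass before a human asks it.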
When to Retire a Test
Elementor didn’t just think carefully about which tests to write; they also developed clear thinking around when to remove them.
Every test started as a hypothesis. But once that hypothesis was validated or proven irrelevant, they weren’t afraid to let it go.
Here are two examples Aviv shared:
- From uncertainty to trust: Early on, the team wasn’t sure if a specific event stream included a unique key. So they added a test to enforce it. Over time, as collaboration with the data producer improved and trust grew, they removed the test and relied on the business key in the gold layer instead. The hypothesis had been confirmed; they no longer needed the safety net.
- The warning that never goes away: Some tests failed constantly with low-priority warnings. After repeated investigation, the team found that one of two things was true: either the issue was expected given the nature of the data, or it was out of their control. In both cases, the test as written was useless, so they downgraded it, redirected it to the right owner, or removed it entirely.
These weren’t exceptions; they became part of the team’s testing philosophy:
Every test has a lifecycle. If no one acts on the result, it’s time to reconsider it.
SLAs That Let You Sleep
With a clean, trusted set of alerts in place, the team could finally introduce real SLAs - and actually meet them.
They created tiered response times for different issues. Critical issues in gold? Resolved fast. Less important ones? Logged and handled during the day. And only truly urgent incidents triggered the on-call engineer.
“We used to get alerts at 2AM that didn’t matter. Now, when an alert fires, people jump in - not just the on-call, but the whole team. Because it’s rare. And it’s real.”
Weeks would go by without a single alert. Not because the system was silent - but because it was healthy.
And that meant something even more important: the team finally had time to work on what mattered.
Where Elementary Fit In
Elementary wasn’t just a tool; the team made it part of the process:
- They used custom alert routing to separate signals by layer and audience.
- Downstream impact awareness: Alerts included visibility into which critical assets and downstream models were affected - so the team always knew who and what was impacted.
- They leveraged volume and freshness anomaly detection to monitor business KPIs with high confidence.
- They integrated with Opsgenie so that only critical alerts triggered pages.
- In some cases, they even used the UI-based configuration to create fast, scoped tests without needing to dive into code.
- MCP Server as a next step: The team plans to use Elementary’s MCP Server to create and manage tests directly from the IDE, allowing engineers to understand coverage gaps and define expectations earlier, in context, as part of their normal development workflow.
“Elementary helped us route alerts to the right places, with the right context, and avoid polluting our main alert channels.”
Advice for Any Team Drowning in Alerts
Aviv’s advice is simple and hard-earned:
- Start by cleaning up. If no one acted on an alert for 3 days, downgrade or remove it.
- Think in layers. Gold is sacred. Bronze is noisy. Treat them differently.
- Avoid test inflation. Don’t test everything. Test what matters.
- Collaborate with your stakeholders. Don’t guess what the business wants - ask.
- Be intentional. Every alert should have a purpose, a recipient, and a reason to exist.
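The first rule in the list is concrete enough to automate. Here is a minimal sketch, assuming each alert record carries the time it fired and the time someone first acted on it (both field names are hypothetical); the three-day threshold is the one Aviv mentions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold from the advice above: three days without action.
STALE_AFTER = timedelta(days=3)


def is_stale(
    fired_at: datetime,
    acted_at: Optional[datetime],
    now: datetime,
) -> bool:
    """True if nobody has acted on the alert for three days or more."""
    if acted_at is not None:
        return False
    return now - fired_at >= STALE_AFTER


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 9, 0)
    print(is_stale(fired, None, datetime(2024, 1, 5, 9, 0)))
```

Alerts flagged this way are candidates for a downgrade or removal, not automatic deletions; a person still decides which of the two applies.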
“The biggest change wasn’t technical. It was cultural. We built a system we trust - so now we trust the alerts.”