NETCONOMY
Visualizing Quality
Description
In his devjobs.at TechTalk, Armin Hutzler of NETCONOMY walks through best practices in end-to-end testing and explains the reasoning behind how the test results are visualized.
Video Summary
In Visualizing Quality, Armin Hutzler shares practical best practices for stabilizing end-to-end testing: CI integration, consistent test data, resilient data-attribute selectors, auto-waiting and network synchronization, avoiding external UIs, using APIs for login, rerunning failures, and recognizing when the application—not the test—is at fault. He then tackles visibility and comparison pain points by building a results pipeline from CI to Google Cloud Storage, a Go Cloud Function that parses XML into BigQuery, and Grafana dashboards plus Slack notifications to spot flaky tests, compare failures over time, and speed debugging. Viewers can apply these patterns to raise release confidence, detect bugs early, and make e2e quality a team habit.
Visualizing Quality: Making End-to-End Testing Visible — Lessons from “Visualizing Quality” by Armin Hutzler (NETCONOMY)
Why visibility is the real force multiplier for quality
Stable end-to-end (E2E) tests are powerful. Visible E2E tests are transformative. In “Visualizing Quality,” Armin Hutzler (NETCONOMY) walks through both dimensions. After months of hands-on work with E2E testing, he distilled best practices and built a streamlined, effective reporting pipeline with dashboards that make results actionable. From our DevJobs.at editorial vantage point, the core message is clear: if E2E tests simulate real users, your team needs real, immediate answers about their status.
“End-to-end tests simulate real user behavior.”
That idea anchors the talk. When tests mimic a user’s journey, teams need to see at a glance whether those journeys still work: Are today’s runs green? What failed, and since when? Which tests are flaky? Hiding answers inside CI logs wastes a vital opportunity to align engineering and QA around facts, not guesswork.
E2E in context: from the test pyramid to a checkout flow
Armin starts with fundamentals and revisits the test pyramid:
- Unit tests verify components in isolation.
- Integration tests verify interactions between components.
- End-to-end tests cover the entire application with external services in play, clicking and navigating the way a user would.
In an e-commerce setting, the showcase scenario is checkout: from adding a product to the cart to completing the order. E2E asserts the whole chain still works as intended.
As for tooling, Armin names Selenium, Cypress, and Playwright. The specific choice matters less than the orientation: E2E models real workflows and should act as an early-warning system.
Why stable E2E matters
Armin enumerates the payoffs of a reliable E2E suite:
- Reliable codebase: consistency of runs translates into confidence.
- Early bug detection: a daily run reveals what yesterday’s merge changed.
- Courage to refactor: developers can make bigger moves and rely on the pipeline to confirm safety.
- Release confidence: QA leans on E2E for regression coverage.
- Cost savings: finding issues early reduces QA time and prevents late, expensive fixes.
These are practical, not theoretical. Teams that aren’t derailed by red pipelines at 9 a.m. build features faster and sleep better.
The hard parts: why E2E stays demanding
Armin doesn’t sugarcoat the trade-offs:
- Authoring effort: E2E tests often take longer to write than unit tests.
- Flakiness: a test fails today and passes tomorrow — noise that erodes trust.
- Maintenance: UI and workflows evolve; tests must keep pace.
- Runtime: while the unit suite takes around five minutes, E2E runs take more than 20 minutes.
- Debugging complexity: many moving pieces make root-cause analysis tricky.
The implication is straightforward: E2E is valuable and non-trivial. It requires discipline in both design and operations.
Best practices: pragmatic and precise
Armin shares a compact set of tactics that, in combination, dramatically improve outcomes.
1) Integrate with CI/CD
Run tests automatically in a controlled environment. This ensures consistency, discoverability, and timely signals. “It works locally” isn’t a quality metric.
2) Keep frameworks and browsers current
Upgrades bring fixes and features and keep you compatible with the browsers your users actually run. Staying current is risk management.
3) Maintain consistent test data
Reliable conclusions require stable inputs. Armin points to product data in tests: if it changes between runs, so does the meaning of the results. Consistency enables meaningful trend analysis.
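As a minimal sketch, a pinned fixture keeps the product data identical across runs; the module name and product values here are illustrative assumptions, not taken from the talk:

```typescript
// fixtures/products.ts
// Pinned test data: the suite always works against the same product,
// so a red run points at the application, not at shifting inputs.
export const CHECKOUT_PRODUCT = {
  sku: 'TEST-SKU-0001', // hypothetical SKU reserved for E2E runs
  name: 'E2E Test Sneaker',
  price: 99.9,
  currency: 'EUR',
} as const;

// Usage in a test file:
// import { CHECKOUT_PRODUCT } from './fixtures/products';
// await page.goto(`/product/${CHECKOUT_PRODUCT.sku}`);
```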
4) Use resilient selectors: data attributes over CSS classes
The UI changes and class or ID names drift. Armin recommends dedicated test attributes such as data-testid. They make selectors robust and give developers a visible cue: a change here means "check the E2E suite."
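A short Playwright/TypeScript sketch of what such selectors can look like; Playwright is one of the tools Armin names, and the routes and test IDs here are assumptions:

```typescript
import { test, expect } from '@playwright/test';

test('add product to cart via stable data-testid selectors', async ({ page }) => {
  await page.goto('/product/TEST-SKU-0001');

  // getByTestId targets data-testid by default, so CSS class or ID
  // refactorings in the markup do not break the selector.
  await page.getByTestId('add-to-cart').click();

  await expect(page.getByTestId('cart-count')).toHaveText('1');
});
```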
5) Prefer auto-waiting to explicit waits
Explicit delays (e.g., “wait two seconds”) are imprecise and waste time when elements are ready earlier. Auto-waiting based on state or interactivity — with a maximum timeout — accelerates runs without compromising stability.
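Sketched again with Playwright, assuming a hypothetical checkout page: web-first assertions and actionability checks replace fixed sleeps.

```typescript
import { test, expect } from '@playwright/test';

test('rely on auto-waiting instead of fixed sleeps', async ({ page }) => {
  await page.goto('/checkout');

  // Anti-pattern: a fixed delay wastes time when the page is ready sooner
  // and still fails when it is slower than expected.
  // await page.waitForTimeout(2000);

  // Better: the assertion retries until the element is visible (or the
  // timeout is reached), and the click itself auto-waits for the button
  // to be actionable.
  await expect(page.getByTestId('place-order')).toBeVisible({ timeout: 10_000 });
  await page.getByTestId('place-order').click();
});
```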
6) Synchronize with network requests
Even better, tie progress to actual data flow. Example: a product detail view depends on a backend request. The test waits for that request to resolve before proceeding. This reduces timing issues and improves reliability.
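One possible way to express this in Playwright, assuming a hypothetical /api/products/ endpoint behind the detail view:

```typescript
import { test, expect } from '@playwright/test';

test('synchronize on the product detail request', async ({ page }) => {
  await page.goto('/');

  // Register the wait before triggering the navigation that fires the request.
  const productResponse = page.waitForResponse(
    (response) => response.url().includes('/api/products/') && response.ok(),
  );

  await page.getByTestId('product-card-TEST-SKU-0001').click();
  await productResponse; // proceed only once the backend has answered

  await expect(page.getByTestId('product-title')).toBeVisible();
});
```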
7) Only test UIs you control
External UIs change without notice, may throttle automation, or break selectors. If you must rely on an external service (e.g., login), use its API. This is faster and less brittle.
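One way this can look in Playwright, with a placeholder auth endpoint and credentials; the actual login API will differ per project:

```typescript
import { test, expect, request } from '@playwright/test';

test('log in through the API instead of an external login UI', async ({ page }) => {
  // Authenticate against the backend directly; endpoint and payload are
  // placeholders for whatever your auth service exposes.
  const api = await request.newContext({ baseURL: 'https://shop.example.com' });
  const login = await api.post('/api/auth/login', {
    data: { username: 'e2e-user', password: process.env.E2E_PASSWORD },
  });
  expect(login.ok()).toBeTruthy();

  // Reuse the authenticated cookies in the browser context,
  // then continue with the actual user journey.
  const state = await api.storageState();
  await page.context().addCookies(state.cookies);
  await page.goto('/account');
  await expect(page.getByTestId('account-greeting')).toBeVisible();
});
```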
8) Re-run failing tests once
A single retry filters out transient hiccups. Still, don’t mask real flakiness. Use the retry to stabilize the signal, then fix the underlying cause.
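In Playwright, for example, a single retry is a one-line configuration; the JUnit reporter shown here is an assumption that also produces the XML artifacts a reporting pipeline can ingest later:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Re-run a failing test exactly once to filter transient hiccups.
  // Tests that only pass on retry are reported as flaky, so the
  // underlying cause stays visible instead of being silently masked.
  retries: 1,

  // JUnit XML output is what a results pipeline can pick up afterwards.
  reporter: [['list'], ['junit', { outputFile: 'results/e2e-results.xml' }]],
});
```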
9) Don’t always blame the test
Armin highlights UI layout shifts as a culprit: a pop-in obscures a button, and both users and tests mis-click. That’s not a test defect — it’s a product defect.
Sometimes the right fix is in the application itself, not the suite.
E2E failures often surface UX and stability issues — a feature, not a flaw.
After the setup: the pain points of visibility and comparison
Even with good practice, structural challenges remain — and that’s where Armin’s approach stands out.
- Limited visibility: runs happen “somewhere in the pipeline.” Even QA might not know where or how to retrieve results.
- Comparison is hard: to see flakiness, you’d inspect multiple runs one by one, download archives, parse XMLs, and try to spot patterns.
- “Since when has it been failing?”: clicking back through pipelines to find the last green run is tedious.
The remedy: send test outcomes where the team already is — and visualize trends so outliers are obvious.
Step 1: Slack notifications as a daily cadence
After each test run, the CI/CD pipeline posts to the team’s Slack channel. Benefits include:
- Quick status: “All green or do we need to jump in?”
- One-click access: a link takes you straight to the pipeline.
- A daily reminder: the ritual builds discipline; if the message shows up every day, checking becomes a habit.
This small piece creates awareness without manual overhead.
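A notification step like this can be a small script at the end of the pipeline. The sketch below posts to a Slack incoming webhook; the environment variable names and message format are assumptions, not taken from the talk:

```typescript
// notify-slack.ts: run as the last CI step, e.g. `npx tsx notify-slack.ts`.
// SLACK_WEBHOOK_URL, PIPELINE_URL, PASSED and FAILED are assumed to be
// provided by the CI job; the names are placeholders.
const webhook = process.env.SLACK_WEBHOOK_URL!;
const passed = Number(process.env.PASSED ?? 0);
const failed = Number(process.env.FAILED ?? 0);
const status = failed === 0 ? ':white_check_mark: all green' : `:x: ${failed} failing`;

const res = await fetch(webhook, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: `E2E run finished: ${status} (${passed} passed, ${failed} failed)\n<${process.env.PIPELINE_URL}|Open pipeline>`,
  }),
});
if (!res.ok) throw new Error(`Slack notification failed: ${res.status}`);
```

Because the message carries the pipeline link, the one-click access mentioned above comes for free.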
Step 2: A dashboard that makes quality visible over time
The centerpiece is a dashboard that lets you compare runs over time and drill into specifics. The architecture is intentionally lean and uses widely available components:
- The CI pipeline writes test results (XMLs) to a Google Cloud Storage bucket.
- A Google Cloud Function written in Go is triggered by the bucket event, parses the XMLs, and uploads data to a BigQuery data warehouse.
- Grafana queries BigQuery and renders the visuals.
Why a data warehouse over a database? Armin emphasizes read performance. Warehouses are built for analytics and aggregate queries across large data sets — exactly what a test history needs.
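The talk implements the Cloud Function in Go; purely for consistency with the other examples here, the sketch below shows the same flow as a Node-based Cloud Function in TypeScript. The dataset, table, and JUnit XML layout are assumptions:

```typescript
// Cloud Function sketch (2nd gen, Node runtime), triggered by
// google.cloud.storage.object.v1.finalized on the results bucket.
import * as functions from '@google-cloud/functions-framework';
import { Storage } from '@google-cloud/storage';
import { BigQuery } from '@google-cloud/bigquery';
import { XMLParser } from 'fast-xml-parser';

const storage = new Storage();
const bigquery = new BigQuery();
const asArray = <T>(value: T | T[] | undefined): T[] =>
  value === undefined ? [] : Array.isArray(value) ? value : [value];

functions.cloudEvent<{ bucket: string; name: string }>('ingestResults', async (event) => {
  const { bucket, name } = event.data!;
  const [xml] = await storage.bucket(bucket).file(name).download();

  const parsed = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: '' }).parse(
    xml.toString(),
  );

  // Flatten JUnit <testsuite>/<testcase> elements into one row per test.
  const suites = asArray(parsed.testsuites?.testsuite ?? parsed.testsuite);
  const rows = suites.flatMap((suite: any) =>
    asArray(suite.testcase).map((testcase: any) => ({
      run_id: name, // the uploaded file identifies the run
      suite: suite.name,
      test: testcase.name,
      duration_seconds: Number(testcase.time ?? 0),
      failed: testcase.failure !== undefined,
      failure_message: asArray(testcase.failure)[0]?.message ?? null,
      executed_at: new Date().toISOString(),
    })),
  );

  // Stream the rows into the warehouse for Grafana to query.
  await bigquery.dataset('e2e_quality').table('test_results').insert(rows);
});
```

The important design choice is the trigger: the pipeline only has to upload its XML artifacts, and ingestion happens automatically for every finalized object.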
Three perspectives: from run summary to per-test detail
The dashboard offers three levels of granularity, each focused on quick insight and targeted drilldown.
1) Run overview:
- Top: a visualization of the last seven runs with counts of passed and failed tests.
- A table version of the same, linked to the pipeline, with the option to drill into individual runs.
- Bottom: run duration trends — a basis for runtime optimization.
2) Individual run details:
- Totals: how many tests executed, how many succeeded, how many failed.
- A list of failing tests — a launchpad for root-cause analysis.
3) Per-test result details:
- A time series: how often this test failed in the last week.
- Flakiness in focus: “usually green with occasional red” stands out.
- A table of failure reasons: direct input for debugging.
Taken together, these views move you from “what happened” to “why” — fast.
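To make the flakiness view concrete, here is a sketch of the kind of aggregation such a dashboard panel can run against the warehouse; the schema follows the hypothetical ingest example above, and Grafana would issue an equivalent SQL query directly:

```typescript
import { BigQuery } from '@google-cloud/bigquery';

// Flag tests that failed at least once, but not every time, in the last
// 7 days: the "usually green with occasional red" pattern the dashboard
// is meant to surface.
const bigquery = new BigQuery();

const query = `
  SELECT
    test,
    COUNTIF(failed) AS failures,
    COUNT(*) AS runs,
    ROUND(COUNTIF(failed) / COUNT(*), 2) AS failure_rate
  FROM \`e2e_quality.test_results\`
  WHERE executed_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY test
  HAVING failures > 0 AND failures < runs
  ORDER BY failure_rate DESC
`;

const [rows] = await bigquery.query({ query });
for (const row of rows) {
  console.log(`${row.test}: failed ${row.failures}/${row.runs} runs (${row.failure_rate})`);
}
```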
What this improves in practice
Armin sums up the gains, and we saw them play out clearly in his walkthrough:
- Daily reminders via Slack.
- One-click access to the dashboard directly from Slack.
- Visual comparison across runs: “Since when has this been failing?”, “How are runtimes trending?”, “Which tests are flaky?”
- Flaky tests become obvious — and thus fixable.
In short, the dashboard channels your suite’s signals into a forum where they drive action.
What remains hard — and how teams should respond
The dashboard is not a magic bullet. It’s a visibility tool that still needs human attention.
- Assign a driver: ideally someone reviews the dashboard daily and alerts the team to issues.
- Make maintenance a team effort: everyone should know which E2E tests exist and update them when shipping new features.
- Bake E2E checks into your process: if peer-testing is part of your workflow, include a look at the E2E pipeline.
These practices are cultural as much as technical — and they determine whether E2E coverage stays meaningful.
Actionable takeaways
Based on “Visualizing Quality,” here’s a practical blueprint for teams aiming to stabilize and surface E2E outcomes:
1) Integrate tests tightly with CI/CD and run them daily.
2) Keep frameworks and browsers up to date.
3) Stabilize and version test data.
4) Use resilient selectors via dedicated data attributes.
5) Prefer auto-waiting and synchronize on network requests when possible.
6) Avoid testing external UIs; use APIs for necessary external steps (e.g., login).
7) Allow a single retry for failures; still investigate flakiness.
8) Treat some failures as product issues (e.g., layout shifts), not test defects.
9) Post run results to Slack after every pipeline execution.
10) Build a dashboard: export CI XMLs, ingest into a data warehouse (e.g., BigQuery), visualize with Grafana.
11) Establish ownership (a driver), involve the team, and make E2E checks part of your rituals.
Each step is simple; together they compound into a robust, transparent testing practice.
Why this architecture works
The strength of Armin’s approach is its simplicity:
- CI events already exist — the pipeline just forwards results instead of letting them vanish into logs.
- XML outputs become fuel for analysis.
- Google Cloud Storage and Cloud Functions provide clean triggers and serverless execution; BigQuery delivers fast reads for analytical queries.
- Grafana makes it straightforward to craft visuals and drilldowns without a heavyweight BI project.
The result is a lightweight data pipeline that does exactly what matters in day-to-day engineering: speed up answers, reveal patterns, and support decisions.
On runtime and debugging focus
Armin calls out concrete timing: roughly five minutes for unit tests and more than 20 minutes for E2E. That’s inherent — E2E verifies journeys, not just functions. Auto-waiting, synchronization with requests, and a targeted single retry reduce effective waiting time while maintaining stability.
Three elements of the dashboard especially help with debugging:
- Clear drilldown from “red” to “root cause” — from run overview to single test.
- Time-based views — turning “it feels flaky” into “it is flaky.”
- Failure reasons in a table — fast hypothesis building.
That combination eases the hardest E2E challenge: finding the signal in a system with many moving parts.
Quotable reminders from the talk
A few of Armin’s points stick — we paraphrase them here:
- E2E tests are the proxy for real user flows.
- Flaky means: failing one day, passing the next.
- Explicit waits waste time; auto-waiting saves time without risk.
- “Only test what you control” — external UIs are brittle; APIs are the pragmatic path when you must integrate.
- Sometimes the application is at fault — layout shifts can break both users and tests.
- Visibility is a process: Slack alerts, the dashboard explains, the team acts.
Conclusion: Make quality visible to make it manageable
“Visualizing Quality” by Armin Hutzler (NETCONOMY) lays out a practical way to not only build E2E tests but also integrate them into daily engineering life. The combination of best practices, Slack notifications, and a lightweight dashboard via BigQuery and Grafana delivers exactly what teams need: early warnings, context, and the ability to act.
Ultimately, it’s about confidence. Refactors proceed, the pipeline stays green, there’s no morning panic — and you know you “didn’t break anything important.” Visibility is not a nice-to-have; it’s the difference between perceived and proven quality.
Session context
- Title: Visualizing Quality
- Speaker: Armin Hutzler
- Company: NETCONOMY
Our takeaway from the session: stability is one side of quality; visibility is the other. When you combine both, you don’t just test — you ship with confidence.