RUBICON IT GmbH

Capture Screenshots with Playwright & SpecFlow

Description

In his devjobs.at TechTalk, Hannes Etl of RUBICON IT talks about the task of automatically creating and managing a large number of screenshots with Playwright and SpecFlow.

Video Summary

In Capture Screenshots with Playwright & SpecFlow, Hannes Etl (RUBICON IT GmbH) shows how his team automates hundreds of documentation screenshots for the Document Partner product by combining Playwright with SpecFlow. He outlines a three-layer architecture (Gherkin feature files, C# step definitions, and page objects that locate elements via CSS or XPath) and a lightweight setup: a C# project, a few dependencies, and a PowerShell script that installs the browsers. He also covers Playwright’s auto-waiting, SpecFlow hooks, and comparison against reference images with Pixelmatch. The example scenario grants a user admin rights in the Repository Manager before capturing the screenshot. The result is readable, maintainable scenarios and faster, more stable runs (522 screenshots per language, 109 feature files, nearly 400 scenarios; total runtime cut from about 23 to 12 minutes) that teams can readily adapt for documentation workflows.

Scaling Documentation with Automated Screenshots: Lessons from “Capture Screenshots with Playwright & SpecFlow” by Hannes Etl (RUBICON IT GmbH)

When screenshots become an engineering problem

Screenshots are often treated as an afterthought—until your documentation grows to hundreds of pages in multiple languages. In “Capture Screenshots with Playwright & SpecFlow,” Hannes Etl from RUBICON IT GmbH showed why screenshots can quickly turn into a scaling challenge and how a modern automation stack can bring order to that chaos.

RUBICON develops several software products. One of them is “Document Partner,” focused on documents—creation, distribution, and electronic signatures. It supports a range of formats, from Word, Excel, and PowerPoint to PDF and PDF/A. The product ships with a comprehensive user manual that has grown to roughly 500 pages, with about as many screenshots. And that’s just for one language. Two languages are supported—German and English—so we are talking about approximately 1,000 pages and 1,000 screenshots in total.

Anyone who has managed documentation at that scale knows manual screenshot maintenance does not work. Automation is a must. RUBICON had such automation in place for a long time, but the existing technology was no longer maintained. That triggered the decision to switch to a new approach: combining Playwright—a web testing framework by Microsoft—and SpecFlow, a BDD (Behavior-Driven Development) framework.

The two were, as Hannes put it, “married.” The result: screenshot scenarios can now be written in a non-technical language. That matters because it enables contributions from colleagues who aren’t deep into programming—think support staff or technical writers.

The product context: “Document Partner” and the Repository Manager

“Document Partner” consists of multiple components. One of them is a web application called the “Repository Manager,” used to manage documents or, more precisely, document templates. The example showcased in the session was deliberately concrete: grant a user administrator rights by selecting the user and checking a checkbox, then take a screenshot. This is the everyday pattern the new implementation targets—describing the state and capturing a reliable image that reflects it.

The power of the setup lies in describing such flows in a clean, readable syntax so that domain-oriented scenarios can be authored and then executed by the automation.

The architecture: Three layers with crisp boundaries

RUBICON’s solution is built as a three-layer architecture:

  • Feature layer: Scenarios are described in a non-technical language using Gherkin with the keywords Given, When, Then. The focus here is on readability and conciseness so that people beyond engineering can write and review scenarios.
  • Step-definition layer: This is the mapping from the feature sentences to code. The team uses C#. Sentences are bound to method headers via attributes, which makes the natural-language steps executable.
  • Page-object layer: This is where access to concrete web elements lives—buttons, labels, drop-down lists, and more. Element selection is typically done via CSS selectors (classes, IDs), with XPath as an alternative. This follows the Page Object pattern and isolates UI details.

This separation is intentional: human-readable scenarios on top, a precise translation to code in the middle, and encapsulated UI locators at the bottom. The division of responsibilities makes maintenance easier and amplifies reuse.

BDD in action: Expressive scenarios that also execute

On the feature layer, the team uses Gherkin. The well-known structure applies—Given (precondition), When (action), Then (expectation). Hannes outlined how a feature file might look in practice. The emphasis was less on individual lines and more on the effect: the scripts are “very readable” and “very compact.” That’s the point—BDD brings specification, testing, and documentation closer together. In this case, the BDD description drives repeatable screenshot creation for the manual.

Why is this valuable? Because colleagues in support and technical writing can express use cases in words that automation then carries out in the browser. The step definitions bridge the gap between a domain description and a running script, reducing friction and duplication.
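
To make the Given/When/Then structure concrete, here is a minimal TypeScript sketch that splits a scenario into keyword and sentence pairs. The team in the talk works in C# with SpecFlow, which handles this parsing itself; TypeScript is used here only for illustration. The scenario wording, `parseScenario`, and `Step` are hypothetical and not taken from the talk.

```typescript
// Hypothetical scenario text; the wording is an assumption, not the talk's feature file.
const scenarioText = `
Scenario: Grant administrator rights
  Given the Repository Manager is open
  When I select the user "Jane Doe"
  And I check the administrator checkbox
  Then a screenshot named "grant-admin-rights" is captured
`;

type Keyword = "Given" | "When" | "Then";

interface Step {
  keyword: Keyword; // "And"/"But" lines inherit the keyword of the previous step
  text: string;
}

function parseScenario(source: string): Step[] {
  const parsed: Step[] = [];
  let current: Keyword | null = null;
  for (const rawLine of source.split("\n")) {
    const line = rawLine.trim();
    const match = /^(Given|When|Then|And|But)\s+(.*)$/.exec(line);
    if (!match) continue; // skip blank lines and the "Scenario:" header
    const word = match[1];
    if (word === "And" || word === "But") {
      if (current === null) throw new Error(`"${word}" cannot start a scenario`);
    } else {
      current = word as Keyword;
    }
    parsed.push({ keyword: current as Keyword, text: match[2] });
  }
  return parsed;
}

const steps = parseScenario(scenarioText);
console.log(steps.map((s) => `${s.keyword}: ${s.text}`).join("\n"));
```

The "And" line is reported as a "When" step because it continues the preceding action, which is how Gherkin readers usually interpret it.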

Step definitions: The bridge from Gherkin to C#

The step-definition layer turns sentences into C# methods. The mapping uses attributes placed right above method headers. When a feature sentence matches an attribute, the corresponding method executes at runtime.

The result is that plain sentences cease to be just prose; they become executable descriptions. This gives the feature layer real leverage—it’s the primary entry point for the automated flow.
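
The attribute-based binding can be pictured as a lookup from sentence patterns to methods. The sketch below models that lookup in TypeScript purely as an illustration; in SpecFlow the same role is played by a `[Binding]` class whose methods carry `[Given]`, `[When]`, or `[Then]` attributes. The `StepRegistry` class and the step wording are hypothetical.

```typescript
type StepHandler = (...args: string[]) => void;

// Hypothetical registry: each entry pairs a sentence pattern with the code to run.
class StepRegistry {
  private bindings: { pattern: RegExp; handler: StepHandler }[] = [];

  // Plays the role of an attribute placed above a method header.
  bind(pattern: RegExp, handler: StepHandler): void {
    this.bindings.push({ pattern, handler });
  }

  // Finds the first pattern that matches the sentence and calls its handler
  // with the captured groups as arguments.
  run(sentence: string): void {
    for (const { pattern, handler } of this.bindings) {
      const match = pattern.exec(sentence);
      if (match) {
        handler(...match.slice(1));
        return;
      }
    }
    throw new Error(`No step definition matches: ${sentence}`);
  }
}

const actions: string[] = []; // records what the steps did, for demonstration
const registry = new StepRegistry();

registry.bind(/^I select the user "(.+)"$/, (userName) => {
  actions.push(`select:${userName}`);
});
registry.bind(/^I check the administrator checkbox$/, () => {
  actions.push("check:admin");
});

registry.run('I select the user "Jane Doe"');
registry.run("I check the administrator checkbox");
console.log(actions);
```

A sentence without a matching pattern raises an error in this sketch, which mirrors the general idea that every step in a feature file needs a binding before the scenario can run.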

Page objects: Where UI details are contained

The page-object layer encapsulates UI specifics. Elements are located via CSS selectors (classes, IDs), or alternatively via XPath. This keeps the feature scripts free of selectors and allows step definitions to focus on domain activities. When the UI changes, page objects can be updated without rewriting scenarios.

This is the Page Object pattern in action—it stabilizes automation by reducing the coupling between domain intent and UI mechanics.
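
A page object for the Repository Manager example might look like the sketch below. It is written in TypeScript against a tiny hypothetical `UiDriver` interface so that it runs without a browser; a real implementation would call Playwright instead. All names and selectors (`UserRightsPage`, `#user-list`, `#is-admin`) are invented for illustration.

```typescript
// Hypothetical minimal driver; a real one would delegate to a browser automation API.
interface UiDriver {
  click(selector: string): void;
  check(selector: string): void;
}

// Fake driver that only records calls, so the sketch runs without a browser.
class RecordingDriver implements UiDriver {
  readonly calls: string[] = [];
  click(selector: string): void {
    this.calls.push(`click ${selector}`);
  }
  check(selector: string): void {
    this.calls.push(`check ${selector}`);
  }
}

// Page object: the only place that knows the selectors for this screen.
class UserRightsPage {
  // Invented CSS selector; an XPath expression would be the alternative.
  private readonly adminCheckbox = "#is-admin";
  private readonly driver: UiDriver;

  constructor(driver: UiDriver) {
    this.driver = driver;
  }

  private userRow(userName: string): string {
    return `#user-list [data-user="${userName}"]`;
  }

  selectUser(userName: string): void {
    this.driver.click(this.userRow(userName));
  }

  grantAdminRights(): void {
    this.driver.check(this.adminCheckbox);
  }
}

const driver = new RecordingDriver();
const rightsPage = new UserRightsPage(driver);
rightsPage.selectUser("jane.doe");
rightsPage.grantAdminRights();
console.log(driver.calls);
```

If the checkbox were renamed in the UI, only the `adminCheckbox` field would change; callers of `grantAdminRights` would stay untouched.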

Setup and installation: From zero to first run, quickly

One advantage Hannes highlighted is the straightforward setup. The path to the first running scenario looks like this:

  1. Create a C# test project, based on either SpecFlow or NUnit.
  2. Add a handful of dependencies—“three or four.”
  3. Run a PowerShell script to install the browsers needed for screenshotting web applications.
  4. Create the three layers (feature, step definition, page object).
  5. Execute scenarios via the Visual Studio Test Explorer—just like unit tests.

The key message: setup is “child’s play.” Once browsers are installed, you can write feature files, add step definitions, and populate page objects. From there, the pipeline can be executed from the Test Explorer.

Playwright’s auto-waiting: Taming timing issues

Web tests are notoriously brittle due to timing issues: you try to click a button that isn’t visible yet and the test fails. Playwright addresses this with built-in auto-waiting.

As Hannes emphasized, this works “out of the box,” and “in 95% of cases.” While there may be edge cases that need adjustment, the default behavior drastically cuts down timing-related instability. For reliable screenshotting—capturing a UI state consistently—this is a major win.
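
The idea behind auto-waiting can be sketched as “retry the check until the element is ready or a deadline expires.” The TypeScript below models that loop with a simulated clock so that it runs synchronously. It is not Playwright’s implementation, and `FakeButton`, `clickWhenVisible`, and the timings are hypothetical.

```typescript
// Simulated element that becomes visible only after some time has passed.
class FakeButton {
  clicked = false;
  private readonly visibleAtMs: number;

  constructor(visibleAtMs: number) {
    this.visibleAtMs = visibleAtMs;
  }

  isVisible(nowMs: number): boolean {
    return nowMs >= this.visibleAtMs;
  }
}

// Polls the visibility check on a simulated clock instead of failing on the
// first attempt. Returns the simulated time at which the click happened.
function clickWhenVisible(
  button: FakeButton,
  timeoutMs: number,
  pollIntervalMs: number = 100
): number {
  for (let nowMs = 0; nowMs <= timeoutMs; nowMs += pollIntervalMs) {
    if (button.isVisible(nowMs)) {
      button.clicked = true;
      return nowMs;
    }
  }
  throw new Error(`Button not visible within ${timeoutMs} ms`);
}

// A naive click at time 0 would fail here; the waiting click succeeds at 300 ms.
const saveButton = new FakeButton(250);
const clickedAt = clickWhenVisible(saveButton, 5000);
console.log(`clicked after ${clickedAt} ms`);
```

The timeout still matters: an element that never appears produces a clear error rather than an endless wait, which is the kind of edge case where manual adjustment may be needed.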

SpecFlow hooks: Executing code at the right time

Hooks are the second pillar Hannes called out. They allow you to run code fragments at specific lifecycle moments. A practical use case is loading test or demo data. SpecFlow enables this with attributes as well, for example to run code “before a feature”, meaning once per feature and ahead of its scenarios.

For screenshot automation, this is critical because it lets you establish preconditions: data, user permissions, and initial configurations can be set up predictably. The result is a stable execution environment that improves the reliability of every captured image.
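
A “before feature” hook amounts to running setup code once, ahead of all scenarios in a feature. In SpecFlow this is expressed with the `[BeforeFeature]` attribute on a static method; the TypeScript sketch below only models the ordering. `runFeature`, `loadDemoData`, and the scenario names are hypothetical.

```typescript
interface Feature {
  title: string;
  beforeFeature: () => void; // runs once, before the first scenario
  scenarios: { title: string; run: () => void }[];
}

const eventLog: string[] = [];

// Hypothetical setup step, e.g. loading demo users into a test system.
function loadDemoData(): void {
  eventLog.push("load demo data");
}

// Runs the hook exactly once, then every scenario in order.
function runFeature(feature: Feature): void {
  feature.beforeFeature();
  for (const scenario of feature.scenarios) {
    scenario.run();
  }
}

runFeature({
  title: "User rights",
  beforeFeature: loadDemoData,
  scenarios: [
    { title: "Grant admin rights", run: () => { eventLog.push("screenshot: grant-admin"); } },
    { title: "Revoke admin rights", run: () => { eventLog.push("screenshot: revoke-admin"); } },
  ],
});
console.log(eventLog);
```

Both scenarios see the same prepared data, and the setup cost is paid once per feature rather than once per screenshot.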

Visual validation: Comparing images with Pixelmatch

A screenshot by itself doesn’t tell you whether it’s correct. The team addresses that by comparing generated images with reference screenshots, using the Pixelmatch library. The key advantage here is the configurable threshold parameter. It defines—via a percentage—when two images should be considered equal.

That threshold is essential to tolerate minor differences (e.g., subtle rendering variations) without missing meaningful deviations. For multi-language documentation that has to be consistent over time, this pixel-level verification provides a robust quality signal.
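
How a threshold turns pixel differences into a yes/no decision can be sketched as follows. This is a deliberately simplified stand-in written in TypeScript, not Pixelmatch’s algorithm, which uses a perceptual colour metric. In the JavaScript pixelmatch library, `threshold` is a per-pixel sensitivity between 0 and 1 and the call returns the number of mismatched pixels; how the team maps this to a percentage was not detailed, so `maxMismatchPercent` and `channelTolerance` below are assumptions.

```typescript
// Images are flat RGBA byte arrays (4 bytes per pixel).
function mismatchPercent(
  actual: Uint8Array,
  reference: Uint8Array,
  channelTolerance: number // assumed: largest per-channel difference still treated as equal
): number {
  if (actual.length === 0 || actual.length !== reference.length || actual.length % 4 !== 0) {
    throw new Error("Images must be non-empty RGBA buffers of identical size");
  }
  const pixelCount = actual.length / 4;
  let mismatched = 0;
  for (let p = 0; p < pixelCount; p++) {
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c;
      if (Math.abs(actual[i] - reference[i]) > channelTolerance) {
        mismatched++;
        break; // count each pixel at most once
      }
    }
  }
  return (mismatched / pixelCount) * 100;
}

// Two images count as equal when the share of mismatched pixels stays at or below the cutoff.
function imagesMatch(
  actual: Uint8Array,
  reference: Uint8Array,
  maxMismatchPercent: number,
  channelTolerance: number = 16
): boolean {
  return mismatchPercent(actual, reference, channelTolerance) <= maxMismatchPercent;
}

// 2x2 white reference; the new screenshot has one slightly off and one clearly different pixel.
const reference = new Uint8Array(16).fill(255);
const actual = Uint8Array.from(reference);
actual[0] = 250; // within tolerance, not counted
actual[4] = 0;   // clearly different, counted

console.log(mismatchPercent(actual, reference, 16)); // 25
console.log(imagesMatch(actual, reference, 30));     // true
console.log(imagesMatch(actual, reference, 10));     // false
```

The two knobs serve different purposes: the per-channel tolerance absorbs subtle rendering noise, while the percentage cutoff decides how much of the image may differ before the screenshot is flagged.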

A simple but telling UI pattern: Grant rights, take a screenshot

The session’s example was purposefully accessible and practical. In the “Repository Manager” web app, a user is granted administrator rights. The UI interaction is:

  • Select a user.
  • Check the checkbox for admin rights.
  • Capture a screenshot.

On the feature layer, this becomes a readable scenario. The step definitions bind the sentences to the C# methods that perform the UI interactions. The page objects identify the UI elements. The outcome is a reproducible screenshot that reflects the state after the permissions change—across the needed languages.

Scale and performance: The numbers

Hannes closed with concrete metrics of their production setup:

  • Screenshots per language: 522 (roughly 500).
  • Two languages (German and English).
  • 109 feature files.
  • Almost 400 scenarios.
  • Runtime nearly halved: from just under 23 minutes down to about 12 minutes.
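
The figures above can be put into proportion with a little arithmetic. The TypeScript below uses only the numbers quoted in the talk; the runtimes of 23 and 12 minutes are rounded, so the derived values are approximate as well.

```typescript
const screenshotsPerLanguage = 522;
const languages = 2;
const runtimeBeforeMin = 23; // "just under 23 minutes"
const runtimeAfterMin = 12;  // "about 12 minutes"

const totalScreenshots = screenshotsPerLanguage * languages;
const reductionPercent = ((runtimeBeforeMin - runtimeAfterMin) / runtimeBeforeMin) * 100;
const speedup = runtimeBeforeMin / runtimeAfterMin;

console.log(`total screenshots: ${totalScreenshots}`);             // 1044
console.log(`runtime reduction: ${reductionPercent.toFixed(1)}%`); // 47.8%
console.log(`speedup factor: ${speedup.toFixed(2)}x`);             // 1.92x
```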

For documentation workflows, the runtime reduction is a powerful lever. It speeds up iteration, makes updates more frequent, and reduces friction between development, QA, and documentation.

What engineering practices we took away

All takeaways below are directly grounded in the session’s content:

  • Write scenarios in everyday language: Gherkin on the feature layer delivers readability and accessibility. It’s the key to enabling non-programmers to contribute.
  • Keep a clean architecture with page objects: It eases maintenance and protects scenarios from UI changes. CSS selectors (classes, IDs) as the default, XPath as an alternative, keeps element access consistent.
  • Prioritize stability: Playwright’s auto-waiting reduces timing issues “out of the box.” In the small set of exceptions, fine-tuning is possible, but the default stability is a strong foundation.
  • Establish consistent preconditions: Use SpecFlow hooks—such as running code “before a feature”—to load test or demo data so scenarios start from a controlled state.
  • Validate visually: Pixelmatch with a threshold translates “visual equality” into a measurable condition, tolerating small differences while flagging real mismatches.
  • Operate like tests: Run scenarios via the Visual Studio Test Explorer, similar to unit tests. This lowers the operational barrier for teams.

From concept to a working pipeline: The cohesive picture

What stands out is not a single trick, but the cohesion of the approach:

  • A readable domain layer for scenarios.
  • A precise mapping layer to C# methods.
  • A UI encapsulation layer via page objects.
  • A testing framework that mitigates timing issues by default.
  • Lifecycle hooks to make preconditions reliable.
  • A pixel diff to turn image correctness into a thresholded, binary decision.

Together, these elements produce a pipeline that keeps documentation high-quality at scale—across languages, in hundreds of scenarios—and does so with measurable runtime benefits.

Conclusion: Treat screenshots as first-class engineering

Screenshots may sound simple. At the scale of 1,000 pages and 1,000 images, they become an engineering challenge. In “Capture Screenshots with Playwright & SpecFlow,” Hannes Etl (RUBICON IT GmbH) shared how Playwright and SpecFlow can structure this challenge. A three-layer architecture, Gherkin for readability, step definitions for the executable bridge, page objects for encapsulation, auto-waiting to tame timing issues, hooks for data states, and Pixelmatch for visual verification—these elements form a coherent system.

The numbers—522 screenshots per language, 109 feature files, nearly 400 scenarios, and runtime cut from around 23 minutes to roughly 12—underscore that this is not theory but practice. For teams responsible for product-grade manuals, this approach is a solid blueprint. Perhaps the most important point is that scenarios in non-technical language open the door to contributors beyond engineering. That’s where the biggest gains surface—shared understanding, clear roles, and automation that treats documentation with the same rigor as code.

The session’s takeaway is clear: with Playwright and SpecFlow, screenshot generation becomes faster, more robust, and accessible to a broader team—turning a once fragile task into a repeatable, controlled part of the documentation pipeline.
