One Report, No Excuses: A Single-Pane Regression Pipeline

Most codebases have the same five checks scattered across five places. Typecheck runs in your editor. Lint runs on save. Unit tests live in Vitest. End-to-end runs in Playwright and dumps an HTML report into one folder. Visual screenshots get a different folder. Coverage gets a third. When one fails, you check that one. When you forget to check the other four, regressions ship.

The fix isn't more tools. It's putting all of them into a single artifact that says either "everything's green" or "here's exactly what broke, with screenshots."

The shape of the pipeline

The runner is a small Node script. It executes each step in order, captures stdout and stderr, parses any structured output, then writes one HTML file at pipeline-report/index.html.

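// Where Playwright's JSON reporter writes its results.
const PLAYWRIGHT_JSON = "pipeline-report/playwright-results.json";

// run(name, command, extraEnv) executes one step and captures its output.
const typecheck  = run("Typecheck",             "npm run typecheck");
const format     = run("Format check",          "npm run format:check");
const lint       = run("Lint",                  "npm run lint");
const unit       = run("Unit tests + coverage", "npm run test");
const playwright = run("Playwright e2e",        "npx playwright test --reporter=json", {
  PLAYWRIGHT_JSON_OUTPUT_NAME: PLAYWRIGHT_JSON,
});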

That's the whole control flow. Each run() returns { name, passed, durationMs, stdout, stderr }. After everything is done, the script reads three side outputs:

coverage/coverage-summary.json — produced by Vitest's json-summary reporter
pipeline-report/playwright-results.json — produced by Playwright's --reporter=json
visual-report/shots/*.png — produced by a Playwright spec that takes full-page screenshots of every route on every device profile

Those three feed three of the report's four sections: a coverage table, a per-test failure list with error messages, and a screenshot grid (routes × devices). The whole report is one self-contained file the user can scroll top to bottom.
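Here is a minimal sketch of how the first two side outputs could be reduced to report data. It is not the actual runner code: the function names summarizeCoverage and collectFailures are made up for illustration, and the Playwright field names (suites, specs, ok, tests, results, error.message) reflect the JSON reporter's layout as I understand it, so check them against your Playwright version.

```javascript
// Reduce the parsed coverage/coverage-summary.json to the four headline rows.
function summarizeCoverage(summary) {
  const metrics = ["lines", "statements", "functions", "branches"];
  return metrics.map((metric) => {
    const { covered, total, pct } = summary.total[metric];
    return { metric, covered, total, pct };
  });
}

// Walk Playwright's nested suites and collect every failing spec
// together with the first error message found in its results.
// Assumption: a spec carries `ok: false` when any of its tests failed.
function collectFailures(report) {
  const failures = [];

  function visit(suite, trail) {
    const path = suite.title ? [...trail, suite.title] : trail;
    for (const spec of suite.specs ?? []) {
      if (spec.ok) continue;
      const messages = (spec.tests ?? [])
        .flatMap((test) => test.results ?? [])
        .map((result) => result.error?.message)
        .filter(Boolean);
      failures.push({
        title: [...path, spec.title].join(" > "),
        message: messages[0] ?? "(no error message)",
      });
    }
    for (const child of suite.suites ?? []) visit(child, path);
  }

  for (const suite of report.suites ?? []) visit(suite, []);
  return failures;
}
```

Both functions take already-parsed objects, so the runner would call them with JSON.parse(fs.readFileSync(path, "utf8")) for each file. Keeping them free of file access also makes them easy to unit test.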

Why one report changes behavior

When checks are scattered, the cost of "running them all" gets paid by whoever notices. If the agent or the developer skips one, nothing forces a reckoning until production. With a single command and a single artifact, "all of them" is the only mode. There's no incentive to skip npm run test because you already ran npm run typecheck — the pipeline runs both and writes one row each into the same table.

The bigger shift is when an agent is doing the work. An agent will gladly tell you "checks pass" if you let it interpret "checks" loosely. A unified pipeline removes that ambiguity: the artifact either has an ALL PASS badge or it doesn't. The report path is fixed. The failure messages are quoted. There's no place for vague optimism to hide.

What I learned wiring it up

Three things were not obvious until they bit me.

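1. Playwright tests collide with unit test runners. Vitest's default include pattern is **/*.{test,spec}.?(c|m)[jt]s?(x), which happily picks up Playwright spec files and tries to execute them as unit tests. They then explode because Playwright's test.beforeAll() refuses to run outside Playwright's own runner. The fix is one line in vitest.config.ts: exclude: [...configDefaults.exclude, "tests/e2e/**"], with configDefaults imported from vitest/config. Spreading the defaults matters because setting exclude replaces Vitest's built-in exclusions (node_modules and friends) instead of adding to them. Worth knowing before you spend an hour reading misleading stack traces.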

2. Spy installation order matters when a third-party script declares a global. @next/third-parties injects an inline script that includes function gtag(){dataLayer.push(arguments);}. That's a function declaration at the top level of a script tag, and in ECMAScript, function declarations at script top level create properties on the global object. Which means: even if you carefully install window.gtag via Object.defineProperty in a Playwright addInitScript, the moment GA's inline script evaluates, your descriptor gets replaced by the function declaration.

The symptom looks like a flaky test: page_view is captured (it fires during initial render, before GA loads), but every click event after that is silently dropped. The fix is to also intercept window.dataLayer.push, by defining a setter on window.dataLayer that wraps push on whatever array gets assigned, because every gtag call ultimately routes through there. Capture at both ends and you can't miss.

3. Real regressions hide in tests you never wrote. The first run of my strengthened SEO regression spec failed three assertions on routes I'd shipped to staging weeks earlier: the home page had no og:image, the notes index had no twitter:card, and the sitemap's canonical entries had inconsistent trailing slashes. None of those were caught by typecheck, lint, unit tests, or a manual smoke test. They were caught by expect(ogImage).toBeTruthy() running against every public route. Tests pay you back twice: once when they catch a bug that already shipped the first time you run them, and again every time they catch a regression afterwards.
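For the second point, the dataLayer side of the spy could look roughly like the sketch below. It is an illustration under assumptions, not the exact script from my spec: the function name installDataLayerSpy and the __capturedGtagCalls property are hypothetical, and the win parameter exists only so the function can be exercised outside a browser.

```javascript
// Record everything pushed to window.dataLayer, no matter who creates the array.
// `win` defaults to the global object, which is `window` inside a page.
function installDataLayerSpy(win = globalThis) {
  const captured = [];
  win.__capturedGtagCalls = captured; // hypothetical name the test reads later
  const wrapped = new WeakSet();

  function wrap(value) {
    if (!Array.isArray(value) || wrapped.has(value)) return value;
    const originalPush = value.push;
    // Shadow Array.prototype.push on this one array only.
    Object.defineProperty(value, "push", {
      configurable: true,
      writable: true,
      value: function (...items) {
        for (const item of items) {
          // gtag pushes its `arguments` object; copy it into a plain array.
          const isArrayLike =
            typeof item === "object" && item !== null && typeof item.length === "number";
          captured.push(isArrayLike ? Array.from(item) : item);
        }
        return originalPush.apply(this, items);
      },
    });
    wrapped.add(value);
    return value;
  }

  // Wrap an array that already exists, then intercept every later assignment.
  let current = wrap(win.dataLayer);
  Object.defineProperty(win, "dataLayer", {
    configurable: true,
    get() {
      return current;
    },
    set(value) {
      current = wrap(value);
    },
  });
}
```

In a Playwright spec this would be registered before navigation, for example with page.addInitScript(installDataLayerSpy), and read back afterwards with page.evaluate(() => window.__capturedGtagCalls). Because the interception sits on dataLayer instead of gtag, it keeps working after the inline function declaration replaces whatever was installed on window.gtag.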

What the report actually shows

The HTML has four sections, in this order:

Quality checks — one row per step, status badge, duration, collapsible output on failure.
Unit test coverage — lines / statements / functions / branches, with covered/total fractions.
Playwright — pass/fail/skip counts, then a list of failed tests with their error messages.
Visual screenshots — every route × every device, linked to the full-resolution PNG.

The whole file is generated by the runner from JSON inputs. No HTML test framework, no static site generator — just fs.writeFileSync and a template literal. About 150 lines of Node, no dependencies beyond the standard library.
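To give a feel for the template-literal approach, here is a rough sketch of the "Quality checks" section only. The names escapeHtml and renderReport, the markup, and the FAILED label are assumptions made for illustration; only the ALL PASS badge and the shape of the run() results come from the pipeline described above.

```javascript
// Escape text before embedding it in HTML so captured output cannot break the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a complete HTML document from an array of run() results:
// one table row per step, with collapsible output only for failed steps.
function renderReport(steps) {
  const allPass = steps.every((step) => step.passed);
  const rows = steps
    .map((step) => {
      const output = step.passed
        ? ""
        : `<details><summary>Output</summary><pre>${escapeHtml(step.stdout + step.stderr)}</pre></details>`;
      return `<tr>
  <td>${escapeHtml(step.name)}</td>
  <td>${step.passed ? "PASS" : "FAIL"}</td>
  <td>${(step.durationMs / 1000).toFixed(1)}s</td>
  <td>${output}</td>
</tr>`;
    })
    .join("\n");

  return `<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>Pipeline report</title></head>
<body>
<h1>${allPass ? "ALL PASS" : "FAILED"}</h1>
<h2>Quality checks</h2>
<table>
<tr><th>Step</th><th>Status</th><th>Duration</th><th>Output</th></tr>
${rows}
</table>
</body>
</html>`;
}
```

The runner would then finish with something like fs.writeFileSync("pipeline-report/index.html", renderReport(steps)). Escaping matters more than it looks: compiler and test output is full of angle brackets, and unescaped output can silently swallow half the report.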

How this slots into a dependency upgrade

I wired the same runner into my Claude Code skill for upgrading Node dependencies. The skill walks through npm-check-updates in waves — patches and minors first, then majors one ecosystem at a time. After each wave, the agent runs the unified pipeline. If anything is red, the agent reads the failure block and fixes forward — updating the calling code when a library's behavior actually changed, fixing the test when the assertion was wrong, never just pinning the package back.

The pipeline is the contract. The agent can't claim a wave is done unless the report says ALL PASS. That removes the most expensive failure mode of AI-driven dependency upgrades, which is "everything compiled, looks fine, broke in prod three days later because the e2e tests were never run."
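One way to make that contract checkable by a machine as well as a human is to derive the badge and the process exit code from the same data. The sketch below assumes the runner should exit non-zero on any failure; that exit-code behavior, the FAILED label, and the name pipelineVerdict are my assumptions here, not something the ALL PASS badge alone implies.

```javascript
// Derive a single verdict from the run() results.
// Any failed step makes the whole pipeline fail; there is no partial credit.
function pipelineVerdict(steps) {
  const failedSteps = steps.filter((step) => !step.passed).map((step) => step.name);
  const allPass = steps.length > 0 && failedSteps.length === 0;
  return {
    badge: allPass ? "ALL PASS" : "FAILED",
    exitCode: allPass ? 0 : 1,
    failedSteps,
  };
}
```

A runner using this would set process.exitCode = pipelineVerdict(steps).exitCode after writing the report, so an agent or CI job that ignores the HTML still cannot mistake a red run for a green one. Treating an empty step list as a failure is a deliberate choice in this sketch: a pipeline that ran nothing has not proven anything.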

The minimum viable version

If you want to try this on your own project, the smallest functional version is:

A scripts/run-pipeline.mjs (or .js) that calls each check via spawnSync, captures the result, and writes one HTML file.
Vitest's json-summary reporter enabled so you can read coverage/coverage-summary.json.
Playwright's --reporter=json, with PLAYWRIGHT_JSON_OUTPUT_NAME pointing at pipeline-report/playwright-results.json so the results land in a file instead of stdout.
One Playwright spec — visual-report.spec.ts — that screenshots every route on every device profile and saves to visual-report/shots/. The runner copies these into the report directory.
An npm run pipeline script that invokes the runner, plus pipeline:open to launch the HTML.
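For the first item in that list, a run() helper built on spawnSync could be as small as the sketch below. It matches the { name, passed, durationMs, stdout, stderr } shape described earlier, but the details (shell: true, the maxBuffer value, CommonJS require so it runs as a plain .js file) are my assumptions, not a copy of the real runner.

```javascript
const { spawnSync } = require("node:child_process");

// Run one pipeline step synchronously and capture its result.
// `extraEnv` is merged over the current environment, which is how a step
// can receive settings such as PLAYWRIGHT_JSON_OUTPUT_NAME.
function run(name, command, extraEnv = {}) {
  const startedAt = Date.now();
  const result = spawnSync(command, {
    shell: true, // lets a whole command line be passed as one string
    encoding: "utf8", // stdout and stderr come back as strings, not Buffers
    env: { ...process.env, ...extraEnv },
    maxBuffer: 64 * 1024 * 1024, // test output can be large
  });
  return {
    name,
    // status is null when the process was killed or never started,
    // so only an explicit exit code of 0 counts as a pass.
    passed: result.status === 0,
    durationMs: Date.now() - startedAt,
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
  };
}
```

Running the steps synchronously keeps the control flow trivial: each call blocks until the step finishes, and a failing step does not stop the later ones from running, so the report always covers every check. In an .mjs file the require line becomes import { spawnSync } from "node:child_process".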

That's it. The hard part isn't the runner — it's deciding that "the report" is the only thing that matters, and then never letting anything else become the source of truth.