Skill-Driven Web Apps, Part 1: Scaffold, Upgrade, Patch — One Reference Implementation, Three Skills | JJ Dharmaraj

A personal site is not a one-shot deliverable. It is a long-lived thing. You scaffold it once, then it lives for years, slowly accumulating SEO improvements as the engines shift, security patches as CVEs land, dependency churn as the ecosystem moves, and the occasional design or feature change. Most of that work is procedural. Most of it does not need a senior engineer thinking from scratch every time.

I am running an experiment: instead of treating each of those lifecycle events as a one-off coding session, I am encoding them as Claude Code skills, sharing a single source of truth across them, and letting the procedures evolve in lockstep with the reference site. This post is the architecture and the maintenance contract. Part 2 will be about adding Lighthouse to the same architecture without breaking it.

The lifecycle, in three skills

A site I care about goes through exactly three kinds of work after the initial design is settled:

Scaffold. "Build me a new site with the same shape as the one I already have." This happens rarely (once per new site), but I want it to be repeatable and fast, and to ship a green test pipeline on day one.
Upgrade for a business goal. "Take my existing site and bring its SEO posture up to the current state of the art." This happens whenever search and AI engines change their rules. The procedure is incremental; the diff is mechanical once you know what to apply.
Patch. "There is a security advisory in transitive deps. Update everything, fix the regressions, do not deploy until the pipeline is green." This happens whenever the ecosystem moves. Same procedure, different filter.

Each of these is a different skill. They live in ~/.claude/skills/ and have explicit, non-overlapping scopes:

scaffold-seo-site — bootstraps a brand new site from a recipe.
upgrade-seo-site — (coming soon) applies new SEO learnings to an existing site.
upgrade-seo-site-deps — bumps every dependency and verifies the unified pipeline still passes.

The non-overlap rule is load-bearing. upgrade-seo-site-deps does not change SEO posture. scaffold-seo-site does not bump dependencies on an existing project. upgrade-seo-site does not initialize a project from scratch. The moment a skill starts doing two things, the boundary blurs and the procedures drift.
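One way to picture the non-overlap rule is as a table of responsibilities with exactly one owner each. The sketch below is illustrative only: the responsibility labels and the findOverlaps helper are hypothetical names, not something taken from the actual SKILL.md files.

```typescript
// Hypothetical labels for what each skill is allowed to do.
type Responsibility = "init-project" | "change-seo-posture" | "bump-dependencies";

// Scopes as described in the post: one responsibility per skill.
const skillScopes: Record<string, Responsibility[]> = {
  "scaffold-seo-site": ["init-project"],
  "upgrade-seo-site": ["change-seo-posture"],
  "upgrade-seo-site-deps": ["bump-dependencies"],
};

// Maps each responsibility claimed by more than one skill to the skills that claim it.
function findOverlaps(scopes: Record<string, Responsibility[]>): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const skill of Object.keys(scopes)) {
    for (const responsibility of scopes[skill]) {
      owners[responsibility] = (owners[responsibility] || []).concat(skill);
    }
  }
  const overlaps: Record<string, string[]> = {};
  for (const responsibility of Object.keys(owners)) {
    if (owners[responsibility].length > 1) {
      overlaps[responsibility] = owners[responsibility];
    }
  }
  return overlaps;
}

// An empty result means every responsibility has a single owner.
console.log(Object.keys(findOverlaps(skillScopes)).length === 0);
```

If a skill ever gained a second responsibility that another skill already owns, the result would name both skills, which is the blurred boundary the rule is meant to prevent.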

The drift problem, and the fix

Three skills covering the same kind of project means three places that "know" what the stack is, what the SEO posture is, and what counts as a passing test. The naive version of this drifts immediately. Each SKILL.md restates the rules slightly differently, an update lands in one but not the others, and within a few months you have three contradictory specs.

The fix is a shared layer. Underneath the three skills sits a small set of documents:

_shared/tech-stack.md — what stack we use, why, with versions pinned to whatever the reference site is currently running.
_shared/seo-techniques.md — every SEO control plane we touch: canonical, robots, sitemap, JSON-LD, OG/Twitter, per-bot directives, content flags, IndexNow.
_shared/testing-methodology.md — the unified regression pipeline, the spec list, the gtag spy quirk, the fix-forward rules.

Each SKILL.md is short. It describes the procedure — the wave order, the stop-and-ask criteria, the handoff. It does not restate the facts. When it needs to refer to a fact, it links to the shared doc.
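The "link, don't restate" rule lends itself to a mechanical check. The following is a minimal sketch, assuming shared docs are cited by their literal path inside a SKILL.md; the function names and the example excerpt are hypothetical, since the real skill files are not shown here.

```typescript
// The three shared docs named above.
const sharedDocs = [
  "_shared/tech-stack.md",
  "_shared/seo-techniques.md",
  "_shared/testing-methodology.md",
];

// Collects every distinct `_shared/<name>.md` path mentioned in a SKILL.md body.
function citedSharedDocs(skillMarkdown: string): string[] {
  const matches: string[] = skillMarkdown.match(/_shared\/[a-z0-9-]+\.md/g) || [];
  return matches.filter((doc, index) => matches.indexOf(doc) === index);
}

// Citations that point at a shared doc that does not exist (a typo or a stale link).
function danglingCitations(skillMarkdown: string): string[] {
  return citedSharedDocs(skillMarkdown).filter((doc) => sharedDocs.indexOf(doc) === -1);
}

// Hypothetical SKILL.md excerpt, for illustration only.
const exampleSkill = [
  "Wave 1: bump dependencies to the versions pinned in _shared/tech-stack.md.",
  "Wave 2: run the pipeline described in _shared/testing-methodology.md.",
].join("\n");

console.log(citedSharedDocs(exampleSkill));
console.log(danglingCitations(exampleSkill));
```

A check like this only proves that the links resolve. Whether a skill has quietly restated a fact instead of linking to it still needs a human read.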

This is the same pattern used in any well-factored codebase. Don't repeat yourself; factor out the common knowledge; let the consumers compose. The novelty is applying it to AI agent skills instead of to source code.

The reference implementation is the spec

There is one more piece, and it is the most important one: the shared docs do not hold the truth. The truth lives in a single repository that all three skills track — in my case, jjdharmaraj-web. The shared docs are summaries of that repository's current state.

This inverts the usual relationship between specs and code. Most teams write specs first and let the code drift away from them. I am writing the code first — in one canonical repo — and treating the spec as a derivative. When jjdharmaraj-web introduces a new SEO pattern, I update _shared/seo-techniques.md to describe it. When it adopts a new dependency, I update _shared/tech-stack.md's version table. The repo is always right; the docs always lag by one commit.

This works for two reasons. First, there is exactly one reference implementation, so there is no ambiguity about which version of reality to document. Second, the docs are short — they cite paths in the repo rather than pasting code — so the cost of keeping them current is small.

The pipeline is the contract

Across all three skills, the thing that decides "done" is the same: a single command that runs every check and writes one HTML report.

npm run pipeline

That runs Typecheck → Format check → Lint → Vitest with coverage → Playwright with the JSON reporter. The runner captures pass/fail per step, parses coverage and Playwright JSON outputs, copies the visual screenshots into the report directory, and writes pipeline-report/index.html with badges, a coverage table, a per-test failure list, and a route × device screenshot grid.
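The real runner is not shown in this post, so the following is only a sketch of its core loop under stated assumptions: the npm script names are invented, the command executor is injected as a function, and every step runs even after an earlier one fails so that the report can list everything that broke.

```typescript
interface Step {
  name: string;
  command: string; // assumed npm script names, not the real ones
}

interface StepResult {
  name: string;
  passed: boolean;
}

// Step order as described above.
const steps: Step[] = [
  { name: "Typecheck", command: "npm run typecheck" },
  { name: "Format check", command: "npm run format:check" },
  { name: "Lint", command: "npm run lint" },
  { name: "Vitest with coverage", command: "npm run test:coverage" },
  { name: "Playwright", command: "npm run test:e2e" },
];

// `exec` returns the exit code of a command. It is a parameter so the sketch
// does not depend on how commands are actually spawned.
function runPipeline(pipeline: Step[], exec: (command: string) => number): StepResult[] {
  // Assumption: no early exit; a failed step is recorded and the next one still runs.
  return pipeline.map((step) => ({ name: step.name, passed: exec(step.command) === 0 }));
}

// One verdict for the whole run: either everything passed or the failures are named.
function verdict(results: StepResult[]): string {
  const failed = results.filter((result) => !result.passed).map((result) => result.name);
  return failed.length === 0 ? "ALL PASS" : "FAILED: " + failed.join(", ");
}

// Dry run with an executor that pretends every command exits with code 0.
console.log(verdict(runPipeline(steps, () => 0))); // ALL PASS
```

In a Node script, the executor could wrap spawnSync from node:child_process and return the exit status. Parsing the coverage and Playwright JSON and rendering the HTML report would sit on top of the StepResult list.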

The skills are written so the report is the only thing that matters. scaffold-seo-site is done when a freshly-generated project produces a green report. upgrade-seo-site-deps is done when, after the upgrade, the report is still green. There is no notion of "I ran the typecheck and it passed." Either the report is green or it isn't.

This is especially important when an agent is running the procedure. Agents are happy to tell you "the checks pass" if you let them interpret "the checks" loosely. The unified report removes that ambiguity. Either the artifact says ALL PASS or it lists exactly what broke, in one file, with screenshots.

The report itself is also still being shaken out. One small example: the first version showed "121 passed / 183 skipped" and I almost wrote that down as "great, mostly green." The 183 skipped tests turned out to be pure noise — Playwright was scheduling the desktop-only regression specs (SEO, accessibility, theming, navigation) on all four device profiles, then skipping them on three of them with an inline test.skip(...) at the top of each spec. The fix was one config change to use per-project testIgnore instead of inline skips. After that, the report showed 121 passed / 0 skipped — same actual test coverage, but the number of "skipped" went from misleading to honest. The lesson is that the report's clarity is part of the contract too. If the numbers can be ambiguous, the report can lie, and a skill running on it can lie by transitivity.
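For readers who have not used it, per-project testIgnore looks roughly like the config below. This is a sketch, not the reference repo's actual config: the directory layout, the project names, the four device profiles, and the JSON output path are all assumptions.

```typescript
import { defineConfig, devices } from "@playwright/test";

// Assumption: desktop-only specs live under tests/desktop-only/.
const desktopOnly = "**/desktop-only/**";

export default defineConfig({
  testDir: "./tests",
  // JSON reporter output that a pipeline runner can parse afterwards.
  reporter: [["json", { outputFile: "playwright-results.json" }]],
  projects: [
    // The desktop project runs everything, including the desktop-only specs.
    { name: "desktop", use: { ...devices["Desktop Chrome"] } },
    // The other profiles never schedule those specs, so nothing shows up as skipped.
    { name: "tablet", use: { ...devices["iPad (gen 7)"] }, testIgnore: desktopOnly },
    { name: "mobile-android", use: { ...devices["Pixel 5"] }, testIgnore: desktopOnly },
    { name: "mobile-ios", use: { ...devices["iPhone 13"] }, testIgnore: desktopOnly },
  ],
});
```

The difference from an inline test.skip(...) is when the decision happens. With testIgnore the spec is never collected for that project, so it cannot appear in the skipped count. The old numbers are consistent with this reading: 183 is 3 × 61, which is what 61 desktop-only tests skipped on three profiles each would produce.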

How the maintenance loop actually works

The architecture only works if there is a clear rule for what to update when reality changes. The rule:

1. When jjdharmaraj-web introduces a new pattern, decide which bucket it falls into: stack change, SEO change, testing change, or future performance change.
2. Update only the matching _shared/*.md file.
3. Decide whether any skill's procedure changed. If yes, bump that SKILL.md. If no, do nothing: the skills cite the shared doc and inherit the change automatically.

Three quarters of the time, step 3 is "do nothing." A dependency bump in the reference repo means updating _shared/tech-stack.md's version table; the scaffold skill keeps citing that doc and produces new projects with the bumped versions. A new SEO regression test means adding a row to _shared/testing-methodology.md's spec table; the scaffold skill copies the test suite verbatim so it picks the new test up for free.
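The rule and the two examples above can be sketched as a small lookup. Treat it as an illustration rather than tooling I actually run: the Change shape and the filesToUpdate helper are hypothetical, the per-skill SKILL.md path is an assumed layout, and the performance bucket points at a doc that is only planned for Part 2.

```typescript
type Bucket = "stack" | "seo" | "testing" | "performance";

// Which shared doc owns which kind of change.
const sharedDocFor: Record<Bucket, string> = {
  stack: "_shared/tech-stack.md",
  seo: "_shared/seo-techniques.md",
  testing: "_shared/testing-methodology.md",
  // Planned, not yet written.
  performance: "_shared/lighthouse-targets.md",
};

interface Change {
  description: string;
  bucket: Bucket;
  // Set only when the order of operations in a skill changes.
  skillWithChangedProcedure?: string;
}

// Returns the files that need an edit for a given change in the reference repo.
function filesToUpdate(change: Change): string[] {
  const files = [sharedDocFor[change.bucket]];
  if (change.skillWithChangedProcedure) {
    // Assumed layout: one SKILL.md per skill directory.
    files.push(change.skillWithChangedProcedure + "/SKILL.md");
  }
  return files;
}

// The common case: a dependency bump touches one shared doc and no skill.
console.log(filesToUpdate({ description: "bump a dependency", bucket: "stack" }));
```

The point of the shape is that the second branch is rarely taken. Most changes resolve to a single shared doc.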

The skills change shape only when the order of operations changes — a new wave gets added to the upgrade procedure, a new phase gets added to the scaffold, a new "stop and ask" criterion gets surfaced. That is rare and intentional.

Where this isn't done yet

I want to be clear about what's actually working versus what's still in progress, because the value of this whole approach is the loop — not any particular snapshot. As of this post:

Unit test coverage is at about 25%. I started this work with zero unit tests in the reference repo and just added the first batch — 38 tests covering buildMetadata, buildCanonicalUrl, robotsPolicy, and the note frontmatter schema. The most logic-heavy file (buildMetadata.ts) is at 100% statements and 91% branches. The JSON-LD builders and the analytics events helper still have no unit coverage. The plan is to add tests to each module as I touch it, not to do a coverage push for its own sake.
The skipped-test noise I described above was real and sat in the reports for several runs. I only noticed because I asked "wait, why are there a hundred and eighty-three skipped tests?" Worth saying explicitly: the pipeline only catches the regressions you've thought to assert. The fix here improved clarity, not the underlying coverage.
The upgrade-seo-site skill is still hypothetical. I sketched its scope in _shared/seo-techniques.md and the maintenance contract reserves a slot for it, but I haven't written it. The right time to write it is the first time I actually have to apply new SEO learnings to an old site — that work will tell me what the procedure should look like. Writing it ahead of time would be speculative.
The pipeline still doesn't run Lighthouse, doesn't pixel-diff screenshots, doesn't run an axe-core accessibility deep audit, and doesn't verify the IndexNow ping went through. Each of those is on the list. Each one is a future small change to the runner plus a new entry in _shared/testing-methodology.md.
The reference implementation is one site. With a sample size of one, "the architecture works" is a hypothesis, not a conclusion. The first real test will be when I scaffold a second site from this skill set and find out which assumptions don't generalize.

The point of the architecture isn't to ship a complete spec. It's to make every gap visible and easy to close. The skipped-tests fix is a tiny example: noticed, diagnosed, fixed, documented, and the same shape of fix is now available the next time I see misleading numbers somewhere else.

What this is not

This is not a replacement for engineering judgment. The skills handle the procedural parts of the lifecycle — the parts where "do it the way we did it last time" is the right answer. The decisions that actually require thinking (do we adopt a new SEO control plane? do we redesign the home page? do we switch frameworks?) happen outside the skills, with a senior model and a human in the room. The skills only execute decisions that have already been made.

It is also not a generic web-app framework. The three skills are tightly scoped to one shape of project: a static-export Next.js site optimized for SEO, with a particular test pipeline. Sites that need a server runtime, an auth layer, real-time data, or a paywall do not fit. That is fine — for those, pick a different stack and write different skills. The architecture (one reference implementation, shared docs, narrowly-scoped skills) generalizes; the specific stack does not.

What Part 2 will cover

Next on the list is Google Lighthouse. The goal is green-across-all-pages performance and accessibility scores, automatically verified on every pipeline run, on every device profile. That work is going to touch all three layers of this architecture:

A new _shared/lighthouse-targets.md documenting the target scores and the audit-by-audit guidance.
A new pipeline step that runs Lighthouse and embeds the scores in the same HTML report.
A new section in _shared/testing-methodology.md describing what the step does and how to fix-forward when an audit regresses.
Potentially — only if the procedure changes — updates to scaffold-seo-site (so new projects get Lighthouse from day one) and upgrade-seo-site-deps (so dep upgrades that tank performance get caught before deploy).

Part 2 is also a real-world test of the maintenance loop. If the architecture works, adding Lighthouse should be a tractable change to two or three files — not a rewrite. If it requires a rewrite, the architecture is wrong and I will say so.

The post after that — Part 3, eventually — will be about the second skill in the trio: upgrade-seo-site, the one that retroactively applies SEO learnings to an old site. That skill is still hypothetical. It will become real the first time I have to do that work, and it will be shaped by what I learn doing it.

Why I think this matters

For decades the friction in keeping a website healthy was the cost of skilled human attention. You either paid for it continuously (a team) or you let things rot (everyone else). Skills change those economics. A procedure that once required a senior person to load context, check the current state of the art, and run a multi-hour upgrade can now run on demand, predictably, with the human only stepping in for the genuinely ambiguous parts.

The interesting move is realizing that the skills themselves are the long-lived artifact. The reference implementation is the spec. The shared docs are the index. The skills are the procedures. All three evolve together, and any one of them goes stale if you let it drift.

I will know whether this works the next time something in the ecosystem moves. Until then, the pipeline is green and the report is at pipeline-report/index.html.