Planner, Skill, Executor: How to Split Work Across Model Tiers

A coding session is rarely one job. It's usually three jobs jammed together, distinguished by how much novelty they contain.

The first job is figuring out what to build. You have some inputs (a Figma export, a paragraph of intent, a few Mermaid diagrams, a half-finished design doc), and the goal is to produce a plan that a competent stranger could execute. This is the part where you want the most expensive model in the room.

The second job is stuff you've done before. Upgrading dependencies. Reviewing a PR for the eleventh time this week. Scaffolding a new route in your usual style. The novelty per session is near zero. The cost of having a senior model think this through every time is enormous, and the result is no better than the third time you did it manually.

The third job is building the thing. Once the plan exists, most steps are mechanical. Read the file, change the lines, run the checks, fix what broke. This is the bulk of the keystrokes and the part you most want to run cheaply and in parallel.

The interesting move is realizing these are different jobs that want different tools. In Claude Code today, that maps cleanly onto three primitives: Opus for planning, Skills for routine, Sonnet for execution. ChatGPT Codex ships the same primitives under different names. Both are converging on the same shape, which is worth understanding regardless of which one you use.

What I did this morning

This isn't theory: I did all three this morning in one Claude Code session.

Planning. I dropped twelve PNG screenshots of two design directions ("oxblood editorial" and "espresso + champagne") into a folder and asked Opus 4.7 with 1M context for a theming plan. The screenshots showed the same six pages in two different palettes. Opus pulled the structural shape out of the images and filled in the semantic tokens my palette table was missing: the table gave only three colors per theme, so Opus added surface, text-muted, and border for each. It also spotted that five of the six themes would need a dark button-text color, while only oxblood would keep light text on its red accent. The result was a Tailwind v4 plan with a six-theme table and three blocking decision questions, raised before any code got written.
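The dark-versus-light button-text call is something you can check mechanically. Below is a minimal sketch, assuming the WCAG 2.x contrast-ratio formula is the yardstick. The hex values are hypothetical stand-ins (the article doesn't give the real palette), and nothing here claims this is how Opus reached its conclusion.

```typescript
// WCAG 2.x contrast check: should a button's accent color carry dark or light text?
// The accent colors below are hypothetical placeholders, not the article's themes.

/** Convert one sRGB channel (0-255) to linear light, per the WCAG 2.x definition. */
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance of a "#rrggbb" color, from 0 (black) to 1 (white). */
function relativeLuminance(hex: string): number {
  const m = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(m[1], 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

/** Contrast ratio between two colors: 1 (identical) up to 21 (black on white). */
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const [hi, lo] = la >= lb ? [la, lb] : [lb, la];
  return (hi + 0.05) / (lo + 0.05);
}

/** Pick whichever candidate text color contrasts more with the accent. */
function buttonTextFor(accent: string, dark = "#1a1a1a", light = "#ffffff"): string {
  return contrastRatio(accent, dark) >= contrastRatio(accent, light) ? dark : light;
}

// Hypothetical accents: a deep red should keep light text, a pale champagne should not.
const accents: Record<string, string> = {
  oxblood: "#6b1e24",
  champagne: "#e8d7b0",
};
for (const [name, accent] of Object.entries(accents)) {
  const text = buttonTextFor(accent);
  console.log(name, text, contrastRatio(accent, text).toFixed(2));
}
```

Picking the higher-contrast candidate, rather than using a fixed luminance cutoff, keeps the decision correct even if you change the dark or light text color later.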

Routine. Same session, different request: "upgrade all dependencies for this Node app, run the quality checks, fix anything broken, don't commit." I asked Opus to turn this into a Skill: a file at ~/.claude/skills/upgrade-seo-site-deps/SKILL.md that any future session can invoke. The Skill has six hard rules (among them "no --force or --legacy-peer-deps without explicit authorization" and "stop and report if pre-existing checks fail"), a baseline → inventory → wave-A → wave-B procedure, a list of known watch-outs (Biome 1→2 config renames, Zod 3→4 schema changes, React plugins lagging behind ESLint majors), and an explicit "pin and move on" escape hatch. Then I invoked the Skill on the same repo. Result: eight major-version bumps, with eslint pinned back to 9 because eslint-plugin-react, which eslint-config-next 16 depends on, hadn't yet shipped an ESLint-10-compatible release. Total time: less than I'd have spent reading changelogs by hand.
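The Skill's wave-A/wave-B split isn't spelled out above, so here is one plausible reading as a sketch: wave A takes updates that stay on the current major, and wave B takes major bumps that need changelog review. The input is the JSON shape that `npm outdated --json` prints (an object keyed by package name with `current`, `wanted`, and `latest` fields). The wave definitions are an assumption, and the versions in the sample are hypothetical.

```typescript
// Split `npm outdated --json` output into two upgrade waves.
// Assumption (not taken from the Skill itself): wave A = updates within the
// current major, wave B = major-version bumps that need changelog review.

interface OutdatedEntry {
  current?: string; // absent when the package isn't installed
  wanted: string;
  latest: string;
}

interface Waves {
  waveA: string[]; // same-major updates
  waveB: string[]; // major bumps
}

/** Leading major number of a plain "x.y.z" version string.
 *  Note: 0.x minor bumps can also be breaking under semver; this sketch ignores that. */
function major(version: string): number {
  const n = Number.parseInt(version.split(".")[0], 10);
  if (Number.isNaN(n)) throw new Error(`unparseable version: ${version}`);
  return n;
}

function planWaves(outdated: Record<string, OutdatedEntry>): Waves {
  const waves: Waves = { waveA: [], waveB: [] };
  const sorted = Object.entries(outdated).sort(([a], [b]) => a.localeCompare(b));
  for (const [name, entry] of sorted) {
    const from = entry.current ?? entry.wanted;
    const target = `${name}@${entry.latest}`;
    if (major(entry.latest) > major(from)) waves.waveB.push(target);
    else waves.waveA.push(target);
  }
  return waves;
}

// Hypothetical input in the shape `npm outdated --json` prints (extra fields omitted).
const sample: Record<string, OutdatedEntry> = {
  zod: { current: "3.23.8", wanted: "3.25.0", latest: "4.0.0" },
  eslint: { current: "9.10.0", wanted: "9.30.0", latest: "10.0.0" },
  "date-fns": { current: "4.1.0", wanted: "4.1.0", latest: "4.2.0" },
};
console.log(planWaves(sample));
```

Running wave A first means that when wave B breaks something, the low-risk updates are already settled and the failure is easier to attribute.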

Execution-as-tasks. With both plans in place, I asked Opus to break the combined SEO, theming, analytics, and Playwright work into individual task files in jjdharmaraj-web/tasks/. Nineteen files came out, each self-contained: goal, source docs, files to edit, implementation steps, acceptance criteria, test plan, and out-of-scope. Each file is sized so a single Sonnet session can complete it and open a PR. None of those nineteen tasks needs to re-read the design images or the SEO strategy doc; the relevant facts have been copied into each task file.
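Because every task file is supposed to carry the same sections, a planner's output can be linted before any executor touches it. A minimal sketch, assuming each section is a `## ` markdown heading named as in the list above (the article doesn't specify the file format, and the draft task in the usage lines is hypothetical):

```typescript
// Check that a task file contains every section the planner is supposed to emit.
// Assumption: sections are "## " markdown headings with these names.

const REQUIRED_SECTIONS = [
  "goal",
  "source docs",
  "files to edit",
  "implementation steps",
  "acceptance criteria",
  "test plan",
  "out of scope",
] as const;

/** Lowercase and collapse punctuation, so "Out-of-scope" matches "out of scope". */
function normalize(heading: string): string {
  return heading.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

/** Returns the required sections missing from a task file (empty array = OK). */
function missingSections(markdown: string): string[] {
  const present = new Set(
    markdown
      .split(/\r?\n/)
      .filter((line) => line.startsWith("## "))
      .map((line) => normalize(line.slice(3))),
  );
  return REQUIRED_SECTIONS.filter((s) => !present.has(s));
}

// Hypothetical draft task that forgot its test plan.
const draft = [
  "# Task 07: semantic color tokens",
  "## Goal",
  "## Source docs",
  "## Files to edit",
  "## Implementation steps",
  "## Acceptance criteria",
  "## Out-of-scope",
].join("\n");
console.log(missingSections(draft)); // → ["test plan"]
```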

That's the loop. Plan once with the expensive model. Procedure-ize what you'll do again. Execute the rest with whatever's cheap and parallel.

Why this split is load-bearing, not cosmetic

There are three real reasons to do this, and one made-up reason people cite that isn't really the point.

The real reasons.

First, context is the bottleneck, not cleverness. A planner that has read every design image, every existing component, every strategy doc, and every recent decision makes different choices from one that has only been told "build a theme system." The expensive model isn't expensive because it's smarter at one-shot answers; it's expensive because it can hold a lot of disparate inputs in working memory and synthesize across them. Spend that capacity on synthesis, not on npm install.

Second, the planning artifact survives the model. When Opus writes nineteen task files, those files are now durable. The next session — different model, different day, possibly a different human — picks up a task and executes it without needing the planner's context. The task file is the handoff. This is the same reason senior engineers write design docs and tech leads write tickets: the document outlives the moment.

Third, skills compound across projects. A Skill is a written-down version of "what this team does when X happens." Once it exists, every future invocation gets the same procedure with the same guardrails. The dependency-upgrade Skill I wrote this morning doesn't only work on the project I wrote it for — it works on any Node app I point it at. The cost of authoring it amortizes over every repo I'll ever run it on.

The made-up reason. People sometimes claim the split is about cost. It is, but not in an interesting way. The bigger savings come from not having to redo work when the planner's context evaporates, not from saving cents per token.

How this lands in Claude Code

The Claude Code primitives line up almost too neatly with the three jobs.

Planning happens in plan mode, a separate state you enter with a keybinding (Shift+Tab cycles through the modes) or land in automatically if your settings make it the default. In plan mode, Claude reads, searches, and fetches, but doesn't edit files. The output is a structured plan you approve before any edits land. Run that planning session with Opus selected; for very long sessions, the 1M-context Opus variant lets the planner ingest entire designs, repos, and prior conversations without losing track.

Routine lives in ~/.claude/skills/<skill-name>/SKILL.md. Each Skill is a directory. The SKILL.md file has YAML frontmatter (name, description) and a markdown body of instructions. The description field is what lets a future Claude decide to invoke the Skill autonomously when a user request matches — so write that field as if you're tagging the trigger conditions, not summarizing the contents. Project-specific Skills can live at .claude/skills/ in the repo and ship with the codebase. Slash commands (/<skill-name>) work in any session where the Skill is on disk.
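To make the file format concrete, here is a sketch of reading a SKILL.md: frontmatter with `name` and `description`, then the markdown body. It assumes simple one-line `key: value` frontmatter (a real loader would use a YAML parser) and is not Claude Code's actual loader. The sample body paraphrases two of the upgrade Skill's rules from earlier in this post.

```typescript
// Minimal reader for a SKILL.md file: "---"-fenced frontmatter plus a markdown body.
// Assumption: frontmatter is one-line "key: value" pairs, no nested YAML.

interface Skill {
  name: string;
  description: string; // the trigger text a model sees before loading the body
  body: string;
}

function parseSkill(text: string): Skill {
  const lines = text.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") throw new Error("missing opening --- fence");
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) throw new Error("missing closing --- fence");
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  if (!meta.name || !meta.description) throw new Error("name and description are required");
  return { name: meta.name, description: meta.description, body: lines.slice(end + 1).join("\n").trim() };
}

const sampleSkill = [
  "---",
  "name: upgrade-seo-site-deps",
  "description: Upgrade every dependency including majors for a Node.js project, run quality checks, fix breakage, don't commit",
  "---",
  "# Upgrade dependencies",
  "",
  "- Never use --force or --legacy-peer-deps without explicit authorization.",
  "- Stop and report if pre-existing checks fail.",
].join("\n");

const skill = parseSkill(sampleSkill);
// Only name + description need to be visible up front; the body loads on demand.
console.log(`${skill.name}: ${skill.description}`);
```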

Execution runs with Sonnet 4.6 or Haiku 4.5 — fast, parallel-friendly, cheap enough that you can run a whole task graph against the same project without flinching. The Agent tool spawns subagents in isolated context windows, which is how you get nineteen parallel task executions without one task's noise polluting the next.
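The fan-out pattern itself is simple to sketch. Here `executeTask` is a hypothetical callback standing in for whatever actually launches an executor (an Agent subagent, a CLI session, an API call); the sketch only shows a bounded pool in which each executor receives a single task's text and nothing else. The default limit of 4 is arbitrary, not a Claude Code setting.

```typescript
// Run task files through executors a few at a time; each executor sees one task only.
// `executeTask` is a hypothetical stand-in for launching a real executor session.

type TaskResult = { task: string; ok: boolean; summary: string };

async function runAll(
  tasks: string[],
  executeTask: (task: string) => Promise<TaskResult>,
  limit = 4,
): Promise<TaskResult[]> {
  const results: TaskResult[] = new Array(tasks.length);
  let next = 0;
  // Each worker claims the next unstarted task. JS runs this synchronously
  // between awaits, so `next++` never hands the same task to two workers.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      try {
        results[i] = await executeTask(tasks[i]);
      } catch (err) {
        // A failed task is recorded, not fatal: the other tasks keep running.
        results[i] = { task: tasks[i], ok: false, summary: String(err) };
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, () => worker()));
  return results;
}

// Demo with a fake executor that just echoes each task's first line.
const demoTasks = ["# 01 Add theme tokens", "# 02 Add theme switcher", "# 03 SEO meta tags"];
runAll(demoTasks, async (t) => ({ task: t, ok: true, summary: t.split("\n")[0] }), 2).then((r) =>
  console.log(r.map((x) => x.summary)),
);
```

Results come back in task order regardless of which executor finishes first, which makes the final report easy to line up against the task folder.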

The glue between all three: the task file as the canonical handoff. The planner emits task files. The skills get cross-referenced from task files when a task needs a known procedure. The executor reads one task file at a time, does the work, and reports back. No model needs the full history.

The same pattern in ChatGPT Codex

If you'd done this exercise in ChatGPT Codex instead of Claude Code, almost every primitive has a counterpart. The names differ, the storage paths differ, the slash-command surface is more locked-down on the Codex side as of mid-2026, but the underlying shape is the same.

Codex equivalents:

| Claude Code | ChatGPT Codex |
|---|---|
| CLAUDE.md (project) | AGENTS.md (project) |
| ~/.claude/CLAUDE.md (user) | ~/.codex/AGENTS.md (user) |
| Skill at ~/.claude/skills/<name>/SKILL.md | Skill at ~/.agents/skills/<name>/SKILL.md |
| Project Skill at .claude/skills/<name>/ | Project Skill at .agents/skills/<name>/ |
| Plan mode (Opus) | /plan slash command |
| Subagent via Agent tool | /agent to switch agent threads |
| Custom slash commands at ~/.claude/commands/ | Built-in slash commands only (~30, no user-authored as of this writing) |

The Skill format is nearly identical. Codex Skills require the same name/description frontmatter, use the same "progressive disclosure" pattern (the model sees only the description until it decides to load the body), and resolve from multiple scopes in order: $CWD/.agents/skills, parent directories, $REPO_ROOT/.agents/skills, $HOME/.agents/skills, and a system-level /etc/codex/skills. You invoke a Codex Skill explicitly by typing $ followed by the name (Claude Code uses /), or autonomously when your prompt matches the Skill's description (same trigger model as Claude Code).
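The scope order above can be written down as a search path. A sketch, assuming "parent directories" means every directory from $CWD up to the repo root and that earlier entries are checked first; Codex's own documentation is the authority on how conflicts between scopes are actually resolved.

```typescript
import { posix as path } from "node:path";

// Directories searched for Codex Skills, in the order listed in the text:
// $CWD/.agents/skills, parents up to $REPO_ROOT/.agents/skills,
// $HOME/.agents/skills, then /etc/codex/skills.
// Assumption: earlier entries take precedence; verify against Codex's docs.
function codexSkillDirs(cwd: string, repoRoot: string, home: string): string[] {
  const dirs: string[] = [];
  const root = path.resolve(repoRoot);
  let dir = path.resolve(cwd);
  // Walk from CWD up to and including the repo root.
  while (true) {
    dirs.push(path.join(dir, ".agents", "skills"));
    if (dir === root) break;
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached "/" without meeting repoRoot
    dir = parent;
  }
  dirs.push(path.join(path.resolve(home), ".agents", "skills"));
  dirs.push("/etc/codex/skills");
  return [...new Set(dirs)]; // drop duplicates, e.g. if HOME is the repo root
}

console.log(codexSkillDirs("/work/repo/packages/web", "/work/repo", "/home/me"));
```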

Two interesting differences:

Codex has a formal plugin system on top of Skills. A plugin is a Skill bundle plus MCP server config plus app integrations, packaged for distribution to a team or org. Claude Code Skills today are bare directories — sharing is by copying or git-cloning into ~/.claude/skills/. If you're building a team-wide set of procedures and want a real distribution story, the Codex plugin shape is the more mature primitive.

Codex has multi-agent parallelism as a first-class app concept. Tasks run in separate threads organized by project, with built-in git worktree support so several agents can edit the same repo without stepping on each other. In Claude Code, parallel execution comes from launching several Agent tool calls in a single message, each in its own context window — same outcome, but the orchestration is explicit per-invocation rather than persistent in a project view.

For our exercise, the workflow translates one-to-one:

Run Codex with GPT-5.4 (or whatever the highest-reasoning model is in your tier) for the planning session. Hand it the designs. Output: task files.
Encode the dependency-upgrade procedure as a Codex Skill at ~/.agents/skills/upgrade-seo-site-deps/SKILL.md. Frontmatter name/description, body identical to what you'd write for Claude Code. Trigger it with $upgrade-seo-site-deps or let the description auto-match.
Execute the task files with a smaller/faster Codex model (/fast, or /model with a lighter selection), one task per agent thread, parallelism via the app's multi-thread view.

The one thing you can't currently do in Codex that you can in Claude Code is author your own slash commands as plain files on disk. Codex slash commands are built-in. If you want a /deploy or /release-notes shortcut that maps to a specific procedure, you do it as a Skill (which is the right answer anyway — slash commands and Skills are converging on the same primitive).

What I'd build differently if starting fresh

A few things this morning surfaced that I'd bake in earlier next time.

Write the Skill before you need it the second time. I wrote the upgrade Skill because I was about to upgrade a Node app, then ran it on that app. I could have written it the first time I did this kind of upgrade by hand. The cost is the same either way; the only thing waiting costs you is the fresh memory of how the procedure should actually go. If you find yourself doing something a third time with broadly the same shape, the Skill is overdue.

Make tasks small enough to fit in one executor session. The temptation is to write one huge task ("ship the SEO pass"). The right size is whatever fits comfortably in the executor's context window and can be PR'd independently. Mine ended up around 150 lines each, with explicit "depends on" lines so the executor knows the right order. If a task needs to read three other tasks to make sense, it's too big.
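The "depends on" lines turn a folder of tasks into a graph, and grouping that graph into layers tells you which tasks can run in parallel. A minimal sketch using Kahn-style layering (a standard topological sort); the task IDs are hypothetical.

```typescript
// Group tasks into waves: every task in a wave depends only on tasks in earlier
// waves, so all tasks within one wave can run in parallel.

function taskWaves(deps: Record<string, string[]>): string[][] {
  const remaining = new Map(
    Object.entries(deps).map(([id, d]): [string, Set<string>] => [id, new Set(d)]),
  );
  for (const [id, d] of remaining) {
    for (const dep of d) {
      if (!remaining.has(dep)) throw new Error(`${id} depends on unknown task ${dep}`);
    }
  }
  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Ready = no unfinished dependencies left.
    const ready = [...remaining].filter(([, d]) => d.size === 0).map(([id]) => id).sort();
    if (ready.length === 0) {
      throw new Error(`dependency cycle among: ${[...remaining.keys()].join(", ")}`);
    }
    for (const id of ready) remaining.delete(id);
    for (const d of remaining.values()) for (const id of ready) d.delete(id);
    waves.push(ready);
  }
  return waves;
}

// Hypothetical task IDs with their "depends on" lines.
const waves = taskWaves({
  "01-tokens": [],
  "02-theme-switcher": ["01-tokens"],
  "03-seo-meta": [],
  "04-playwright": ["02-theme-switcher", "03-seo-meta"],
});
console.log(JSON.stringify(waves));
// → [["01-tokens","03-seo-meta"],["02-theme-switcher"],["04-playwright"]]
```

Failing loudly on cycles and unknown IDs is deliberate: both usually mean the planner mis-scoped a task, which is cheaper to fix before any executor starts.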

Treat the description field as a search query. Whether it's a Skill description or a task file's first line, that's the string future-you will use to find the artifact. Pack it with trigger words. "Upgrade every dependency including majors for a Node.js project, run quality checks, fix breakage, don't commit" is better than "dep upgrade helper." Make it sound like the question your future self will ask.
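A toy way to see why trigger words matter: score each description by how many words of a plausible request it contains. This is illustration only; in Claude Code and Codex the model itself judges whether a description matches, not a word-overlap count.

```typescript
// Toy relevance score: how many distinct request words appear in a description.
// Not how Claude Code or Codex match Skills; it only shows that specific
// trigger words give any matcher more to hit.

function words(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function overlap(request: string, description: string): number {
  const d = words(description);
  let hits = 0;
  for (const w of words(request)) if (d.has(w)) hits++;
  return hits;
}

const request = "upgrade all dependencies for this Node app and run the quality checks";
const vague = "dep upgrade helper";
const specific =
  "Upgrade every dependency including majors for a Node.js project, run quality checks, fix breakage, don't commit";
console.log(overlap(request, vague), overlap(request, specific)); // 1 6
```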

Don't bake the model into the artifact. Skills don't say "use Opus" or "use Sonnet" — they describe the procedure. Task files don't either. The whole point of the planner/executor split is that the artifact is portable across models. If your task file requires the executor to be the same model as the planner, you've collapsed the layers.
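This rule is easy to enforce with a lint pass over task and Skill files. A sketch that flags lines naming a model; the pattern list is an assumption to extend for your own stack, and a word like "haiku" can produce false positives in ordinary prose.

```typescript
// Flag lines in a task or Skill file that name a specific model, which would
// tie the artifact to one executor. Pattern list is illustrative, not exhaustive.

const MODEL_PATTERNS: RegExp[] = [/\bopus\b/i, /\bsonnet\b/i, /\bhaiku\b/i, /\bgpt-\d/i];

interface Finding {
  line: number; // 1-based line number
  text: string;
}

function modelMentions(artifact: string): Finding[] {
  return artifact
    .split(/\r?\n/)
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => MODEL_PATTERNS.some((re) => re.test(text)));
}

// Hypothetical task file that leaks model choices into the procedure.
const taskText = ["## Goal", "Add semantic color tokens.", "Use Opus for this step.", "Verify on GPT-5.4."].join("\n");
for (const f of modelMentions(taskText)) console.log(`line ${f.line}: ${f.text}`);
```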

The trend across both Claude Code and ChatGPT Codex is the same: more of the workflow surface is becoming files on disk instead of model state. CLAUDE.md, AGENTS.md, SKILL.md, task files, plan-mode outputs. The model becomes the thing you swap; the procedure stays put. Treat that as the design point and the rest follows.