Each week, the newsletter features a workflow from an engineer using AI coding tools in their actual work. This page collects all of them in one place and serves as a living archive for the "Workflow of the Week" section from the newsletter.

I publish Agentic Coding Weekly every Monday. If you want new workflows like these in your inbox, subscribe below.

Leonard Lin: AGENTS.md Details and Spec-driven Development Loop

Published in ACW #14 on Feb 23, 2026.

This week's workflow is from Leonard Lin, founder and CTO of Shisa.AI. Over to Leonard:

I've been closely tracking AI-assisted coding since the ChatGPT launch. My goal is to keep an eye on the bleeding edge without getting caught up chasing every new thing. There's a lot of churn and FOMO in the space, but things also get "baked in" very quickly, and it's important to remember that the point is to save time and effort. My goal is Pareto efficiency: see what's left once the hype boils off, and adopt what gives the best gains for my actual work (vs yak-shaving).

Big Picture

Opus 4.5 and GPT 5.2 were the biggest game changers for me; since their release, they do 99% of my code-writing. Before that, I was still reading through most of my code (except for one-off scripts I didn't care about), but I've now switched to enforcing quality through specs and test coverage, even for long-lived projects (the latest models can spelunk and parse code better than I can).

My Current Setup

This is in a state of flux: I've been poking at new scaffolds and control planes as they come out, and I fully expect my setup to look quite different in six months. My main setup currently is byobu with aggressive F8 pane labeling:

  • One byobu session per project

  • Instead of subagents/swarms, I aggressively create new panes anytime I need one

  • Every pane gets descriptively named (e.g., coder-5.3, planner-claude, etc.)

  • I tend to use a single main coder per worktree but have multiple planners or reviewers

I mostly use Codex and Claude Code, but use some Amp and OpenCode as well. I've tried some mobile/web prompting flows (Happy, etc.), but byobu (tmux) via Tailscale is how I manage everything. It lets me run my agents and access them remotely when I want, and it's an easy adaptation of the same workflow I've used for years (but with AI agents doing the coding).
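
A minimal sketch of that layout in plain tmux commands (byobu wraps tmux, so byobu's F8 renaming maps to `rename-window`). Session and window names here are illustrative, not prescribed:

```shell
# One detached session per project; names are illustrative.
# In byobu, F8 does the renaming interactively; these are the tmux equivalents.
if command -v tmux >/dev/null 2>&1 && tmux new-session -d -s myproject -n coder-5.3 2>/dev/null; then
  tmux new-window -t myproject -n planner-claude   # one descriptively named window per role
  tmux new-window -t myproject -n reviewer-gpt
  tmux list-windows -t myproject                   # shows the labeled layout
  tmux kill-session -t myproject                   # cleanup for this demo
fi
```

Because the session is detached, attaching to it over Tailscale from another machine (`tmux attach -t myproject`) picks up the running agents exactly where they were left.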

My general preference is to run the strongest models available. As of my latest testing, GPT-5.3 Codex xhigh is the strongest general coder for my workload. I run Opus 4.6 for writing, planning, and more interactive flows, and GPT-5.2 xhigh for planning, review, and deep work (leaving it alone for minutes or hours).

I have a custom ccstatusline and use ccusage and @ccusage/codex for some stats tracking, but besides that I'm pretty vanilla on plugins/add-ons.

Workflow

Historically I've leaned heavily into using Deep Research for research, but I've increasingly been migrating those ad-hoc queries into git repos/agentic coding tools for rapid iteration. Putting everything into agent-accessible markdown files is a huge multiplier.

As mentioned, I don't trick out my setup. There's a whole cottage industry of Claude Code plugins, custom skills, and meta-frameworks, and while I keep a repo/doc tracking and analyzing the major ones like superpowers, get-shit-done, etc., a well-written AGENTS.md seems to cover most of my needs.

My AGENTS.md (and symlinked CLAUDE.md) is where the core of my current process lives. Each project has a custom AGENTS.md that I've been evolving, and versions are shared with the rest of my (human) team as well. Even though it contains my boiled-down best practices, my AGENTS.md is not long: currently about 250 lines.

The most important bits in my AGENTS.md:

  • Points to README.md and docs/ locations and dev environment

  • Summarizes the project and project goals, design principles

  • Has a structure/treemap of the key folders

  • Lists the development philosophy and my preferred dev loop

  • Outlines roles/lanes for different agents

  • Specifies what documentation, tests, git commit practices, and other practical bits I want
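
As an illustration, a skeleton matching those bullets might look like this (headings and paths are hypothetical, not Leonard's actual file; CLAUDE.md can simply be a symlink created with `ln -s AGENTS.md CLAUDE.md`):

```markdown
# AGENTS.md

## Orientation
Read README.md first; specs, plans, and dev environment notes live in docs/.

## Project
One-paragraph summary, goals, and design principles.

## Layout
src/   core code
tests/ test suite
docs/  specs, plans, worklogs

## Dev loop
RESEARCH -> PLAN -> IMPLEMENTATION (punchlist+worklog) -> EXECUTION -> review/remediation until green.

## Roles
coder: one per worktree. planners/reviewers: independent sessions that never edit code.

## Practices
Tests before code; small descriptive commits; keep docs in sync with behavior.
```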

Most of this is self-explanatory, but the most important part is probably my spec-driven development loop, especially the independent review loop.

IMO, spec drift is where most people get in trouble with AI-assisted coding, and process is mostly about helping to keep you and your agents out of trouble on that front:

  • RESEARCH and ANALYSIS are done before a PLAN is created.

  • After the PLAN is ready, an IMPLEMENTATION (punchlist+worklog) is created - this is the core control point in my process and every major decision is made here. I usually work with at least 2 planner models to think through everything, and then work with the coder model to make sure that the IMPLEMENTATION is crystal clear for that model.

  • Once IMPLEMENTATION is locked down, the coder is able to autonomously knock out all the milestones in EXECUTION:

    • Test-driven: test coverage is required before code is written

    • Coding: write a first pass, go through the entire punchlist, give a summary for reviewers

    • Review: I found using independent reviewers to be very important. All issues are gathered up and sent to the coder to fix

    • Remediation: the coder goes through and fixes the review findings, and the result is sent back for review (repeat this part until everything is green)

    • Analysis: I do a pre/post-analysis at each milestone to correct for implementation drift and revisit scoping and decision-making

  • At the end of each sprint, I have started doing a STATUS review that's basically a post-mortem, implementation review, etc. that's useful for both me and for improving the process in the future

  • Ad-hoc development still largely adheres to this loop!
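
For illustration, the IMPLEMENTATION artifact at the center of this loop might be shaped like this (the structure below is a guess based on the description above, not Leonard's actual template):

```markdown
# IMPLEMENTATION: <feature>

## Decisions
Chosen approach, rejected alternatives, and why. Locked before EXECUTION begins.

## Punchlist
- [ ] M1: test coverage for <behavior> (written before code)
- [ ] M2: first-pass implementation; summary for reviewers
- [ ] M3: independent review; collect all issues
- [ ] M4: remediation; re-review until green

## Worklog
- <date>: M1 done; noted and corrected <drift or scope change>
```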

Despite basically not touching the code anymore, I feel like I still have a decent grip on what's being generated, and I'm currently still very hands-on with managing my agents on coding sprints. The engineering effort from my side probably remains the same, but features (and new business lines) land much more quickly. While some people have adopted more chaotic flows, for now I still try to keep development as intentional as possible.

I also have an explicit "meta" rule at the end of my AGENTS.md:

## Meta: Evolving This File

This AGENTS.md is a living document. Update it when:
- You discover a workflow pattern that helps
- Something caused confusion
- A new tool or process gets introduced
- You learn something that would help the next person

Keep changes focused on process/behavior, not project-specific details (those go in docs/).

In practice though, my post-sprint analysis is where I'll review with my agents to see what actually needs to be changed or updated in the AGENTS.md.

30-Day Stats

Over the past 30 days, I've clocked a pretty large amount of token usage, although 90-95% of those are cached tokens. I subscribe to ChatGPT Pro, and even at relatively high token usage I have not been hitting any usage limits (the $200/mo is worth it). I use the Claude models via Bedrock and Vertex.

| Provider   | Tokens/mo | API Cost/mo |
|------------|-----------|-------------|
| Claude API | 1.3B      | $1,211      |
| OpenAI API | 7.6B      | $2,153      |

What are these tokens used for? Over the past month, I've had multiple projects baking, including one optimizing GPU kernels, a new custom model training framework, multiple new evals, papers, blog posts, and grant proposals, several training research plans, and, most recently, two new 30K+ LOC greenfield projects.

— Leonard

Paul Willot: Claude Code Setup and Planning-Execution-Expansion Workflow Loop

Published in ACW #9 on Jan 19, 2026.

This week's workflow is from Paul Willot, Senior Machine Learning Engineer at Liquid AI. Over to Paul:

I'm very much a terminal guy, so I focus on using CLI tools in a way that maintains control over the output.

My Tool Stack

I primarily use Claude Code in the terminal for actual coding, complemented by ChatGPT in the browser for research and planning. Occasionally I'll use Gemini for drafting UI parts since I've found it much better at spatial understanding.

For quick read-only exploration of a new codebase, I like using Gemini 3 Flash, which has an impressive price/speed/quality balance. I used to call it through aider but switched to charmbracelet/crush in the last few months.

Claude Code Setup

I keep Claude Code largely vanilla, using a few MCP servers sporadically. I'm not currently using Agent Skills; I tried them when they were first introduced but found it hard to control model behavior robustly. Instead, I prepare a Makefile with self-documented targets, runnable by both agents and humans. It lets me precisely control how the model runs and tests the codebase. I also pre-approve most tools for Claude and restrict access to specific files using standard file system permissions.
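
The self-documenting pattern can look something like this (targets and commands are illustrative, not Paul's actual file; recipe lines must be tab-indented):

```make
.PHONY: help test lint
help:  ## List targets; agents and humans run `make help` to discover them
	@grep -E '^[a-zA-Z_-]+:.*## ' $(MAKEFILE_LIST) | awk -F':.*## ' '{printf "%-8s %s\n", $$1, $$2}'
test:  ## Run the test suite
	pytest -q
lint:  ## Static checks
	ruff check .
```

The `##` comments double as documentation: `make help` greps them out, so an agent can learn the sanctioned entry points without being told.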

I mostly draft the CLAUDE.md for projects manually, but when I let it be auto-generated I review and trim it down, because LLMs tend to be too exhaustive and include irrelevant details, which hurts focus down the line.

New Project Workflow

  1. Planning: I start a ChatGPT deep research session to gather information and tools around the problem I'm tackling. In parallel, I set up a skeleton repo with basic structure and environment setup. This GitHub repo shows my typical project structure for a Python project.

  2. Execution: I use the deep research results to manually fill an initial TODO.md, then ask Claude to tackle the steps in order. I aim for the smallest possible MVP and review code and results frequently. I constrain the model fairly heavily in how to go about implementation at this stage.

  3. Expansion: Once happy with the initial shape, I add tests and setup tooling for the agent, like playwright-mcp for projects with web UI parts. Then I add more ambitious tasks to the TODO.md, reset context, and let the agent run longer with less hand-holding.
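
The skeleton step in phase 1 can be as small as this (the layout below is a generic Python-project guess, not necessarily the linked repo's structure):

```shell
# Minimal skeleton repo to anchor the planning phase; names are illustrative.
mkdir -p myproj/src/myproj myproj/tests
cd myproj
git init -q
touch README.md TODO.md CLAUDE.md pyproject.toml src/myproj/__init__.py
ls
```

The empty TODO.md is then filled manually from the deep-research results before the agent is pointed at it.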

When Things Go Wrong

When the model is stuck or things don't go as I like, instead of adding more instructions I often reset context and have the model work on a sub-problem first, then integrate that solution into the broader project. If that's still not working... I just do it myself!

— Paul

Christoph Nakazawa: Fire-and-Forget Approach to Agentic Coding

Published in ACW #8 on Jan 12, 2026.

This week's workflow is from Christoph Nakazawa, CEO at Nakazawa Tech.

The Fire-and-Forget Approach to Agentic Coding

Christoph primarily uses ChatGPT or Codex web. He's tried Claude Code and some open models but never vibed with any of them. He prefers to stay in control and isn't a fan of terminal agents touching files on his computer. He wants agents to work in their own environment and send him the results for review.

The killer feature of Codex on web is the ability to specify how many versions of a solution you want. When I was building fate, I asked for 4 versions for anything slightly complex. As a result, I barely trust single-response LLMs anymore.

— Christoph

The approach is fire and forget. He doesn't always know what a solution should look like, and he's not a fan of specs. Code wins arguments. He fires off a prompt, looks at the results, and if it's not right or doesn't work, starts a fresh chat with a refined prompt. Planning happens in a regular ChatGPT chat with a non-Codex Thinking or Pro model.

He typically fires off 3-5 Codex sessions before stepping away from the computer or going to sleep, then works through each solution one by one. When done with a task, he archives all the chats and moves on.

A few things he finds useful:

  • Turning memory off and starting a fresh chat for every question.

  • Keeping a strict project structure so the model has rails to follow, then heavily editing and rewriting anything it generates before it touches the repo.

  • Getting good at prompting a specific model, since the same prompt doesn't work as well with other LLMs.

  • Building intuition for what the model is good at, and adjusting when a task takes more (or less) time than expected.

  • Treating the prompt input bar as a replacement for the todo list: instead of filing a task, he drops the thought into a prompt, and the change ships in an hour instead of sitting in a backlog forever.

Eunsu Jang: Parallel Codex Agents with Git Worktrees

Published in ACW #7 on Jan 5, 2026.

This week's workflow is from Eunsu Jang, Lead AI Application Engineer at Explaza. Over to Eunsu:

Parallel Codex Agents with Git Worktrees

I use an OSS coding agent orchestrator called vibe-kanban and give it the task prompt. It creates separate git worktrees and runs independent Codex agents for each task.

If the task is genuinely hard or ambiguous, I’ll kick off multiple runs with slightly different instructions, or just switch the Codex model (for example GPT-5.2 vs GPT-5.1 Codex Max). This tends to surface different tradeoffs: one version might be safer and boring, another might be over-engineered, another might match the existing architecture better. Since each implementation is isolated in worktrees, it’s easy to compare approaches without branches getting tangled.
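
The isolation step can be reproduced by hand with `git worktree` (vibe-kanban automates this; repo and branch names below are illustrative):

```shell
set -e
# Throwaway repo standing in for a real project.
base=$(mktemp -d)
mkdir "$base/main" && cd "$base/main"
git init -q
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init

# One worktree per attempt, each on its own branch, so parallel runs can't tangle.
git worktree add -q "$base/attempt-a" -b attempt-a
git worktree add -q "$base/attempt-b" -b attempt-b

git worktree list   # main checkout plus the two isolated attempts
```

Each worktree is a full checkout sharing one object store, so an agent running in `attempt-a` can't see or clobber uncommitted work in `attempt-b`.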

I noticed that these models often over-engineer the implementation, so I add the following instructions to the project AGENTS.md file:

## GENERAL RULES:
- Avoid over-engineering
- Follow DRY/YAGNI
- Respect the project architecture
- Fail fast. No invisible fallbacks.

Once the agents finish, I deploy each worktree and test in the browser. Anything that fails basic self-QA gets dropped early. For the ones that look promising, I open a separate Codex session and have it score and compare them. The score is subjective, but it’s useful for forcing a clear decision.

Here’s the exact comparison prompt I use:

Worktree A and Worktree B are results of the task below. Please evaluate each on a 10-point scale based on:

1. No over-engineering
2. Proper use of DRY/YAGNI
3. Strict respect for the project architecture
4. Fail fast, no invisible fallbacks

Choose the better worktree and explain why.

Original task instruction:
...

Finally, I select the winning worktree, issue small follow-up instructions for final polish, and create the PR.

— Eunsu
