5 AI Development Workflows That Actually Work
Five proven AI coding workflows --- parallel development, plan mode, AI code review, TDD with Claude, and CI failure diagnosis --- with step-by-step guidance and real examples.
ChatML Team
Most writing about AI coding tools focuses on what the models can do. This post is about what you can do with them. Five concrete workflows, each one something you can try today, each one designed to compress hours of work into minutes. No theory. No hand-waving about the future. Just the steps.
These are the AI coding workflows we use every day while building ChatML itself. They work because they solve specific, repeatable problems that engineers actually encounter --- not because they sound impressive in a demo.
| # | Workflow | What It Does |
|---|---|---|
| 1 | Parallel Feature Development | Run 3-5 AI agents simultaneously, each in its own git worktree |
| 2 | Plan Mode for Complex Changes | Agent plans before coding; you review the approach first |
| 3 | AI Code Review with Inline Comments | Line-by-line review at three depths: quick, deep, security |
| 4 | TDD with Claude | Write tests first, let the agent implement until green |
| 5 | CI Failure Diagnosis | Feed build failures to an agent for root cause and fix |
Workflow 1: Parallel Feature Development
This is the core use case, and it is the reason we built ChatML in the first place. If you have read the problem we are solving, you know that every AI coding tool today forces sequential work. One agent, one task, one directory. You wait for it to finish before you start the next thing.
Here is how parallel AI development actually works in practice.
The scenario. You have three features to build this sprint: JWT authentication for your API, a Stripe payment integration, and an email notification service. In a traditional AI-assisted workflow, you would tackle them one at a time. The auth refactor takes forty minutes of agent time. The Stripe integration takes thirty. The email service takes twenty. That is ninety minutes of wall-clock time, most of which you spend waiting.
The parallel workflow, step by step.
Step 1: Open ChatML and create three sessions. Each session appears as a tab in the interface. Give them clear names: "JWT Auth," "Stripe Payments," "Email Notifications." Naming matters because you will be switching between them.
Step 2: Each session gets its own worktree and branch. When you create a session, ChatML automatically runs git worktree add behind the scenes. It creates a new branch from your current HEAD and checks it out in an isolated directory. Three sessions means three branches, three working directories, zero filesystem conflicts. If you want to understand the mechanics of how this works, we wrote a deep dive on git worktrees.
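If you want to see what that looks like at the command line, the per-session setup is roughly equivalent to the following (directory and branch names here are illustrative, not ChatML's actual naming scheme):

```shell
# From inside your repo: create an isolated working directory on a new branch.
git worktree add ../myapp-jwt-auth -b feature/jwt-auth

# Repeat per session; each worktree gets its own checkout and index.
git worktree add ../myapp-stripe -b feature/stripe-payments

# List all active worktrees and the branch each one has checked out.
git worktree list
```

All worktrees share the same object database, so the disk cost is low: only the checked-out files are duplicated, not the repository history.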
Step 3: Give each agent its task. In the JWT Auth session, type: "Implement JWT authentication. Replace the existing session-based auth in middleware.ts with JWT verification. Add token generation to the login endpoint. Update the user model to store refresh tokens." In the Stripe session: "Build a Stripe payment integration. Create a new payments module with endpoints for creating checkout sessions, handling webhooks, and managing subscriptions. Use the existing Stripe SDK in package.json." In the Email session: "Add an email notification service using Resend. Implement welcome emails on signup, password reset emails, and payment confirmation emails. Use the templates in /src/templates."
Be specific. The more context you give upfront, the less you need to intervene later.
Step 4: All three agents start working simultaneously. This is the part that feels different. You are not waiting. All three agents are reading files, planning their approach, and writing code at the same time. In the ChatML interface, you can see live diffs streaming in from each session. Files being created, modified, deleted --- all visible in real time.
Step 5: Monitor and intervene when needed. Switch between session tabs to check progress. Maybe the Stripe agent is implementing webhooks using Express middleware, but your project uses Hono. Jump into that session: "We use Hono, not Express. Check server.ts for the existing route setup." The agent adjusts. You switch back to monitoring the other two.
Step 6: Review each PR independently. The email notification agent finishes first. You review its diff --- clean, focused, only touches the files it should. You approve it and create a PR. The JWT agent finishes next. Its diff is larger and more consequential, so you spend a few extra minutes reviewing the middleware changes carefully. Approve, PR. The Stripe agent finishes last. Review, approve, PR.
The result. Three features, three clean PRs, roughly forty minutes of wall-clock time instead of ninety. You were not idle during any of it. You were reviewing, redirecting, and approving. That is parallel AI development: you orchestrate, the agents execute.
ChatML launched as v0.1.0 with full support for this workflow out of the box.
Workflow 2: Plan Mode for Complex Changes
Not every task should start with "just do it." For large refactors, database migrations, or architectural changes, you want to see the plan before the agent starts writing code. This is where plan mode earns its keep.
The scenario. You need to migrate your API from REST to GraphQL. This touches dozens of files: route definitions, controllers, middleware, client-side fetch calls, types, tests. Telling an agent "migrate our API to GraphQL" and letting it loose is how you end up with forty-seven modified files and a codebase that does not compile.
The plan mode workflow.
Step 1: Describe the goal, not the implementation. Open a session and explain what you want: "I want to migrate our user-facing API from REST to GraphQL using Apollo Server. We need to preserve all existing functionality. The REST endpoints should remain available during the transition with a deprecation flag. I want a schema-first approach with code generation for TypeScript types."
Step 2: The agent explores before it acts. In plan mode, the agent reads your codebase systematically. It maps out the existing route structure, identifies which controllers handle which endpoints, traces data flow through middleware, and catalogs the types involved. It is building a mental model of what exists before proposing what should change.
Step 3: It produces a detailed plan. The output is not code --- it is a structured plan. Which files need to change, in what order, what each change looks like at a high level. Something like:
- Create a GraphQL schema file at `src/graphql/schema.graphql` with User, Post, and Comment types
- Generate TypeScript types from the schema into `src/graphql/generated/`
- Create resolvers for each type in `src/graphql/resolvers/`
- Add Apollo Server middleware to `src/server.ts`
- Add deprecation headers to existing REST routes in `src/routes/`
- Update client-side API calls in `src/client/api/` to use GraphQL queries
- Update integration tests in `tests/api/` to cover both REST and GraphQL
- Estimated files modified: 23. Estimated files created: 11.
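To make the resolver step concrete, here is the kind of sketch that step might expand into. The `Context` shape and its data-access helpers are assumptions for illustration --- a real project would wire them to its database layer:

```typescript
// Hypothetical domain types matching the schema in the plan.
interface User { id: string; name: string }
interface Post { id: string; authorId: string; title: string }

// Hypothetical per-request context; shape is an assumption, not ChatML output.
interface Context {
  users: { byId(id: string): User | undefined };
  posts: { byAuthor(authorId: string): Post[] };
}

// A plain resolver map in the shape Apollo Server expects:
// Query fields resolve top-level lookups; type fields resolve relations.
const resolvers = {
  Query: {
    user: (_parent: unknown, args: { id: string }, ctx: Context) =>
      ctx.users.byId(args.id),
  },
  User: {
    posts: (parent: User, _args: unknown, ctx: Context) =>
      ctx.posts.byAuthor(parent.id),
  },
};
```

The schema-first approach from Step 1 means these resolver signatures can be generated and type-checked against the `.graphql` file, which is exactly the kind of consistency a plan-then-execute agent can enforce step by step.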
Step 4: You review and adjust. Maybe you want to skip the client migration for now and keep the REST endpoints without deprecation flags until the GraphQL layer is proven. Tell the agent. It revises the plan. You go back and forth until the plan matches what you actually want.
Step 5: Execute. Only after you approve the plan does the agent start writing code. And because it has a plan, it works methodically --- creating files in the right order, respecting dependencies, running tests after each major step.
Plan mode prevents the most common failure mode of AI-assisted development: the agent makes sweeping changes that are technically impressive but strategically wrong. For database schema migrations, API versioning, major dependency upgrades, or any change that touches more than ten files, plan first.
Workflow 3: AI Code Review with Inline Comments
Code review is the workflow where AI adds the most value with the least risk. The agent is not writing production code --- it is reading code that already exists and flagging potential issues. The worst case is a false positive that you ignore.
The scenario. A teammate (or one of your AI agents from Workflow 1) has opened a PR with three hundred lines of changes across eight files. You need to review it, but you also have your own work to do.
The review workflow.
Step 1: Open a review session in ChatML. Point it at the PR branch. The agent loads the entire diff and the surrounding context --- not just the changed lines, but the files they live in, the imports they reference, the tests that cover them.
Step 2: The agent reads and comments. It goes through the diff file by file and leaves inline comments directly on the relevant lines. These are not generic "consider adding a comment here" suggestions. They are specific, contextual observations:
- Error on `payments.ts:47`: "This catches the Stripe error but swallows the stack trace. The error logging on line 52 only logs `error.message`, which loses the context needed for debugging webhook failures. Pass the full error object to the logger."
- Warning on `auth.ts:112`: "The JWT expiration is set to 30 days. Your existing session tokens expire after 24 hours. This is a significant security change that should be called out in the PR description."
- Suggestion on `users.ts:89`: "This database query runs inside the request handler without a transaction. If the update on line 94 fails after the insert on line 89 succeeds, you'll have orphaned records. Wrap lines 89-96 in a transaction."
- Info on `tests/payments.test.ts:23`: "This test mocks the Stripe client at the module level, which means it won't catch issues with how the real client is initialized. Consider adding one integration test that uses a Stripe test key."
Step 3: Review the comments in the diff view. ChatML shows comments inline, categorized by severity --- errors, warnings, suggestions, info. You can filter by severity to focus on the critical issues first. Each comment includes enough context that you can evaluate it without switching to your editor.
Step 4: The agent suggests fixes, not just problems. For each issue it flags, the agent can also propose a concrete fix. Not just "this is wrong" but "here is what the corrected code looks like." You can apply the suggestion directly or use it as a starting point for your own fix.
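As an illustration, the transaction suggestion above might arrive with a proposed fix along these lines. The `Db` and `Tx` interfaces here are stand-ins; the real shape depends on your database client (Prisma, Drizzle, knex, and pg each have their own transaction API):

```typescript
// Minimal stand-in for a database client with transaction support.
interface Tx {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
  update(table: string, id: string, patch: Record<string, unknown>): Promise<void>;
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Suggested fix: run the insert (formerly line 89) and the update (formerly
// line 94) inside one transaction, so a failed update cannot leave an
// orphaned record behind.
async function createProfile(db: Db, userId: string, bio: string): Promise<void> {
  await db.transaction(async (tx) => {
    await tx.insert('profiles', { userId, bio });
    await tx.update('users', userId, { hasProfile: true });
  });
}
```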
This catches the class of bugs that linters and type checkers miss: logic errors, race conditions, missing edge cases, security implications of configuration changes, inconsistencies between the PR and the existing codebase. It does not replace human review --- you still need to evaluate whether the changes make sense at a product level. But it handles the mechanical part of reviewing correctness, freeing you to focus on design and intent.
Workflow 4: TDD with Claude
Test-driven development has always been a discipline that engineers respect in theory and skip in practice. The friction is real: writing tests before implementation is slow, and the feedback loop --- write test, watch it fail, write code, watch it pass --- takes patience. AI changes the economics of that loop completely.
The scenario (direction 1: you write tests, the agent implements). You are building a rate limiter. You know exactly what the behavior should be: allow N requests per window, return 429 after the limit, reset after the window expires, support different limits per endpoint.
Step 1: Write the tests. You write a test file that describes the behavior you want. You do not write any implementation code.
```typescript
describe('RateLimiter', () => {
  it('should allow requests under the limit', () => {
    const limiter = new RateLimiter({ maxRequests: 10, windowMs: 60000 });
    for (let i = 0; i < 10; i++) {
      expect(limiter.check('user-1')).toBe(true);
    }
  });

  it('should reject requests over the limit', () => {
    const limiter = new RateLimiter({ maxRequests: 10, windowMs: 60000 });
    for (let i = 0; i < 10; i++) limiter.check('user-1');
    expect(limiter.check('user-1')).toBe(false);
  });

  it('should track limits per key independently', () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 60000 });
    expect(limiter.check('user-1')).toBe(true);
    expect(limiter.check('user-2')).toBe(true);
    expect(limiter.check('user-1')).toBe(false);
  });

  it('should reset after the window expires', async () => {
    const limiter = new RateLimiter({ maxRequests: 1, windowMs: 100 });
    expect(limiter.check('user-1')).toBe(true);
    expect(limiter.check('user-1')).toBe(false);
    await new Promise(resolve => setTimeout(resolve, 150));
    expect(limiter.check('user-1')).toBe(true);
  });
});
```

Step 2: Hand it to the agent. Tell the agent: "Implement the RateLimiter class to make all these tests pass. Use a sliding window algorithm. Store state in memory using a Map."
Step 3: The agent implements and iterates. The agent writes the implementation, runs the tests, sees which ones fail, adjusts, and runs again. Using the skills system, the agent can execute test commands automatically and iterate without your involvement. You come back to a green test suite and a clean implementation to review.
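For the tests above, the agent might converge on something like this sliding-window sketch --- one plausible implementation that satisfies the spec, not the only one:

```typescript
interface RateLimiterOptions {
  maxRequests: number;
  windowMs: number;
}

class RateLimiter {
  // Per-key timestamps of accepted requests within the current window.
  private hits = new Map<string, number[]>();

  constructor(private options: RateLimiterOptions) {}

  check(key: string): boolean {
    const now = Date.now();
    const cutoff = now - this.options.windowMs;
    // Sliding window: drop timestamps that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.options.maxRequests;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Note that this sketch never evicts idle keys, so the Map grows with the number of distinct keys seen; a production version would add eviction or a TTL --- exactly the kind of follow-up you would raise when reviewing the agent's green test suite.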
The inverse scenario (direction 2: the agent writes tests for existing code). You have an existing module with zero test coverage. Hand it to the agent: "Write comprehensive tests for src/lib/rate-limiter.ts. Cover edge cases: concurrent access, zero-length windows, negative limits, very large request counts. Use the project's existing test setup."
The agent reads the implementation, understands the interface, identifies edge cases that the implementation handles (and ones it does not), and produces a test suite. This is particularly valuable for legacy code where writing tests retroactively feels like a chore no one wants to do.
Why this AI TDD workflow works. The feedback loop --- write test, run test, fix code, repeat --- is exactly the kind of mechanical iteration that AI agents handle well. They are patient, they do not get frustrated by red tests, and they will try ten different approaches to make a test pass without complaining. You define the behavior through tests; the agent handles the implementation grind.
Workflow 5: CI Failure Diagnosis
Your CI pipeline fails. The build log is three hundred lines of output, most of it irrelevant. Somewhere in there is the actual failure: a flaky test, a missing environment variable, a dependency conflict, a type error that only manifests in the CI environment. Finding it takes ten minutes of scrolling and squinting. Fixing it takes two.
The diagnostic workflow.
Step 1: Feed the failure to the agent. Paste the CI failure URL, the relevant log output, or the error message into a ChatML session. Give the agent context: "This is the CI log from our main branch build. It was passing yesterday. Here are the commits since the last green build."
Step 2: The agent analyzes the failure. It reads the log output, identifies the actual error (not the first warning, not the noise --- the real failure), and cross-references it with your source code. If the error is in a test, it reads the test file and the code it tests. If the error is a build failure, it reads the build configuration and the changed files.
Step 3: Root cause and suggested fix. The agent tells you what broke and why. Not just "test X failed" but "test X failed because the mock for the payment service was not updated to include the new refundAmount field added in commit abc123. The test creates a payment object without that field, which causes a runtime error on line 47 of payment-processor.ts where it's accessed without a null check."
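A fix for a diagnosis like that usually has two halves: update the stale mock, and harden the code that assumed the field exists. A sketch, with types and names mirroring the hypothetical diagnosis above:

```typescript
// Shape implied by the diagnosis: commit abc123 added an optional field.
interface Payment {
  amount: number;
  refundAmount?: number;
}

// Half 1: the test mock gains the new field so it matches production data.
const mockPayment: Payment = { amount: 5000, refundAmount: 0 };

// Half 2: the processor stops assuming the field is present
// (the runtime error the agent traced to the unguarded access).
function netAmount(payment: Payment): number {
  return payment.amount - (payment.refundAmount ?? 0);
}
```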
Step 4: Spawn a fix session. Here is where the parallel workflow from Workflow 1 compounds. You can open a second session, tell the agent to fix the CI failure, and go back to whatever you were working on. The fix agent creates a worktree, makes the targeted change, runs the tests locally to verify, and creates a PR. You review it and merge.
This workflow is especially powerful for three categories of CI failures:
- Flaky tests --- where the failure is intermittent and hard to reproduce locally. The agent can analyze the test for timing dependencies, shared state, or order-dependent behavior.
- Environment-specific failures --- where the code works on your machine but fails in CI. The agent can compare CI configuration with local setup and identify discrepancies.
- Dependency issues --- where a transitive dependency update breaks something. The agent can trace the dependency graph and identify which update caused the regression.
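For the flaky-test category in particular, a fix the agent often proposes is removing real-time dependence entirely: inject a clock instead of sleeping, so window-expiry behavior becomes deterministic. A sketch (the injectable `now` parameter is an assumed design, not part of any API described above):

```typescript
// A clock-injectable counter: tests pass a fake clock instead of sleeping.
class WindowCounter {
  private timestamps: number[] = [];

  constructor(
    private windowMs: number,
    private now: () => number = Date.now, // real clock by default
  ) {}

  // Records a hit and returns how many hits fall inside the current window.
  record(): number {
    const cutoff = this.now() - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    this.timestamps.push(this.now());
    return this.timestamps.length;
  }
}
```

A test can now advance a fake clock by exactly 150ms instead of awaiting a real `setTimeout`, which eliminates the timing race that made the test flaky in CI.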
Bonus: Keyboard-Driven Workflow
Everything described above is faster when you never reach for the mouse. ChatML is designed as a keyboard-first AI code review tool and development environment. Every action --- creating sessions, switching between them, approving diffs, managing worktrees, navigating comments --- has a keyboard shortcut.
Quick keys that matter.
- Create a new session without leaving your current one. One shortcut. You type the task, hit enter, and the agent starts.
- Switch between active sessions. Tab through them or jump to a specific one by number. You are monitoring three agents; you need to move fast.
- Approve or reject individual file changes in a diff. You are reviewing an agent's work; you want to accept most changes but reject one file. Keyboard.
- Expand and collapse diff hunks. When a diff is large, you need to focus on specific sections without scrolling through everything.
- Open the command palette for any action you do not have memorized.
The shortcuts are fully customizable. If you are a Vim user who has spent years building muscle memory for j, k, gg, and G, you can remap navigation to match. If you have your own keybinding conventions, you can set them up once and have them apply across all of ChatML's interfaces.
This matters because the workflows above --- parallel development, plan mode, code review, TDD, CI diagnosis --- all involve rapid context switching. You are jumping between sessions, reading diffs, typing instructions, reviewing comments. Every time you reach for the mouse, you break flow. A keyboard-driven interface keeps you in the zone.
Getting Started
These five workflows are not aspirational. They are things you can do right now. Download ChatML, open a project, and try the simplest one first: Workflow 3, code review. Point it at an open PR in your repository and see what it finds. It takes two minutes and the risk is zero --- you are not modifying any code, just reading it. And if you want the full origin story of how we used these workflows to build ChatML itself --- 750+ pull requests, zero human-written code --- that is worth reading too.
Once you have seen the review workflow in action, try Workflow 1: parallel development. Pick two small, independent tasks from your backlog. Create two sessions. Watch them run simultaneously. Review the diffs. Merge them.
That is the moment it clicks. Not because the AI is smarter than you --- it is not --- but because you are no longer waiting. You are orchestrating. And orchestrating three agents is faster than babysitting one.
Download ChatML and find out for yourself.