The Systematic AI Code Review Workflow: Plan, Generate, Validate
A practical guide to maintaining quality at AI speed (6 min)
Once upon a time, a developer spent an afternoon building a user registration API with AI assistance.
API endpoints, proper validation, clean error handling—everything compiled perfectly and worked flawlessly in staging.
The PR was approved and merged into production.
Hours later, the security team sent an urgent Slack message.
The newly introduced endpoint was vulnerable to SQL injection through unvalidated query parameters.
One line of code. One blind spot.
One security vulnerability that an automated review would have caught instantly.
That’s the current reality of AI and Vibe coding without a systematic workflow:
It's not a productivity boost; it's technical debt at AI speed.
To mitigate this, we need a new system for handling the sheer volume of code AI can generate.
In today’s article, I’ll share a systematic AI code review workflow to adapt to the new AI and Vibe coding era.
Why You Need a Loop
Most developers follow this pattern:
Generate code with AI
Push to production
Hope for the best
The problem with this approach is that AI optimizes for “working”, not “correct” code.
It writes code that compiles and runs, but might have security vulnerabilities, performance issues, edge case failures, and inconsistent error handling.
Imagine the following scenario: You ask your LLM to generate an Express API endpoint for user registration. The code works perfectly:
app.post('/api/register', async (req, res) => {
  const { email, password, username } = req.body;
  const user = await db.query(
    `INSERT INTO users (email, password, username) VALUES ('${email}', '${password}', '${username}') RETURNING *`
  );
  res.json({ success: true, user });
});

However, there are a few major problems: SQL injection vulnerability, no password hashing, no input validation, no error handling, etc. You get the point.
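To make the injection risk concrete, consider a hypothetical malicious value for one of the fields (the exact impact depends on your database driver, but the string-interpolated query gives the attacker a way in):

// Hypothetical attacker-controlled input, for illustration only.
const email = "attacker@example.com', 'x', 'x'); DROP TABLE users; --";
// Interpolated into the template literal above, this closes the VALUES list early
// and, if the driver allows multiple statements, appends a DROP TABLE command.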
The main point is that the code “works”, but it’s a security nightmare.
This is where the three-phase loop comes in:
Plan → Generate → Review → Ship
          ↑          ↓
          └── Fix ──┘

The key principle is:
Separate generation from validation.
You might use:
A generation AI (like Cursor, Claude, or Copilot), which optimizes for speed and functionality.
A review AI (like CodeRabbit), which optimizes for security, performance, and quality.
You act as an orchestrator of both.
This way, each phase catches different types of issues.
Let’s dive in.
Phase 1: Plan Before You Generate
Good AI output starts with good input.
Planning before you generate any code creates better results.
Why planning matters:
clear specs = better AI output
prevents “works but unmaintainable” code
creates a checklist for validation
saves hours of debugging later
You might use Cursor Plan mode, or any other LLM like Claude or ChatGPT, to come up with a detailed action plan.
I personally try to fill out the following template before moving to the code generation:
Feature: [Name]
Purpose: [One sentence]
Inputs: [List with types and validation rules]
Outputs: [Success and error cases]
Edge cases: [What could go wrong?]
Security considerations: [What must be protected?]

Feel free to edit it based on your current context, problem, and task.
This approach helps me be more confident that I’ve outlined all the important requirements and considerations before moving to the next phase.
If we follow the example from above related to the user registration endpoint for an Express API, we might prompt the following:
I need a user registration endpoint for an Express API. Users provide email, password, and username. Help me design the endpoint that handles validation errors, duplicate emails, and weak passwords. What else should I consider for security, validation, and error handling?

Based on the given output, summarize the findings into the above-mentioned template, so you can share it in the prompt in a structured format when starting to generate code.
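For the registration endpoint, a filled-in template might look something like this (the details below are illustrative; use whatever your planning conversation actually surfaces):

Feature: User registration endpoint
Purpose: Allow new users to create an account via POST /api/register.
Inputs: email (string, valid email format), password (string, min 8 characters, at least one number), username (string, 3-20 alphanumeric characters)
Outputs: 201 with the created user (password excluded); 400 with field-specific validation errors; 409 with a generic message for duplicate emails
Edge cases: duplicate email, malformed request body, missing fields, database unavailable
Security considerations: hash passwords before storing, use parameterized queries, never return the password, keep duplicate-email errors generic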
Phase 2: Generate With Context
Once you’ve gone through the planning and have a structured and summarized plan, you’re ready to generate.
Since you already defined a plan, you can create focused and high-quality prompts.
Generation best practices:
Generate one endpoint, component, or feature at a time. Don’t generate the entire service. The smaller and more focused the task, the better the outcome.
Use separate prompts/chats per task. Maintain the context window. Keep it as small as possible. This also reduces the costs.
Copy relevant sections from your plan directly into the prompt.
Iterate 2-3 times rather than expecting perfection on the first try.
Specify your exact tech stack, code style, and structure. Provide examples from the codebase if relevant.
AI generation is a conversation, not a one-shot command!
During the code generation phase, you want to manually check:
Does it follow your tech stack conventions?
Are the types and interfaces correct?
Does it match your project structure?
Are dependencies the ones you actually use?
If something looks off, create a new chat and refine your prompt with more specific guidance.
In the LLM world, if the first prompt misses the mark, it's hard to steer the conversation back on track later.
It's better to create a new chat with a clean context and an improved prompt.
Coming back to the example from above related to the user registration endpoint for an Express API, we might prompt the following:
Create an Express TypeScript POST endpoint at /api/register for user registration.
Requirements:
- Accept email (string), password (string), username (string) in request body
- Validate email format (RFC 5322)
- Validate password: minimum 8 characters with at least one number
- Validate username: 3-20 alphanumeric characters only
- Hash password with bcrypt (12 rounds) before storing
- Use parameterized database queries to prevent SQL injection
- Handle duplicate email (return 409 with generic message)
- Handle validation errors (return 400 with field-specific errors)
- Handle database errors gracefully
- Return 201 with user object (exclude password) on success
Use Zod for input validation and proper TypeScript types throughout.
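The output of such a prompt might look roughly like the sketch below. This is only an illustration of the shape of the result, assuming a node-postgres connection pool and a users table with id, email, password, and username columns plus a unique constraint on email; your generated code will differ:

import express from "express";
import bcrypt from "bcrypt";
import { z } from "zod";
import { Pool } from "pg";

const app = express();
app.use(express.json());

// Assumed node-postgres connection pool; configure it for your own database.
const db = new Pool();

// Validation schema mirroring the rules from the plan.
const registerSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8).regex(/\d/, "Password must contain at least one number"),
  username: z.string().regex(/^[a-zA-Z0-9]{3,20}$/, "Username must be 3-20 alphanumeric characters"),
});

app.post("/api/register", async (req, res) => {
  // 400 with field-specific errors on invalid input.
  const parsed = registerSchema.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ errors: parsed.error.flatten().fieldErrors });
  }

  const { email, password, username } = parsed.data;

  try {
    // Hash the password (12 rounds) before storing.
    const passwordHash = await bcrypt.hash(password, 12);

    // Parameterized query prevents SQL injection.
    const result = await db.query(
      "INSERT INTO users (email, password, username) VALUES ($1, $2, $3) RETURNING id, email, username",
      [email, passwordHash, username]
    );

    // 201 with the user object, password excluded.
    return res.status(201).json({ success: true, user: result.rows[0] });
  } catch (err: any) {
    // 409 with a generic message when the unique email constraint is violated.
    if (err?.code === "23505") {
      return res.status(409).json({ error: "Registration failed" });
    }
    // Handle remaining database errors gracefully.
    return res.status(500).json({ error: "Internal server error" });
  }
});

Notice how every requirement from the prompt maps to a specific piece of the code: the Zod schema enforces the validation rules, bcrypt handles hashing, and the parameterized query closes the injection hole from the earlier example.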
Even though we've outlined a detailed plan and generated solid code, we might still miss important things.
For example: "Are there any missed edge cases?", "What are the performance implications?", "Is the validation comprehensive enough?", etc.
This is exactly why we need the review phase.
We want to get a higher confidence in what we ship and double-check that we haven’t missed important considerations.
Phase 3: Review and Validate
We now have working code. It's time to validate it with a second AI perspective specialized for review.
Why bother reviewing AI-generated code?
AI generators have blind spots. They optimize for “working”, not “correct” code.
Different AI models catch different issues and blind spots.
Automated reviews find problems in seconds.
Ensures consistency across your codebase.
An example of such an AI code review tool is CodeRabbit.
What CodeRabbit catches:
Security: SQL injections, XSS vulnerabilities, exposed secrets, etc.
Bugs: race conditions, incorrect error handling, etc.
Performance: missing db indexes, N+1 queries, etc.
Best practices: inconsistent error handling, missing type safety, etc.
Put simply, CodeRabbit complements human reviewers, adding a lot of context based on your repo and general engineering practices.
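As a concrete illustration of the performance category, a review pass would typically flag an N+1 query pattern like the hypothetical snippet below and suggest a single batched query instead:

// Hypothetical N+1 pattern: one extra query per user row.
const users = await db.query("SELECT id FROM users");
const profiles = [];
for (const user of users.rows) {
  profiles.push(await db.query("SELECT * FROM profiles WHERE user_id = $1", [user.id]));
}

// Suggested fix: fetch the same data with a single joined query.
const allProfiles = await db.query(
  "SELECT p.* FROM profiles p JOIN users u ON u.id = p.user_id"
);

Catching this kind of issue in review is far cheaper than discovering it in production once the users table grows.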
📌 TL;DR
This systematic AI code review workflow isn’t optional; it’s how you maintain quality at AI speed:
Plan before you generate (upfront thinking saves hours).
Generate with context (good prompts = good code).
Review with specialized AI (e.g., CodeRabbit catches what generators miss).
Hope this was helpful.
See you next time! 🙌
P.S. If you’re into Web Development, I recommend you subscribe to Marko Denic’s newsletter, which is full of many tips and tricks.
👋 Let’s connect
You can find me on LinkedIn, Twitter(X), Bluesky, or Threads.
I share daily practical tips to level up your skills and become a better engineer.
Thank you for being a great supporter and reader, and for helping this newsletter grow to 29.3K+ subscribers this week 🙏

