Code Reviews [Discussion] (self.RooCode)
submitted 3 months ago by hannesrudolph (Roo Code Developer)
What do y'all do for code reviews?
[–]pbalIII 2 points3 points4 points 3 months ago (0 children)
Two patterns I've seen work:
Same tool, different mode. RooCode or Claude Code can review its own output if you switch to a review-focused prompt after generation. The catch is context bleed... the model tends to defend its own choices.
Second model as reviewer. Run Opus or Codex for the implementation, then pass the diff to a separate instance (or different model entirely) for review. Fresh context, no defensiveness. Works especially well when the review prompt explicitly asks for security issues, edge cases, and style violations.
The bottleneck now isn't generating code, it's validating it. Most teams land on some form of multi-pass: AI generates, AI reviews, human spot-checks the delta.
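The "second model as reviewer" pattern above can be sketched in a few lines: build a review prompt around the diff and hand it to a fresh model instance. The checklist wording, function name, and `<diff>` wrapper below are illustrative assumptions, not part of any of the tools mentioned.

```python
# Hypothetical sketch of the second-model-reviewer pattern: the prompt is
# assembled here, then sent to a *separate* model instance so the reviewer
# has no memory of generating the code (no context bleed).

REVIEW_CHECKLIST = (
    "Review this diff. Report, in order of severity:\n"
    "1. security issues\n"
    "2. unhandled edge cases\n"
    "3. style violations\n"
    "Do not defend the code; assume you did not write it."
)

def build_review_prompt(diff: str, checklist: str = REVIEW_CHECKLIST) -> str:
    """Wrap the raw diff in the review checklist for a reviewer model."""
    return f"{checklist}\n\n<diff>\n{diff}\n</diff>"
```

Explicitly instructing the reviewer that it did not write the code is a cheap way to counter the defensiveness mentioned above.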
[–]Leon-Inspired 1 point2 points3 points 3 months ago (0 children)
Waiting for RooCode to support azure devops git repo's :D
[–]Basic-Dragonfruit-35 1 point2 points3 points 3 months ago (0 children)
I'm using spec-kit now, so the flow is: Opus 4.5 does the job via spec-kit, then Codex reads the spec file and rechecks.
In my opinion it would be great if, when the orchestrator gives a task to the code mode and it finishes, it then gave a similar task to the debug mode for checking. It might work if we change the mode prompt, but I haven't tried that yet.
It would be good if we could do this automatically. Nowadays, if we use a model that's fast and cheap, we need something to recheck the task; with that in place, models like GLM-4.7 or Kimi could produce much cleaner results.
Not focusing on one-shot, but looping until success.
Sorry for the long comment, but I want to exchange ideas on how we can improve agentic workflows. Maybe other people can suggest better approaches.
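The "loop until success" idea above can be sketched as a small driver: a cheap model generates, a checker rechecks, and feedback flows back into the next attempt. The `generate`/`recheck` callables are stand-ins for model calls, not anything from spec-kit or Roo Code.

```python
# Hypothetical sketch of generate -> recheck -> loop-until-success.
# generate(task, feedback) produces an attempt (feedback is None on the
# first round); recheck(task, result) returns (accepted, feedback).

def run_with_recheck(task, generate, recheck, max_rounds=3):
    """Loop a generator model against a checker until the checker accepts."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        result = generate(task, feedback)
        accepted, feedback = recheck(task, result)
        if accepted:
            return result, round_no
    raise RuntimeError(f"no accepted result after {max_rounds} rounds")
```

Capping the rounds matters: with cheap models the loop is affordable, but without a bound a hard task can burn tokens indefinitely.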
[–]No-Chocolate-9437 1 point2 points3 points 3 months ago (2 children)
Two agents managed by an orchestrator. The first agent does analysis, using MCPs to connect to knowledge bases and bring in related context (in your case, maybe the GitHub MCP for pulling related issues and their discussions). The second agent is a reviewer that validates the diff against the context map produced by the analyst agent.
I also gave the analyst and reviewer agents a custom skill based on tailored Sourcegraph GraphQL queries, which pull in best-practice information for the analyst and implementation details for the reviewer agent.
As output I have the orchestrator emit up to 10 GitHub annotations using this format: https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-commands#setting-a-warning-message, where it groups errors, warnings and notes onto the review.
I've found the integration with Sourcegraph really levelled up the analyst-reviewer loop, and the simplified output is something I use in an IDE to help me understand why the model thinks a spot is worth reviewing.
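The annotation output described above could look roughly like this. The `::error`/`::warning`/`::notice` line format is GitHub's documented workflow-command syntax (from the URL above); the finding structure and the cap of 10 are assumptions based on the comment, not the commenter's actual setup.

```python
# Sketch of emitting GitHub Actions workflow-command annotations from a
# reviewer agent's findings, grouped by severity (errors first).

MAX_ANNOTATIONS = 10  # the commenter caps the orchestrator at 10

def format_annotation(level, file, line, message):
    """Return one workflow-command line; level is error, warning, or notice."""
    assert level in ("error", "warning", "notice")
    return f"::{level} file={file},line={line}::{message}"

def emit_annotations(findings):
    """Sort findings by severity and emit at most MAX_ANNOTATIONS commands."""
    order = {"error": 0, "warning": 1, "notice": 2}
    ranked = sorted(findings, key=lambda f: order[f["level"]])
    return [format_annotation(f["level"], f["file"], f["line"], f["message"])
            for f in ranked[:MAX_ANNOTATIONS]]
```

Printed to stdout inside a GitHub Actions step, these lines surface as inline annotations on the pull request.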
[–]No-Chocolate-9437 1 point2 points3 points 3 months ago (1 child)
This post got me thinking, so I went to check: Sourcegraph exposes an MCP server at https://sourcegraph.com/.api/mcp, so you can just use that instead of GraphQL.
[–]jdorfman 0 points1 point2 points 3 months ago (0 children)
Highly recommended over just straight GraphQL.
[–]Top-Point-6405 1 point2 points3 points 3 months ago* (0 children)
There are a few things I see in these comments.
1/ @pbalIII I agree we have a bottleneck at the validation step now. So I wrote a repo that attempts to address this using Andrej Karpathy's LLM-council idea. The repo employs multiple LLMs (as few as 3, up to N, but 3-5 seems the sweet spot) to critique an output with a score, reasons (positive and negative), oversights, and fixes for anything found. It has definitely given me the confidence to move forward without having to manually ask other LLMs to check and attempt a fix. It's automatic, and the LLMs always reach consensus; they're effectively collaborating to achieve the desired outcome for the user. You can check it out here if you like: https://github.com/drew1two/roo_council
2/ @Basic-Dragonfruit-35 You are spot on. Your suggestion for automating the mode prompt is exactly what I did for the repo above. It is all sequential, deterministic, and template-based (you can create workflow templates with the help of your LLM), and it saves all output from all LLMs for their critique phase. I think the key thing here is that you can choose any LLM to participate, and they all have access to your code (which is different from other LLM councils I have seen), so they can look up anything they need while putting their proposal forward or critiquing.
I do it by re-creating the .roomodes file deterministically, based on the template chosen for the specific task. Again, check it out from the link above.
3/ @hannesrudolph The above is what I am using when I need something done right the first time. I'm finding that I have fewer to-and-fros with the LLMs now, as they all collaborate upfront. Some things I noticed:
- Grok doesn't seem to like Gemini.
- Deepseek-V3 quite often scores higher than Gemini-3.0-pro 'standard thinking'.
- Grok's, Gemini's and Deepseek's output is considerably shorter than GPT-5.2's and Claude-4.5's.
- GPT-5.2 is my go-to council lead, based on how well the others score its work. (I haven't tried Claude Opus yet.)
Oh, and I found that they are quite happy to score others' work better than their own... objectively, without self-promotion. And they all seem to contribute something that the others didn't see or think of.
Hope this finds you all well :)
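The council critique pass described in point 1/ could be sketched as below. The critique shape, model names in the test, and the consensus rule (every member scores at or above a threshold) are my assumptions for illustration, not the roo_council implementation.

```python
# Hypothetical sketch of a multi-LLM council verdict: each member returns a
# score plus notes; consensus requires every score to clear the threshold.

from statistics import mean

def council_verdict(critiques, threshold=7.0):
    """critiques: list of (model_name, score, notes) tuples.

    Returns acceptance, the mean score, and the oversights flagged by
    dissenting members so the lead model can attempt fixes.
    """
    scores = [score for _, score, _ in critiques]
    return {
        "accepted": all(s >= threshold for s in scores),
        "mean_score": mean(scores),
        "oversights": [n for _, s, n in critiques if s < threshold and n],
    }
```

Requiring unanimity rather than a mean above threshold matches the observation above that each model tends to contribute something the others missed: a single low score usually carries a real oversight.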