Building Governed AI Code Review Agents Across Your Engineering Org

This blog post is AI-Assisted Content: Written by humans with a helping hand.

This blog is featured in our AI Use Case Library of real AI solutions.

The Problem

A large K-12 education company had a problem most teams would envy. They had gone all in on Claude Code. The whole engineering team used it every day, pushed it hard and even had it onboarding new developers through Slack. It worked. Then the side effects showed up.

They already had review agents and a shared library of agent skills, and that success came with a cost. The tooling had grown one team at a time, so it was spread across their code with no shared standard. Each team’s agents had drifted into their own settings and instructions, and gaps had opened that nobody owned. The agents meant to keep everyone consistent had become inconsistent themselves. The team did not need more agents. They needed someone to clean up the ones they had, hold them to one standard and draw a clear line around what each one is responsible for.

A Model for Where the Agents Fit

Before rebuilding anything, we agreed on a simple model for how work moves through the team. Every change passes through three roles.

A Dev writes the change: That is still usually a person, but more and more it is an agent working from a scoped ticket.
An AI-Reviewer takes the first look: It catches the small stuff that does not need a human, like style slips, missing documentation and broken conventions.
A Human-Reviewer makes the judgment calls: A person still has to decide whether the change is correct, whether it serves the business and whether it fits the context a model cannot see.

The whole engagement was organized around those three roles, and the rest of this story follows them. First the AI-Reviewer we standardized, then the skills library that spreads it, then the author agent that is starting to take on the Dev role.
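For readers who think in code, the split between the two reviewer roles can be sketched in a few lines. This is an illustration only, not the client's implementation. The category names and the `reviewer_for` function are hypothetical, drawn from the examples above. Sending anything unrecognized to a person is our assumption about the safer default.

```python
from enum import Enum


class Role(Enum):
    DEV = "dev"
    AI_REVIEWER = "ai-reviewer"
    HUMAN_REVIEWER = "human-reviewer"


# Hypothetical category names for the routine misses named above:
# style slips, missing documentation and broken conventions.
ROUTINE_KINDS = {"style", "missing-docs", "convention"}


def reviewer_for(kind: str) -> Role:
    """Route a finding: routine kinds to the agent, everything else to a person."""
    if kind in ROUTINE_KINDS:
        return Role.AI_REVIEWER
    # Correctness, business fit and unseen context stay with a human,
    # and so does any kind this sketch does not recognize.
    return Role.HUMAN_REVIEWER
```

The useful part is the last line. The agent's territory is an explicit list, and everything outside it belongs to a person by default.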

The AI-Reviewer: One Standard, Made Dependable

Here is what a developer sees. They open a pull request the way they always have. Before a human gets to it, a Claude agent reads the change and leaves comments, the same way a colleague would. A review agent was nothing new to them. What we changed is that every team’s agent now reviews to the same standard, and explains itself when it does.

We rebuilt the mismatched review agents around one shared standard, then made that standard dependable enough to run every day. Four things mattered most.

Every comment says why: Each flag comes with a short reason, so the developer sees the thinking behind it. That sounds minor, but it decides whether a developer trusts the agent or quietly scrolls past it. Unexplained flags kill these tools faster than wrong ones do.
It takes the first look, not the last word: The agent clears the routine misses, so human reviewers can spend their attention on the calls only they can make.
It runs where the client already works: We could not put the agents on an outside service. The client’s security setup ruled it out, including its network restrictions and access controls. So the agents run inside the automation the client already uses on every pull request, within limits the security team already trusted. Each run also produces a clean, readable summary instead of raw logs. Meeting the team where its permissions already were beat fighting for new ones.
We fixed the plumbing, not just the wording: One bug had been canceling a review while it was still running, any time someone added a human reviewer partway through. Changing an agent’s instructions is easy. Keeping it alive when it runs into real traffic and real human habits is the hard part, and that is what decides whether the agent earns its place.
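Two of those points lend themselves to a small sketch: requiring a reason on every comment, and deciding which pull request events are allowed to replace a review that is still running. Everything here is hypothetical. The `ReviewComment` shape and the function names are ours, and the event names assume GitHub-style pull request actions, which may not match the client's automation.

```python
from dataclasses import dataclass

# Assumption: GitHub-style pull request actions. Only these change the code
# under review, so only these should replace a review that is in flight.
CODE_CHANGING_ACTIONS = {"opened", "reopened", "synchronize"}


@dataclass(frozen=True)
class ReviewComment:
    path: str
    line: int
    message: str
    reason: str  # the short "why" shown to the developer


def split_by_reason(comments):
    """Separate comments that explain themselves from those that do not."""
    explained, unexplained = [], []
    for comment in comments:
        target = explained if comment.reason.strip() else unexplained
        target.append(comment)
    return explained, unexplained


def supersedes_running_review(action: str) -> bool:
    """Return True only when the event means the code itself has changed."""
    return action in CODE_CHANGING_ACTIONS
```

Under these assumptions, adding a human reviewer arrives as `review_requested`, so `supersedes_running_review` returns False and the running review is left alone. A new push arrives as `synchronize` and does restart it.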

[Image: output from a similar, internal review agent at InterWorks]

The Skills Library: How the Standard Spreads

A standard only helps if every team reaches for it. The client already had a shared library of agent skills. What it lacked was ownership. Nobody had settled who runs it, how skills get named or how different teams add to it without each rebuilding the same thing.

We put that structure in. We set a naming convention and made sure every skill had a named owner, then cleaned up what had piled up over time. We also built and moved over real skills people use. They include connector setup, utilities for data product managers, a documentation-coverage check, and a tool that pulls the right context out of a Jira ticket. The point was a library of working tools, not a place to park them.

This is how one team’s good setup becomes the whole company’s default. Developers reach for a shared, supported set of skills instead of rebuilding the same thing. Consistency becomes the default, not something anyone has to remember. A skill nobody owns is a skill nobody maintains, and that is how the drift comes back.
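A naming convention and an ownership rule are easy to check automatically. The sketch below assumes a made-up convention (lowercase words joined by hyphens, such as `platform-doc-coverage`) and a made-up `Skill` record. The client's actual convention and metadata are not described here, so read it as the shape of the check, not the rule itself.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: at least two lowercase words joined by hyphens.
SKILL_NAME = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)+$")


@dataclass(frozen=True)
class Skill:
    name: str
    owner: str  # a named person or team, never blank
    description: str


def lint_skill(skill):
    """Return the problems found in one skill entry."""
    problems = []
    if not SKILL_NAME.fullmatch(skill.name):
        problems.append(f"{skill.name!r}: name does not match the convention")
    if not skill.owner.strip():
        problems.append(f"{skill.name!r}: no named owner")
    return problems


def lint_library(skills):
    """Lint every skill, and flag names that appear more than once."""
    problems = [p for skill in skills for p in lint_skill(skill)]
    seen = set()
    for skill in skills:
        if skill.name in seen:
            problems.append(f"{skill.name!r}: duplicate skill name")
        seen.add(skill.name)
    return problems
```

Run on every change to the library, a check like this turns "every skill has an owner" from a promise into something that fails loudly when it is broken.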

The Author Agent: From Ticket to Pull Request

The AI-Reviewer assumes a person wrote the change. More and more, the writer is also an agent. The same engagement built an agent that takes a scoped Jira ticket and opens a first draft of the pull request, which automates the path from ticket to review. This is well past the experiment stage. The agent is built, automated tests run on every change to it, and it is moving from a practice sandbox into real ticket work. We held it to the same bar as the reviewer.

We test it like real software: Before any change ships, it has to pass a full run-through against a practice copy of the ticket system, loaded with the same test cases every time. A change to the agent’s instructions gets the same scrutiny as a change to the code. Most teams treat those instructions as settings you can tweak live. At this scale they are code, and we treated them that way.
It checks its own work first: After the agent opens its draft, it compares its work against what the ticket asked for and against the team’s review standard. It fixes what it can and leaves anything it cannot resolve as a note for a person. Because the author agent and the AI-Reviewer share one standard, the pull request that reaches a human is already cleaned up from both sides.
It makes sure someone notices: A draft nobody sees is useless. The agent sends each one to the right team and posts a heads-up, so the human reviewer actually picks it up instead of waiting on a tag nobody reads.
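The first two points above can be sketched together: a self-check that fixes what it can and leaves a note for the rest, and a fixed set of cases replayed on every change. This is a toy model, not the agent itself. The `Draft` shape, the case data and the `can_fix` callback are all hypothetical stand-ins for the real ticket system and model calls.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    ticket_key: str
    addressed: set  # ticket criteria the draft already covers
    notes_for_human: list = field(default_factory=list)


def self_check(draft, criteria, can_fix):
    """Compare a draft to the ticket: fix what is fixable, note the rest."""
    for criterion in criteria:
        if criterion in draft.addressed:
            continue
        if can_fix(criterion):
            draft.addressed.add(criterion)
        else:
            draft.notes_for_human.append(f"Unresolved: {criterion}")
    return draft


# Hypothetical fixed cases, identical on every run.
FIXED_CASES = [
    {"ticket": "DEMO-1", "criteria": ["add endpoint", "update docs"],
     "addressed": {"add endpoint"}, "fixable": {"update docs"},
     "expected_notes": 0},
    {"ticket": "DEMO-2", "criteria": ["migrate schema"],
     "addressed": set(), "fixable": set(),
     "expected_notes": 1},
]


def run_regression(cases):
    """Replay every fixed case and return the tickets whose result changed."""
    failures = []
    for case in cases:
        # Copy the set so repeated runs start from the same state.
        draft = Draft(case["ticket"], set(case["addressed"]))
        self_check(draft, case["criteria"], lambda c: c in case["fixable"])
        if len(draft.notes_for_human) != case["expected_notes"]:
            failures.append(case["ticket"])
    return failures
```

In this sketch a change to the agent ships only when `run_regression(FIXED_CASES)` returns an empty list, which is the same gate a code change would face.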

Why It Matters

Two things changed. First, the routine problems get caught before a person spends time on them. Human reviewers walk into a pull request that is already cleaned up, and they spend their attention on the decisions that need a human.

Second, the shared standard let teams move faster. Instead of patching over standards that kept drifting apart, they put their time into the actual work those standards were there to support. This client reported that PR review times fell by nearly 70%. That’s more time spent on moving forward, not just reviewing.

Developer productivity has kept climbing through the engagement, and the team keeps asking for more. The credit belongs to both sides. We did not do this to their engineering org. We built it with their own tooling teams, who knew their code and their standards far better than we ever would, and they are the reason it stuck.

Where This Could Go

The review work is the first install, not the finish line. With the author agent already extending the work from review into drafting, a few more directions are on the table.

Documentation agents draft and update docs straight from the code, so documentation stops falling behind. A team could start this on a single project on its own.
A support-ticket drafter uses the code and docs to write replies to customer tickets. Each answer reflects what the system does, not what someone half-remembers.
Team-awareness agents keep everyone current on what the rest of the team is working on. That is a real shot at cutting meeting time instead of adding another status update.

The documentation work is something the client’s teams could try on their own. The rest are natural next engagements, and the shared skills library makes each one cheaper to build than the last.

Takeaway

The team’s problem was never that they were too slow. It was that moving fast left them with more code, more standards, and more agents than anyone could keep straight. The answer was not to slow down. It was to give every change the same three roles, standardize the agents that fill them, and make those agents dependable enough to trust and shared enough to scale. Done that way, the speed finally works for the code instead of against it.

Derrick Austin
