
Scaling Code Review with AI: Cloudflare's Multi-Agent Orchestration

2026-05-03 23:15:42

Introduction

Code review is a cornerstone of modern software development, catching bugs early and spreading knowledge across teams. Yet it can also become a bottleneck, with merge requests languishing in queues as reviewers struggle to context-switch. At Cloudflare, the median wait for a first review often stretched into hours. To address this, we built an AI-powered code review system that uses a coordinated team of specialized agents, dramatically reducing review times while maintaining high quality. This article details our journey from experimentation to production, sharing the architecture and lessons learned.

Source: blog.cloudflare.com

The Problem with Traditional Code Review

Merge requests can stall for many reasons: reviewer availability, cognitive load from context-switching, and an endless cycle of nitpicks and revisions. We saw this firsthand across thousands of internal projects. While automated tools like linters help, they only catch surface-level issues. We needed something that could understand code semantics, flag real bugs, and scale across our diverse codebases.

Early Attempts: From Off-the-Shelf Tools to Naive Prompts

Our first step was evaluating existing AI code review tools. Many worked well and offered customization, but none provided the flexibility needed for an organization of Cloudflare's size. So we pivoted to a DIY approach: feeding git diffs into a large language model with a generic prompt. The results were noisy—vague suggestions, hallucinated syntax errors, and irrelevant advice like “consider adding error handling” on functions that already had it. Clearly, a naive approach wouldn't work for complex codebases.
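That first prototype was not much more than the following. This is a minimal illustrative sketch, not our actual prompt or pipeline: `call_llm` is a hypothetical stand-in for whatever model API is used, and the prompt text is invented for illustration.

```python
# Naive single-prompt review: dump the entire git diff into one generic prompt.
# `call_llm` is a hypothetical stand-in for a real model API call.

GENERIC_PROMPT = (
    "You are a code reviewer. Review the following git diff and list any "
    "problems you find:\n\n{diff}"
)

def naive_review(diff: str, call_llm) -> str:
    """Feed the raw diff to a single model with no per-domain context."""
    return call_llm(GENERIC_PROMPT.format(diff=diff))

# With no specialization, the model gets no signal about the language,
# project conventions, or which issue classes matter, so its output is noisy.
```

The lack of any structure around the model is exactly what made the output unusable at scale.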

The Solution: Multi-Agent Orchestration

Instead of building a monolithic reviewer, we created a CI-native orchestration system atop OpenCode, an open-source coding agent. Now, when a Cloudflare engineer opens a merge request, it gets an initial pass from a coordinated team of up to seven specialized AI agents, each focused on a specific concern, with their findings merged by a coordinator agent.

How the Coordinator Works

The coordinator agent is the linchpin. It collects outputs from all specialists, removes duplicates, evaluates the true severity of each issue (e.g., blocking vs. advisory), and compiles a single, readable comment. This prevents the noise of multiple overlapping suggestions and gives engineers a clear action list. The system can automatically approve clean code, flag real bugs, and even block merges when it detects serious problems or security vulnerabilities.
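The coordinator's pipeline can be sketched roughly as follows. This is an illustrative sketch, not the production implementation: the `Finding` shape, the severity labels, and the verdict names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One issue reported by a specialist agent (assumed shape)."""
    file: str
    line: int
    message: str
    severity: str  # "advisory" or "blocking" (assumed labels)

def coordinate(findings: list[Finding]) -> tuple[str, str]:
    """Dedupe specialist findings, judge severity, emit one verdict + comment."""
    # Deduplicate: several specialists may flag the same issue at the same spot.
    unique = list({(f.file, f.line, f.message): f for f in findings}.values())
    # Put blocking issues first so the action list reads top-down by urgency.
    unique.sort(key=lambda f: (f.severity != "blocking", f.file, f.line))
    if any(f.severity == "blocking" for f in unique):
        verdict = "block"
    elif unique:
        verdict = "comment"
    else:
        verdict = "approve"  # clean code is auto-approved
    comment = "\n".join(
        f"[{f.severity}] {f.file}:{f.line}: {f.message}" for f in unique
    )
    return verdict, comment
```

The key design point is that engineers only ever see the single merged comment, never the raw per-specialist output.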


Results and Impact

We've run this system internally across tens of thousands of merge requests, where it has automatically approved clean changes and surfaced real bugs before a human reviewer ever looked at the code.

This system is part of our broader Code Orange: Fail Small initiative, aimed at improving engineering resiliency.

Architecture Deep Dive

Building an LLM-powered system at the heart of CI/CD presented unique challenges. We had to handle model latency, API failures, and varying output formats. Our architecture uses a plugin-based design: each specialist is a modular plugin with a specific prompt and context. The coordinator uses a lightweight LLM call to merge results. This modularity lets us add or swap agents without rebuilding the whole system. We also implemented guardrails to prevent the system from becoming a blocker—for example, if the coordinator times out, the review defaults to a human-friendly summary.

Lessons Learned

We discovered that:

  1. Specialization beats generalization. A single model with a massive prompt produced worse results than multiple targeted models.
  2. Deduplication is critical. Without it, engineers would ignore the output as noise.
  3. Severity estimation requires careful tuning. Overly aggressive blocking erodes trust.
  4. The system must be fast. Engineers won't wait minutes for an AI review during a hotfix.

Conclusion

AI-assisted code review can be both scalable and reliable when built as an orchestration of specialized agents rather than a monolithic black box. At Cloudflare, this system has cut review wait times, caught real bugs, and become a trusted part of our development workflow. We're excited to continue refining it and sharing our findings with the community.

For more details, see the sections above on our early attempts and the architecture deep dive.
