How to Build a Virtual Agent Fleet for Automated Testing and Triage


Introduction

Imagine a team of AI agents that autonomously test your product, triage issues, post release notes, and even fix bugs—all running in CI without human intervention. That's exactly what the Coding Agent Sandboxes (sbx) team at Docker accomplished with their 'Fleet'. This how-to guide walks you through creating your own virtual agent team using the same principles: role-based skills (not scripts), local-first development, and seamless CI integration. By the end, you'll have a replicable process to ship faster with AI agent autonomy.

Source: www.docker.com

What You Need

Before you start building your fleet, gather these prerequisites:

  • A sandbox CLI for running agents locally (the Docker sbx team uses sbx).
  • A CI system with runners for each platform you target (e.g., macOS, Linux, Windows).
  • A version-controlled repository where skill files can live alongside your code.
  • A list of tasks you want to automate (testing, triage, release notes, bug fixes).

Step-by-Step Guide

Step 1: Define Your Agent Roles and Responsibilities

Start by listing the tasks you want your fleet to handle autonomously. For example:

  • An exploratory tester that exercises your product on every supported platform.
  • An issue triager that classifies and assigns incoming bugs.
  • A release-note writer that posts notes after each release.
  • A build engineer that produces the binaries the other roles consume.

Each role must have a clear persona—think of a human colleague with specific expertise. Role descriptions should emphasize decision-making, not just execution.

Step 2: Create a Skill File for Each Role

Skill files are markdown files that define:

  • The role's persona and area of expertise.
  • The responsibilities and goals the agent owns.
  • Guidance for making decisions, not step-by-step commands to execute.

Critically, a skill is not a script. It guides the agent's judgment. For example, if a test fails unexpectedly, a script stops—but a role investigates. Write each skill to encourage the agent to explore, learn, and adapt.
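To make this concrete, here is a minimal sketch of what a skill file for the exploratory-tester role might look like. The exact layout and section names are assumptions for illustration; the source only specifies that a skill is a markdown file describing a role, not a script.

```markdown
# Skill: /cli-tester

## Persona
You are an experienced exploratory QA engineer. You test the CLI the way a
skeptical new user would: probing edge cases, not just happy paths.

## Responsibilities
- Exercise the CLI on this platform and note anything surprising.
- When a command fails unexpectedly, investigate before reporting:
  re-run it, vary the inputs, and check recent changes for a likely cause.
- File an issue for each confirmed bug, with reproduction steps.

## Judgment, not scripts
You decide which commands to run and in what order. Prefer depth over
coverage: one well-investigated failure is worth more than ten shallow ones.
```

Note how the file tells the agent how to reason about a failure (investigate, re-run, vary inputs) rather than prescribing a fixed command sequence.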

Step 3: Test Each Skill Locally First

Never start by wiring a skill into CI. Instead, run it from your terminal using the sandbox CLI. For example, invoke sbx run /cli-tester on your laptop. Watch the agent think—observe where it gets confused, where it succeeds, and whether it follows the intended logic. Tweak the skill file (markdown) and re-invoke immediately. This local-first approach turns iteration cycles from minutes (commit-push-wait-read-logs) into seconds (edit-file-run).

During local testing, verify:

  • The agent follows the intended logic of the role.
  • It recovers sensibly when something unexpected happens (e.g., a surprising test failure).
  • Repeated runs produce consistent, reliable results.

Only promote a skill to CI once it consistently produces reliable results on your machine.
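The edit-file-run loop above can be wrapped in a small helper so every local run leaves a transcript you can review. This is a hypothetical convenience script, not part of the sbx tooling; only the `sbx run /cli-tester` invocation comes from the article, and the script name, `runs/` directory, and log naming are assumptions.

```shell
#!/bin/sh
# run-skill.sh -- hypothetical wrapper for the edit-file-run loop:
# run a skill locally and keep a timestamped transcript to review.
set -eu

SKILL="${1:-/cli-tester}"                 # skill to invoke, as in `sbx run /cli-tester`
LOG="runs/$(date +%Y%m%d-%H%M%S).log"     # one transcript per run
mkdir -p runs

echo "running skill $SKILL (transcript: $LOG)"
if command -v sbx >/dev/null 2>&1; then
    # invoke the skill exactly as CI will later
    sbx run "$SKILL" 2>&1 | tee "$LOG"
else
    echo "sbx CLI not found on PATH; install it before running skills" | tee "$LOG"
fi
```

Keeping transcripts makes it easy to compare runs after each tweak to the skill file and spot where the agent's reasoning changed.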


Step 4: Set Up CI Workflows for Each Role

Now wire your skill into CI. The key principle: CI is just another runtime for the same skill file. Do not create a separate CI version. Your workflow should simply:

  1. Set up the environment (checkout code, install the sandbox CLI, configure platform-specific dependencies).
  2. Invoke the skill exactly as you did locally (e.g., sbx run /cli-tester).
  3. Collect results (logs, generated reports, issue links).

For the /cli-tester example, Docker runs it nightly on macOS, Linux, and Windows runners, all using the exact same skill file. The workflow adds no custom logic, which ensures consistency across runtimes and eliminates translation errors.
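The three workflow steps above might be sketched as a GitHub Actions workflow like the following. This is an illustrative assumption, not Docker's actual workflow: the file name, installer script, and log path are placeholders, and only the matrix of macOS/Linux/Windows runners and the unmodified `sbx run /cli-tester` invocation come from the article.

```yaml
# .github/workflows/cli-tester.yml (hypothetical name and paths)
name: nightly-cli-tester
on:
  schedule:
    - cron: "0 3 * * *"   # nightly run

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Install sandbox CLI
        run: ./scripts/install-sbx.sh     # placeholder installer step
      - name: Run the skill exactly as locally
        run: sbx run /cli-tester          # same command, same skill file
      - name: Collect results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: cli-tester-logs-${{ matrix.os }}
          path: logs/                     # assumed output location
```

The only per-platform variation lives in the matrix; the skill invocation itself is identical everywhere.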

Step 5: Integrate and Iterate

Once individual skills run in CI, chain them into a fleet. For instance:

  • The build engineer skill runs first, producing binaries.
  • The exploratory tester skill then exercises those binaries on each platform.
  • If tests fail, an issue triager skill can automatically classify and assign the bug.
  • A release-note skill runs after every successful release.

Monitor the fleet's output and feedback loops. If an agent misbehaves (e.g., triages incorrectly or generates poor notes), revert to local mode, debug the skill file, and redeploy. Since all skills are runtime-agnostic, improvements propagate instantly to both local development and CI.
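The chaining described above can be expressed as job dependencies in the same hedged GitHub Actions style. The skill names /build-engineer and /issue-triager are assumptions derived from the roles mentioned in this guide; only /cli-tester appears in the source.

```yaml
# hypothetical fleet pipeline: each job invokes one role's skill file
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sbx run /build-engineer     # assumed skill name; produces binaries

  test:
    needs: build
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: sbx run /cli-tester         # exercises the binaries on each platform

  triage:
    needs: test
    if: failure()                        # runs only when an upstream job failed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sbx run /issue-triager      # assumed skill name; classifies the bug
```

Because each job is just another `sbx run` of a skill file, fixing a misbehaving agent means editing one markdown file, not rewiring the pipeline.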

Tips for Success

  • Keep skills simple and focused. Don't overload one agent with too many personas; a dedicated tester skill will outperform a jack-of-all-trades.
  • Write skills as decision guides, not checklists. The power of AI agents is judgment: let them decide how to test; you define what to test.
  • Invest in local debugging. The faster you can iterate on a skill file, the better your fleet will perform. Avoid the commit-push-wait cycle at all costs.
  • Use version control for skill files. Treat them like code: review changes, roll back bad tweaks, and document why a persona works.
  • Monitor agent behavior over time. As products evolve, your skills may need updates. Schedule periodic reviews of agent decisions to catch drift.
  • Start small. Build one or two critical roles first (e.g., tester and triager), then expand. A smaller, reliable fleet beats a large, brittle one.
  • Embrace failure as learning. When an agent makes a mistake, that's a chance to improve the skill file. The fleet learns not from training data but from human refinement of its role descriptions.

By following these steps, you can assemble a virtual agent team that ships faster, reduces manual toil, and gives you back time for creative engineering. The Docker sbx team proved that local-first, skill-based agents scale, and now you can too.
