How to Pinpoint the Culprit: A Step-by-Step Guide to Automated Failure Attribution in LLM Multi-Agent Systems


Introduction

When an LLM multi-agent system fails, developers often face a daunting task: sifting through hundreds of lines of interaction logs to determine which agent caused the failure and at what point the mistake occurred. This manual debugging process is time-consuming, error-prone, and scales poorly as systems grow in complexity. To address this, researchers from Penn State University, Duke University, Google DeepMind, and other institutions introduced the problem of automated failure attribution and created the first dedicated benchmark dataset, Who&When, accepted as a Spotlight presentation at ICML 2025. This guide translates their research into a practical, step-by-step workflow you can follow to implement automated failure attribution in your own multi-agent systems.

Source: syncedreview.com

What You Need

Before you begin, ensure you have the following:

- Interaction logs from failed runs of your own LLM multi-agent system, or the Who&When benchmark dataset
- Access to a capable LLM API (e.g., GPT-4o or a comparable model) to act as the judge
- A Python environment for scripting the attribution loop and scoring the results

Step-by-Step Instructions

Step 1: Understand the Failure Attribution Task

Automated failure attribution asks two questions per failed task: “Which agent?” and “When?” (i.e., at which step in the interaction). The Who&When dataset formalizes this as a benchmark. Familiarize yourself with the dataset’s structure: each failure case includes a task description, a full interaction log, and ground-truth labels for the responsible agent and the step index. This will guide your approach.
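To make the structure concrete, here is what a single failure case might look like. The field names below are illustrative, modeled on the dataset description above, and are not guaranteed to match the dataset's exact schema:

```python
# A hypothetical failure case in the spirit of Who&When.
# Field names are illustrative, not the dataset's exact schema.
failure_case = {
    "task": "Find the population of the capital of France.",
    "messages": [
        {"agent": "planner", "step": 0, "text": "Search for the capital of France."},
        {"agent": "searcher", "step": 1, "text": "The capital of France is Lyon."},
        {"agent": "writer", "step": 2, "text": "Final answer: the population of Lyon is ..."},
    ],
    # Ground-truth labels: which agent erred, and at which step (0-indexed).
    "mistake_agent": "searcher",
    "mistake_step": 1,
}
```

The two ground-truth fields correspond directly to the "Which agent?" and "When?" questions the benchmark evaluates.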

Step 2: Collect and Preprocess Your System’s Logs

If you are working with your own system, extract logs in a consistent format. For each task attempt that ended in failure (e.g., incorrect final output, loop, timeout), assemble:

- The original task description
- The full interaction log: every agent message, in order, with the agent's name and a step index
- The observed failure type (wrong answer, infinite loop, timeout, etc.)

Normalize logs into JSON lines with fields: `task`, `messages` (list of dicts with `agent`, `step`, `text`), and `failure_type`. If using Who&When, download the dataset and load it directly.
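A minimal normalization sketch, assuming your framework exposes each run as an ordered sequence of (agent, text) pairs (the function names and the raw-log shape are assumptions; adapt the unpacking to your logging format):

```python
import json

def normalize_log(task, raw_messages, failure_type):
    """Normalize one failed run into the JSON-lines record described above.

    `raw_messages` is assumed to be an iterable of (agent_name, text) pairs
    in conversation order; adapt the unpacking to your framework's logs.
    """
    return {
        "task": task,
        "messages": [
            {"agent": agent, "step": i, "text": text}
            for i, (agent, text) in enumerate(raw_messages)
        ],
        "failure_type": failure_type,
    }

def write_jsonl(records, path):
    """Append normalized records to a .jsonl file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Assigning step indices at normalization time keeps your predictions and any ground-truth labels on the same 0-indexed scale.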

Step 3: Choose an Attribution Method

The researchers explored several methods. For simplicity, start with Direct Prompting – feed the entire log to an LLM and ask it to identify the culpable agent and step. Example prompt:

“Given this multi-agent conversation that resulted in a failure, which agent made the critical mistake, and at which step (0-indexed)? Output in JSON: {"agent": "...", "step": integer}.”

More advanced options include contrastive prompting (compare with successful runs) and agent-level vs. step-level decomposition. The Who&When paper provides baselines – you can replicate them using their open-source code.
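A minimal sketch of direct prompting: rendering one failure case into the template above and parsing the model's reply. The transcript layout and the JSON-extraction regex are assumptions; adjust them to your log schema and judge model:

```python
import json
import re

def build_prompt(case):
    """Render a failure case into the direct-prompting template from Step 3."""
    transcript = "\n".join(
        f'[step {m["step"]}] {m["agent"]}: {m["text"]}' for m in case["messages"]
    )
    return (
        f"Task: {case['task']}\n\n"
        f"Conversation:\n{transcript}\n\n"
        "Given this multi-agent conversation that resulted in a failure, "
        "which agent made the critical mistake, and at which step (0-indexed)? "
        'Output in JSON: {"agent": "...", "step": integer}.'
    )

def parse_attribution(response_text):
    """Extract the {"agent": ..., "step": ...} object from the model's reply.

    Models often wrap JSON in prose or code fences, so grab the first
    JSON object rather than parsing the whole reply.
    """
    match = re.search(r"\{.*?\}", response_text, re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
        return {"agent": obj["agent"], "step": int(obj["step"])}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
```

Returning `None` on malformed replies lets the scoring loop count them as misses instead of crashing mid-run.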

Step 4: Run Attribution on Your Logs

Implement a script that loops over each failure case, builds the prompt, calls the LLM API, and parses the response. For the Who&When dataset, compare your predictions against the ground-truth labels to compute accuracy (agent attribution, step attribution, and joint accuracy). For your own logs, you may need to manually verify a sample to establish a baseline.
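The scoring loop can be sketched as follows. Here `predict` stands in for your LLM call plus response parsing, and the label field names (`mistake_agent`, `mistake_step`) are illustrative rather than the dataset's exact schema:

```python
def evaluate(cases, predict):
    """Score attribution predictions against ground-truth labels.

    `predict` is any callable mapping a case to {"agent": ..., "step": ...}
    or None (e.g. a wrapper around your LLM call). Returns agent, step,
    and joint accuracy over all cases.
    """
    agent_hits = step_hits = joint_hits = 0
    for case in cases:
        pred = predict(case)
        agent_ok = pred is not None and pred["agent"] == case["mistake_agent"]
        step_ok = pred is not None and pred["step"] == case["mistake_step"]
        agent_hits += agent_ok
        step_hits += step_ok
        joint_hits += agent_ok and step_ok
    n = len(cases)
    return {
        "agent_accuracy": agent_hits / n,
        "step_accuracy": step_hits / n,
        "joint_accuracy": joint_hits / n,
    }
```

Joint accuracy is the strictest metric: a prediction counts only if both the agent and the step match the ground truth.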

Step 5: Analyze and Iterate

When attribution fails, examine the misclassifications. Common pitfalls include:

- Very long logs that exceed the judge model's effective context, causing it to overlook the decisive step
- A bias toward blaming the last agent to speak, even when the root cause occurred much earlier
- Propagated errors, where a subtle early mistake only becomes visible in a later agent's output
- Ambiguous ground truth when several agents contributed to the failure

Adjust your prompts or method accordingly. The Who&When benchmark is designed to help you compare different approaches systematically.

Tips

The following insights from the original research can improve your success rate:

- Identifying the responsible agent ("who") is consistently easier than localizing the exact error step ("when"), so report the two metrics separately.
- The paper evaluates several judging strategies, including examining the whole log at once and stepping through it incrementally; they trade off accuracy against token cost, and combining strategies can outperform any single one.
- Even state-of-the-art models achieve far-from-perfect accuracy on Who&When, so treat automated attribution as a triage aid that narrows the search, not a replacement for human review.
- When the correct final answer is available, including it in the prompt makes attribution noticeably easier for the judge.
