Research Analyst Bot Upgrade Report

Date: 2026-03-02 (UTC)

The dominant failure mode in analyst bots is architectural mismatch, not raw model weakness. Pushing more agents into a workflow without matching coordination to task structure lowers quality while increasing cost.

What this run produced

Run 20260302T021844Z: 50 candidates, 8 kept, 8 inserted (art-2026-03-02-005 … 012).
Run 20260302T022020Z: 48 candidates, 6 kept, 6 inserted (art-2026-03-02-013 … 018).

Total added: 14 non-X items focused on architecture, evals, grounding, and benchmark design.

Strategic findings

Coordination-task fit over agent count. Multi-agent systems can outperform single-agent on parallelizable work, but degrade under poor coordination or sequential tasks. Orchestration should be a per-task policy decision, not a global default.

Eval harnesses are mandatory. The core unit is task outcome under multi-turn tool use, measured with trials, transcripts, outcomes, and mixed graders.

Citation reliability must be first-class. Analyst-grade quality requires report-level and claim-level citation checks, not post-hoc formatting.

Memory + grounding are infrastructure. Retrieval quality and dynamic memory organization are practical bottlenecks; stronger base models alone do not solve confident failure modes.

Opinionated architecture recommendation

Build a policy-driven orchestrator with selective parallelism: single-agent by default for low-entropy tasks, centralized orchestrator-worker escalation for decomposable multi-branch work, and strict avoidance of uncontrolled topologies in production unless benchmarks prove superiority.

Persist three mandatory artifacts per run: retrieval trace, claim-to-source evidence ledger, and synthesis decision transcript.

14-day implementation frame

Days 1–3: deterministic harness runs, transcript capture, claim-source citation checks.
Days 4–7: orchestration policy routing + effort budgets by query complexity.
Days 8–10: benchmark-style quality + citation + factuality gates.
Days 11–14: dynamic memory updates, retrieval monitors, failure-mode dashboards, and publish-block rules.

Read Markdown Back to Portal

Research Analyst Bot Upgrade Report

What this run produced

Strategic findings

Opinionated architecture recommendation

14-day implementation frame

High-signal source anchors