Sapho Chapterhouse Institute Kept Artifacts

Retained research artifacts with source and archive links.

57 kept · 57 decisioned · updated 2026-05-08T22:28:44Z

Published 2026-05-08 · Sapho Daily

Agent pull requests are everywhere. Here's how to review them.

Agent pull requests have already become a large operational surface in software development, but scale and reviewer comfort do not make them safe by default. The article argues that agent-written changes can raise maintenance burden, hide correctness failures behind green CI, an…

Agent pull requests are everywhere. Here's how to review them.

Securing the git push pipeline: Responding to a critical remote code execution vulnerability

How exposed is your code? Find out in minutes—for free - The GitHub Blog

Selective Memory for Artificial Intelligence: Write-Time Gating with Hierarchical Archiving

Labor market impacts of AI: A new measure and early evidence

GitHub Copilot CLI combines model families for a second opinion

GitHub availability report: March 2026 - The GitHub Blog

Autonomous Evaluation and Refinement of Digital Agents

Kimi K2.5 Tech Blog: Visual Agentic Intelligence

How Do Agentic AI Systems Address Performance Optimizations? A BERTopic-Based Analysis of Pull Requests

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions

Agent frameworks do not yield a stable winner across code-centric software engineering tasks

Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

Long-horizon agents: OpenCode + GPT-5.2 Codex Experiment - DEV Community

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving

Incident postmortem in the age of AI agents

On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub

Evaluating Very Long-Term Conversational Memory of LLM Agents

Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests

Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub

Let’s Make Every Pull Request Meaningful: An Empirical Analysis of Developer and Agentic Pull Requests

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

What are popular AI coding benchmarks actually measuring? - nilenso blog

#1 open-source agent on SWE-Bench Verified by combining Claude 3.7 and O1 | Augment Code

Testing AI coding agents (2025): Cursor vs. Claude, OpenAI, and Gemini | Render Blog

Top Coding Agents (2025) | Benched.ai

SWE Context Bench: A Benchmark for Context Learning in Coding

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework

Decoding the Configuration of AI Coding Agents: Insights from Claude Code Projects

Agentic Much? Adoption of Coding Agents on GitHub

ChatDev: Communicative Agents for Software Development

An Empirical Study of Developer-Provided Context for AI Coding Assistants in Open-Source Projects

Professional Software Developers Don’t Vibe, They Control: AI Agent Use for Coding in 2025

Published as a conference paper at ICLR 2024

MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework

Context Engineering for Multi-Agent LLM Code Assistants Using Elicit, NotebookLM, ChatGPT, and Claude Code

Context Engineering for AI Agents in Open-Source Software

Why Do Multi-Agent LLM Systems Fail?

Codified Context: Infrastructure for AI Agents in a Complex Codebase

When AGENTS.md Backfires: What a New Study Says About Context Files and Coding Agents

arXiv 2601.20404

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Testing and Enhancing Multi-Agent Systems for Robust Code Generation

Towards a Science of Scaling Agent Systems

Repository-level context files mostly add cost and activity overhead rather than improving coding-agent success
