v1.0.138 — Agent Performance & Quality Score Leaderboard

The platform has 32+ agents, but there is no user-visible quality scoring that shows which agents ship the most features, which have the highest quarantine rates, which cost the most tokens per feature, or which consistently fail CI. The underlying data already exists in the DB (agent_jobs, pipeline_failures, features) but is never aggregated into a meaningful intelligence layer. Surfacing it as an 'Agent Health' dashboard would: (1) help users decide which agents to enable, (2) give the platform itself a feedback loop for agent improvement, and (3) be a compelling differentiator against black-box competitors such as Devin.
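
As a rough illustration of the aggregation described above, the following Python sketch computes the four metrics per agent from in-memory records. The record shapes (AgentJob, Feature, PipelineFailure), their field names, and the leaderboard sort order are all assumptions made for this example; the actual agent_jobs, pipeline_failures, and features schemas are not described in these notes and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical row shapes: the real agent_jobs, pipeline_failures, and
# features schemas are not given in these notes.
@dataclass
class AgentJob:
    agent: str
    feature_id: int
    tokens: int
    quarantined: bool


@dataclass
class Feature:
    feature_id: int
    shipped: bool


@dataclass
class PipelineFailure:
    agent: str
    feature_id: int


def agent_health(jobs, features, failures):
    """Aggregate per-agent metrics from the three record lists."""
    shipped_ids = {f.feature_id for f in features if f.shipped}
    stats = defaultdict(lambda: {"jobs": 0, "tokens": 0, "quarantined": 0,
                                 "shipped": set(), "ci_failures": 0})

    for job in jobs:
        s = stats[job.agent]
        s["jobs"] += 1
        s["tokens"] += job.tokens
        s["quarantined"] += int(job.quarantined)
        if job.feature_id in shipped_ids:
            s["shipped"].add(job.feature_id)

    for failure in failures:
        stats[failure.agent]["ci_failures"] += 1

    rows = []
    for agent, s in stats.items():
        shipped = len(s["shipped"])
        rows.append({
            "agent": agent,
            "shipped_features": shipped,
            # Share of this agent's jobs that were quarantined.
            "quarantine_rate": s["quarantined"] / s["jobs"] if s["jobs"] else 0.0,
            # Undefined (None) when the agent has shipped nothing yet.
            "tokens_per_feature": s["tokens"] / shipped if shipped else None,
            "ci_failures": s["ci_failures"],
        })

    # Assumed leaderboard order: most shipped features, then fewest CI failures.
    rows.sort(key=lambda r: (-r["shipped_features"], r["ci_failures"]))
    return rows
```

In a real deployment this would more likely be a grouped query against the DB. The in-memory version only shows how the metrics relate to one another, for example that tokens per feature is left undefined for an agent with no shipped features.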

Release Notes