Agent Performance & Quality Score Leaderboard — What's New in v1.0.138

Available from: v1.0.138

SaaS Factory operates a fleet of 32+ specialised AI agents, including architects, engineers, testers, and researchers. As of v1.0.138, the platform exposes an Agent Health dashboard that aggregates raw execution data into a ranked, human-readable view of how each agent is performing.

The Problem This Solves

Before this release, every agent ran as a black box from the user's perspective. The signals needed to evaluate agent quality — job completion records, pipeline failures, feature throughput, token consumption — all existed in the database, but were never aggregated or surfaced. Users had no way to answer questions like:

Which agents are actually shipping features?
Which agents are costing the most per unit of output?
Which agents keep breaking CI?
Which agents should I disable to reduce noise?

The Agent Health dashboard answers all of these directly.

Dashboard Overview

The Agent Health dashboard presents a quality score leaderboard: a ranked table of every agent on the platform, ordered by composite quality score.

Metrics

Metric	Description	Source Table
Features Shipped	Total features successfully delivered to production	`features`
Quarantine Rate	% of outputs flagged and quarantined	`pipeline_failures`
Token Cost per Feature	Average tokens consumed per shipped feature	`agent_jobs`
CI Failure Rate	% of PRs that fail continuous integration	`pipeline_failures`
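
As a rough illustration of how the four metrics relate to raw counts, the sketch below derives them for a single agent. The `AgentCounts` fields and the `compute_metrics` helper are hypothetical names chosen for this example, not the platform's schema, and the sketch assumes that token cost is the agent's total token usage divided by its shipped features.

```python
from dataclasses import dataclass


@dataclass
class AgentCounts:
    """Raw per-agent counts. Field names are hypothetical, not the real schema."""
    features_shipped: int     # rows credited to the agent in `features`
    outputs_total: int        # all outputs the agent produced
    outputs_quarantined: int  # quarantine events in `pipeline_failures`
    prs_total: int            # PRs opened by the agent
    prs_failed_ci: int        # CI failure events in `pipeline_failures`
    tokens_total: int         # tokens summed over the agent's `agent_jobs` rows


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`; 0.0 when there is nothing to measure."""
    return 100.0 * part / whole if whole else 0.0


def compute_metrics(c: AgentCounts) -> dict:
    """Derive the four leaderboard metrics from raw counts."""
    return {
        "features_shipped": c.features_shipped,
        "quarantine_rate": pct(c.outputs_quarantined, c.outputs_total),
        "ci_failure_rate": pct(c.prs_failed_ci, c.prs_total),
        # None when nothing shipped: cost per feature is undefined in that case.
        "token_cost_per_feature": (
            c.tokens_total / c.features_shipped if c.features_shipped else None
        ),
    }


example = compute_metrics(AgentCounts(
    features_shipped=8, outputs_total=40, outputs_quarantined=4,
    prs_total=20, prs_failed_ci=5, tokens_total=400_000,
))
print(example)
```

Returning `None` for an agent with no shipped features keeps an undefined cost from being mistaken for a cost of zero.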

Quality Score

Each agent receives a composite quality score derived from the four metrics above. The score is designed to reward agents that ship frequently, cheaply, and reliably — and penalise those with high quarantine or CI failure rates.

Agents are ranked from highest to lowest score on the leaderboard, making it easy to identify:

Top performers — high throughput, low failure rates, efficient token use.
Underperformers — agents that consume tokens but fail CI or get quarantined regularly.
Cost outliers — agents with disproportionately high token cost per feature.
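
The release notes do not publish the scoring formula, so the following is only a minimal sketch of one way a composite score with these properties could be built. The weights, the normalisation against the fleet maximum, and the names `WEIGHTS`, `quality_score`, and `rank_agents` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weights (they sum to 1.0); not the platform's actual values.
WEIGHTS = {"throughput": 0.4, "cost": 0.2, "quarantine": 0.2, "ci": 0.2}


def quality_score(m: dict, max_features: int, max_cost: float) -> float:
    """Return a 0-100 score: throughput adds, cost and failure rates subtract."""
    throughput = m["features_shipped"] / max_features if max_features else 0.0
    # Relative to the fleet: the most expensive agent gets 0.0 for this term.
    cost = 1.0 - m["token_cost_per_feature"] / max_cost if max_cost else 0.0
    quarantine = 1.0 - m["quarantine_rate"] / 100.0
    ci = 1.0 - m["ci_failure_rate"] / 100.0
    return 100.0 * (
        WEIGHTS["throughput"] * throughput
        + WEIGHTS["cost"] * cost
        + WEIGHTS["quarantine"] * quarantine
        + WEIGHTS["ci"] * ci
    )


def rank_agents(metrics: Dict[str, dict]) -> List[Tuple[str, float]]:
    """Score every agent and sort from highest to lowest score."""
    if not metrics:
        return []
    max_features = max(m["features_shipped"] for m in metrics.values())
    max_cost = max(m["token_cost_per_feature"] for m in metrics.values())
    scored = [
        (name, quality_score(m, max_features, max_cost))
        for name, m in metrics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample values for three agents.
fleet = {
    "tester": {"features_shipped": 5, "token_cost_per_feature": 20_000.0,
               "quarantine_rate": 10.0, "ci_failure_rate": 20.0},
    "researcher": {"features_shipped": 2, "token_cost_per_feature": 80_000.0,
                   "quarantine_rate": 40.0, "ci_failure_rate": 50.0},
    "engineer": {"features_shipped": 10, "token_cost_per_feature": 40_000.0,
                 "quarantine_rate": 5.0, "ci_failure_rate": 10.0},
}
leaderboard = rank_agents(fleet)
for name, score in leaderboard:
    print(f"{name}: {score:.1f}")
```

Normalising throughput and cost against the fleet maximum keeps every term between 0 and 1, so no single metric dominates purely because of its units.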

Using the Dashboard

Tuning Your Agent Fleet

The leaderboard is the primary tool for deciding which agents to keep active. If an agent has a persistently high quarantine rate or CI failure rate, consider disabling it until the underlying issues are resolved.
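
One way to make "persistently high" concrete is to flag an agent only when it exceeds a limit in several consecutive reporting windows. The sketch below assumes you have exported per-window rates for each agent. The thresholds, the window count, and the `disable_candidates` helper are hypothetical and are not settings exposed by the dashboard.

```python
from typing import Dict, List

# Hypothetical cut-offs; choose values that suit your own fleet.
QUARANTINE_LIMIT = 20.0  # percent
CI_FAILURE_LIMIT = 30.0  # percent
MIN_WINDOWS = 3          # consecutive windows over a limit before flagging


def is_over_limit(window: dict) -> bool:
    """True when either failure rate in this window exceeds its limit."""
    return (
        window["quarantine_rate"] > QUARANTINE_LIMIT
        or window["ci_failure_rate"] > CI_FAILURE_LIMIT
    )


def disable_candidates(history: Dict[str, List[dict]]) -> List[str]:
    """Agents whose most recent MIN_WINDOWS windows are all over a limit."""
    flagged = []
    for agent, windows in history.items():
        recent = windows[-MIN_WINDOWS:]
        if len(recent) == MIN_WINDOWS and all(is_over_limit(w) for w in recent):
            flagged.append(agent)
    return sorted(flagged)


# Made-up history, oldest window first.
history = {
    "engineer": [
        {"quarantine_rate": 5.0, "ci_failure_rate": 10.0},
        {"quarantine_rate": 25.0, "ci_failure_rate": 12.0},  # one bad window
        {"quarantine_rate": 6.0, "ci_failure_rate": 9.0},
    ],
    "researcher": [
        {"quarantine_rate": 40.0, "ci_failure_rate": 50.0},
        {"quarantine_rate": 35.0, "ci_failure_rate": 20.0},
        {"quarantine_rate": 10.0, "ci_failure_rate": 45.0},
    ],
}
candidates = disable_candidates(history)
print(candidates)
```

Requiring consecutive windows avoids disabling an agent over a single bad run, at the cost of reacting more slowly to a genuine regression.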

Platform Feedback Loop

The aggregated quality data is also consumed internally by the platform to inform agent improvement cycles. Poor-scoring agents surface as candidates for prompt refinement, architectural changes, or replacement — closing the loop between measurement and improvement.

Data Sources

All metrics are derived from three existing tables:

agent_jobs — execution records for every agent invocation, including token usage and job outcomes.
pipeline_failures — records of CI failures and quarantine events, keyed to the originating agent.
features — the canonical record of shipped features, used to attribute delivery credit to the responsible agent.

No new data collection is required. The dashboard aggregates data that the platform was already recording.
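
To show what aggregating these three tables might look like, here is a small sketch using an in-memory SQLite database. Only the table names come from the platform. The columns (`agent`, `tokens_used`, `outcome`, `kind`, `name`) are simplified, hypothetical stand-ins, and the real schema and storage engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified columns for illustration only.
    CREATE TABLE agent_jobs (agent TEXT, tokens_used INTEGER, outcome TEXT);
    CREATE TABLE pipeline_failures (agent TEXT, kind TEXT);  -- 'ci' or 'quarantine'
    CREATE TABLE features (agent TEXT, name TEXT);

    INSERT INTO agent_jobs VALUES
        ('engineer', 1000, 'done'),
        ('engineer', 3000, 'done'),
        ('tester', 500, 'failed');
    INSERT INTO pipeline_failures VALUES
        ('engineer', 'ci'),
        ('tester', 'ci'),
        ('tester', 'quarantine');
    INSERT INTO features VALUES
        ('engineer', 'login'),
        ('engineer', 'billing');
""")

# Correlated subqueries count each table separately, so joining the
# three tables cannot multiply rows and inflate the totals.
rows = conn.execute("""
    SELECT a.agent,
           (SELECT COUNT(*) FROM features f
             WHERE f.agent = a.agent) AS features_shipped,
           (SELECT COUNT(*) FROM pipeline_failures p
             WHERE p.agent = a.agent AND p.kind = 'ci') AS ci_failures,
           (SELECT COUNT(*) FROM pipeline_failures p
             WHERE p.agent = a.agent AND p.kind = 'quarantine') AS quarantines,
           SUM(a.tokens_used) AS tokens_total
    FROM agent_jobs a
    GROUP BY a.agent
    ORDER BY a.agent
""").fetchall()

for row in rows:
    print(row)
```

Because every agent invocation is recorded in `agent_jobs`, the sketch drives the query from that table so that agents with no shipped features or failures still appear with zero counts.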
