Features · SaaS Factory · Updated February 19, 2026

Stale Pipeline Run Recovery Sweep

The Stale Pipeline Run Recovery Sweep is a nightly automated batch job that detects pipeline runs stuck in intermediate states and resets them for retry. It ensures the autonomous development loop never silently stalls due to transient failures or agent timeouts.

How It Works

Every night at 04:00 UTC, the sweep scans pipelineRuns for records that have remained in an intermediate (non-terminal) status for more than 4 hours. For each stale run it finds:

  1. The pipelineRun is marked as failed with the error reason timeout.
  2. Any associated features in the in_progress state are reset to found.
  3. The unified pipeline loop picks them up on its next cycle for a fresh attempt.
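
The three steps above can be sketched in memory as follows. This is a minimal illustration, not the platform's implementation: the `PipelineRun` and `Feature` classes, the `status_since` and `feature_ids` fields, and the `sweep` function are hypothetical names chosen for the example. Only the status values and the 4-hour threshold come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=4)
TERMINAL = {"completed", "failed", "cancelled"}


@dataclass
class Feature:
    id: str
    status: str


@dataclass
class PipelineRun:
    id: str
    status: str
    status_since: datetime  # hypothetical field: when the run entered `status`
    feature_ids: list = field(default_factory=list)  # hypothetical link to features
    error: Optional[str] = None


def sweep(runs, features, now):
    """Fail stale runs and reset their in_progress features.

    `features` maps feature id -> Feature. Returns (failed_run_ids, reset_feature_ids).
    """
    failed_ids, reset_ids = [], []
    for run in runs:
        # Skip terminal runs and runs that have not exceeded the threshold.
        if run.status in TERMINAL or now - run.status_since <= STALE_AFTER:
            continue
        # Step 1: mark the run as failed with the error reason `timeout`.
        run.status, run.error = "failed", "timeout"
        failed_ids.append(run.id)
        # Step 2: reset associated in_progress features to `found`.
        for fid in run.feature_ids:
            feature = features.get(fid)
            if feature is not None and feature.status == "in_progress":
                feature.status = "found"
                reset_ids.append(fid)
    # Step 3 happens elsewhere: the pipeline loop picks up `found` features.
    return failed_ids, reset_ids
```

In this sketch the sweep only changes state. It does not start new runs itself, which matches the description that the unified pipeline loop handles the retry on its next cycle.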

Stale Status Thresholds

The following intermediate statuses are monitored. A run is considered stale if it has remained in any of these states for more than 4 hours:

Status             Description
queued             Waiting to be picked up by an agent
researching        Research agent is gathering context
architecting       Architect agent is decomposing work
implementing       Engineer agents are writing code
testing            CI / test agents are verifying changes
awaiting_approval  Pending automated or human approval
releasing          Release pipeline is running
marketing          Marketing agents are preparing assets
documenting        Documentation agent is generating pages

Terminal statuses (completed, failed, cancelled) are never treated as stale.
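
The staleness rule can be written as a small predicate. The sketch below assumes that "more than 4 hours" is a strict comparison, so a run at exactly 4 hours is not yet stale. The function name `is_stale` and its `entered_at` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Status names are taken from the table above.
INTERMEDIATE_STATUSES = frozenset({
    "queued", "researching", "architecting", "implementing", "testing",
    "awaiting_approval", "releasing", "marketing", "documenting",
})
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})
STALE_THRESHOLD = timedelta(hours=4)


def is_stale(status: str, entered_at: datetime, now: datetime) -> bool:
    """Return True if a run has been in an intermediate status for more than 4 hours."""
    if status not in INTERMEDIATE_STATUSES:
        # Terminal (or unknown) statuses are never treated as stale.
        return False
    # Strict comparison is an assumption about how the boundary is handled.
    return now - entered_at > STALE_THRESHOLD
```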

Schedule

Cron: 0 4 * * *
Time: 04:00 UTC, daily
Type: nightly_batch
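
The cron expression `0 4 * * *` fires at minute 0 of hour 4 every day. As an illustration of what that schedule means, the sketch below computes the next firing time in UTC using only the standard library. The helper name `next_sweep_time` is hypothetical, and the function expects a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone


def next_sweep_time(now: datetime) -> datetime:
    """Return the next 04:00 UTC strictly after `now` (mirrors cron `0 4 * * *`)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=4, minute=0, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's 04:00 has already passed (or is right now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```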

Affected Entities

  • pipelineRuns — Stale runs are transitioned to failed with error: timeout.
  • features — Associated in_progress features are reset to found for retry.

Retry Behavior

Resetting a feature to found re-enters it into the standard unified pipeline loop. The next loop cycle will re-queue the feature and begin a fresh pipeline run from the start. No data from the failed run is carried forward.
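
One way to picture the retry is a loop cycle that starts a new run for every feature in the found state. The sketch below is an assumption-heavy illustration: the `loop_cycle` function, the run ID format, and the choices to start new runs in `queued` and to move the feature back to `in_progress` are not stated on this page.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Feature:
    id: str
    status: str


@dataclass
class PipelineRun:
    id: str
    feature_id: str
    status: str = "queued"  # assumption: fresh runs begin in `queued`


_run_ids = count(1)  # hypothetical ID generator for the example


def loop_cycle(features):
    """Start a fresh run for each `found` feature; return the new runs."""
    new_runs = []
    for feature in features:
        if feature.status != "found":
            continue
        # The new run is built from the feature alone: nothing is copied
        # from the earlier failed run.
        new_runs.append(PipelineRun(id=f"run-{next(_run_ids)}", feature_id=feature.id))
        feature.status = "in_progress"  # assumption: set when a run starts
    return new_runs
```

Because the new run is constructed only from the feature, the sketch reflects the rule that no data from the failed run is carried forward.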

Observability

Each sweep execution produces log entries for:

  • The number of stale pipeline runs detected
  • The IDs of runs marked as failed
  • The number of features reset to found

These logs are available in the platform's operational observability dashboard.
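
The exact log format is not documented here. As a rough sketch of the three entries listed above, a sweep could emit something like the following with Python's standard `logging` module. The logger name, the message wording, and the `log_sweep_summary` helper are all hypothetical, and the example assumes every detected stale run is also marked as failed.

```python
import logging

logger = logging.getLogger("stale_run_sweep")  # hypothetical logger name


def log_sweep_summary(failed_run_ids, features_reset_count):
    """Emit one log entry per observability item and return the messages."""
    messages = [
        f"stale pipeline runs detected: {len(failed_run_ids)}",
        f"runs marked as failed: {', '.join(failed_run_ids) or 'none'}",
        f"features reset to found: {features_reset_count}",
    ]
    for message in messages:
        logger.info(message)
    return messages
```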