Blog: Eliminating Full Table Scans on the Invoices Table

Eliminating Full Table Scans on the Invoices Table

Release v1.0.47 · Database Performance

Revenue analytics is one of the most read-heavy parts of the SaaS Factory platform. Every time the revenue dashboard loads, the revenueAnalytics router fires 4–6 queries in parallel — pulling MRR time series, revenue movement breakdowns, ARR summaries, and top product rankings. Until now, every single one of those queries was doing a full sequential scan of the invoices table.

With v1.0.47, that changes.

What Was Happening

The invoices table had no indexes defined — not in billing-schema.ts, not in the migrations directory. That means the database had no choice but to read every row on every revenue analytics query, regardless of the filters applied.

The arrSummary procedure alone runs 4 parallel sub-queries all hitting the same unindexed table simultaneously. At low invoice volumes this isn't noticeable. As a product grows its billing history, the cost compounds linearly with every new invoice written.

This is a classic write-once, pay-forever problem: the penalty for missing an index only becomes visible over time, which is exactly why it's easy to miss and expensive to ignore.

The Fix

Three targeted composite indexes have been added to src/db/billing-schema.ts:

1. `(project_id, paid_at, status)` on `invoices`

This is the primary workhorse index. The revenueAnalytics router's time-series and aggregation queries always filter by project_id and slice by date ranges on paid_at. Including status in the composite allows the query planner to satisfy status filters (e.g. filtering to paid invoices only) directly from the index without touching the heap.

-- Example query that benefits:
SELECT date_trunc('month', paid_at), SUM(amount)
FROM invoices
WHERE project_id = $1
  AND paid_at >= $2
  AND status = 'paid'
GROUP BY 1;

2. `(customer_id, paid_at)` on `invoices`

Revenue movement queries (new MRR, expansion, contraction, churn) join on customer_id to track per-customer revenue changes over time. The leading customer_id column means a join probe on any given customer immediately narrows to that customer's invoice history, with paid_at available for date ordering and range filtering without an additional sort step.

3. `(product_id)` on `invoice_line_items`

The topProducts query aggregates revenue by product across invoice_line_items. A single-column index on product_id is sufficient here — line items are typically queried by product for aggregation, and the selectivity of product_id alone is high enough to make this index effective.

Performance Characteristics

Metric	Before	After
Query complexity	`O(n)` — full sequential scan	`O(log n)` — index-range scan
Estimated speed-up	—	10–100× as invoice volume grows
Write overhead	None	Marginally increased insert/update latency

The write-time cost is real but expected to be negligible. Revenue analytics queries are read-heavy and latency-sensitive (they block dashboard renders). Inserts to invoices happen at billing event time and are already asynchronous. The trade-off strongly favours indexing.

Why Composite Indexes

Single-column indexes on project_id or paid_at alone would help, but composite indexes let the query planner satisfy multiple filter and sort conditions from a single index traversal. Ordering the columns by selectivity and query pattern — most-filtered column first — means the planner can use the same index for a wide range of query shapes without needing separate indexes for each.

Impact

Every product built on the SaaS Factory platform that uses revenue analytics benefits immediately from this change. The fix is a schema-level migration with no application code changes required — the improvement is fully transparent to the revenueAnalytics router and all existing dashboard queries.