AI Analytics Engine

Availability: This feature is part of release v1.0.35, which is pending merge. The infrastructure described here is not yet deployed. This page will be updated when the release lands on main.

The AI Analytics Engine provides tenant-isolated ML-powered insights, on-demand predictions, and anonymized cross-tenant industry benchmarks across HR, finance, legal, and operations domains.

Architecture overview

┌─────────────────────────────────────────────────────────────┐
│                        Your tenant                          │
│                                                             │
│  tRPC client  →  aiAnalytics router  →  prediction_jobs     │
│                        │                    insights        │
│                        │                                    │
│               Inngest event bus                             │
│                        │                                    │
│   ┌────────────────────┼───────────────────────────────┐    │
│   │  ai-prediction-job-processor (event-driven)        │    │
│   │  ai-daily-metric-calculation  (03:00 UTC daily)    │    │
│   │  ai-weekly-model-retraining   (02:00 UTC Mon)      │    │
│   │  ai-monthly-benchmark-update  (01:00 UTC 1st)      │    │
│   └────────────────────────────────────────────────────┘    │
│                        │                                    │
│              ai_models (shared, anonymized)                 │
│              benchmark_data (cross-tenant, anonymized)      │
└─────────────────────────────────────────────────────────────┘

Data model

`ai_models` — ML model registry

Stores versioned model definitions. Models are shared across all tenants and trained on anonymized aggregate data.

Column	Type	Description
`id`	`text` (UUID)	Primary key
`name`	`text`	Display name
`type`	enum	`turnover_prediction`, `spend_forecasting`, `contract_risk`, `performance_prediction`
`version`	`text`	Semantic version string
`trainingDataCutoff`	`timestamp`	Latest training data included
`accuracyMetrics`	`jsonb`	`{ accuracy, precision, recall, f1, ... }` — shape varies by type
`status`	enum	`training` → `active` → `deprecated` / `failed`

Unique constraint: (type, version).

`prediction_jobs` — prediction queue

Records each prediction request made by a tenant.

Column	Type	Description
`id`	`text` (UUID)	Primary key
`orgId`	`text`	Owning organization (tenant-scoped)
`modelId`	`text`	FK → `ai_models.id`
`inputDataHash`	`text`	SHA-256 of the serialized input payload
`status`	enum	`queued` → `running` → `completed` / `failed`
`scheduledFor`	`timestamp`	Execution target time
`completedAt`	`timestamp`	Populated on completion
`results`	`jsonb`	Structured prediction output
`errorMessage`	`text`	Populated on failure

Unique constraint: (orgId, modelId, inputDataHash) — prevents duplicate jobs for identical inputs.
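
As an illustration of how `inputDataHash` could be derived, the sketch below hashes a payload with SHA-256 using Node's built-in `crypto` module. The serialization format is not specified on this page. The `canonicalize` helper (sorted object keys) and the `hashInput` name are assumptions made for the example, and the payload is assumed to be JSON-serializable.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: serialize with sorted object keys so that logically
// identical payloads yield the same string regardless of key order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  // Primitives (string, number, boolean, null) use standard JSON encoding.
  return JSON.stringify(value);
}

// Hypothetical helper: hex-encoded SHA-256 digest of the canonical string.
function hashInput(inputData: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalize(inputData)).digest("hex");
}
```

Sorting keys is a design choice for this sketch: without it, `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` would serialize differently and would not be treated as identical inputs.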

`insights` — AI recommendations

Holds generated recommendations surfaced to users within a tenant.

Column	Type	Description
`id`	`text` (UUID)	Primary key
`orgId`	`text`	Owning organization
`category`	enum	`hr`, `finance`, `legal`, `operations`
`insightType`	`text`	e.g. `high_turnover_risk`, `budget_overrun`
`title`	`text`	Short human-readable title
`description`	`text`	Full description with supporting context
`confidenceScore`	`decimal(4,3)`	Model confidence — `0.000` to `1.000`
`dataPoints`	`jsonb`	Supporting data, e.g. `{ affectedEmployeeCount: 5 }`
`recommendedActions`	`jsonb[]`	Ordered list of `{ action, priority, estimatedImpact?, resourceUrl? }`
`expiresAt`	`timestamp`	Insight is hidden from queries after this time
`viewedBy`	`jsonb`	Array of user IDs that have read the insight
`dismissedAt`	`timestamp`	Set when a user dismisses the insight (soft-delete)
`dismissedBy`	`text`	FK → `users.id`

`benchmark_data` — cross-tenant benchmarks

Aggregate statistics only — no org IDs or personal data.

Column	Type	Description
`metricName`	`text`	e.g. `employee_turnover_rate`
`industry`	`text`	e.g. `technology`, `healthcare`
`companySizeRange`	`text`	e.g. `50-200`, `201-1000`
`percentile25`	`decimal(10,4)`	25th-percentile value
`percentile50`	`decimal(10,4)`	Median value
`percentile75`	`decimal(10,4)`	75th-percentile value
`lastUpdated`	`timestamp`	When this row was last recalculated

Unique constraint: (metricName, industry, companySizeRange).
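
One way a consumer might use a benchmark row is to place its own metric value into a quartile band. The sketch below is illustrative only: the `BenchmarkRow` interface mirrors the columns above with decimals represented as numbers, while `quartileBand` and the band labels are hypothetical names that are not part of the documented API.

```typescript
// Mirrors the benchmark_data columns; decimals are assumed to arrive as numbers.
interface BenchmarkRow {
  metricName: string;
  industry: string;
  companySizeRange: string;
  percentile25: number;
  percentile50: number;
  percentile75: number;
}

// Hypothetical labels for the four bands delimited by the three percentiles.
type QuartileBand = "bottom" | "lower-middle" | "upper-middle" | "top";

// Returns the band a tenant's own value falls into for the given row.
function quartileBand(value: number, row: BenchmarkRow): QuartileBand {
  if (value < row.percentile25) return "bottom";
  if (value < row.percentile50) return "lower-middle";
  if (value < row.percentile75) return "upper-middle";
  return "top";
}
```

Whether a higher band is good or bad depends on the metric: for `employee_turnover_rate`, landing in the top band means turnover is higher than in most comparable companies.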

tRPC API

All procedures are under the aiAnalytics namespace.

`aiAnalytics.getInsights`

Returns a paginated list of active insights for the calling tenant. Excludes expired and dismissed insights.

Input filters:

category — filter by hr, finance, legal, or operations
insightType — filter by specific type string
minConfidence — minimum confidence score threshold (0.0–1.0)
unviewedOnly — only return insights the current user has not yet seen
limit / cursor — pagination
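
The filter semantics above can be sketched as an in-memory predicate. This is not the router's implementation, which presumably filters in the database. The `Insight` shape is a reduced version of the `insights` table, and `selectInsights` and its parameter names are assumptions for the example. Pagination is omitted.

```typescript
type InsightCategory = "hr" | "finance" | "legal" | "operations";

// Reduced view of an insights row, limited to the fields the filters touch.
interface Insight {
  id: string;
  category: InsightCategory;
  insightType: string;
  confidenceScore: number;
  expiresAt: Date;
  viewedBy: string[];
  dismissedAt: Date | null;
}

interface InsightFilters {
  category?: InsightCategory;
  insightType?: string;
  minConfidence?: number;
  unviewedOnly?: boolean;
}

// Hypothetical helper: keeps only active insights that match every filter.
function selectInsights(
  all: Insight[],
  filters: InsightFilters,
  userId: string,
  now: Date,
): Insight[] {
  return all.filter((insight) => {
    // Dismissed and expired insights are always excluded.
    if (insight.dismissedAt !== null) return false;
    if (insight.expiresAt.getTime() <= now.getTime()) return false;
    if (filters.category !== undefined && insight.category !== filters.category) return false;
    if (filters.insightType !== undefined && insight.insightType !== filters.insightType) return false;
    if (filters.minConfidence !== undefined && insight.confidenceScore < filters.minConfidence) return false;
    if (filters.unviewedOnly === true && insight.viewedBy.includes(userId)) return false;
    return true;
  });
}
```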

`aiAnalytics.dismissInsight`

Soft-deletes an insight for the tenant. Sets dismissedAt and dismissedBy, and writes an audit log entry. The insight will no longer appear in getInsights results.

Input: { insightId: string }

`aiAnalytics.requestPrediction`

Enqueues a prediction job. The router resolves the latest active model of the requested type, then deduplicates the request by the SHA-256 hash of the input payload. If an identical job already exists for the same organization and model, the existing job ID is returned and no new job is created.

Input: { modelType: AiModelType, inputData: Record<string, unknown> }

Returns: { jobId: string, deduplicated: boolean }
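
The deduplication behavior can be illustrated with a small in-memory stand-in for the `prediction_jobs` table. This is a sketch under stated assumptions, not the router's code: `InMemoryJobStore`, its `enqueue` method, and the `job-N` ID format are hypothetical. Only the key `(orgId, modelId, inputDataHash)` and the return shape come from this page.

```typescript
interface EnqueueResult {
  jobId: string;
  deduplicated: boolean;
}

// Hypothetical stand-in for the prediction_jobs table and its unique constraint.
class InMemoryJobStore {
  private jobsByKey = new Map<string, string>();
  private nextId = 1;

  enqueue(orgId: string, modelId: string, inputDataHash: string): EnqueueResult {
    // Same composite key as the (orgId, modelId, inputDataHash) constraint.
    const key = `${orgId}|${modelId}|${inputDataHash}`;
    const existing = this.jobsByKey.get(key);
    if (existing !== undefined) {
      // An identical request was already recorded: return the original job.
      return { jobId: existing, deduplicated: true };
    }
    const jobId = `job-${this.nextId++}`;
    this.jobsByKey.set(key, jobId);
    return { jobId, deduplicated: false };
  }
}
```

Because the key includes `orgId`, two tenants that submit identical payloads to the same model still get separate jobs.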

`aiAnalytics.getPredictionJob`

Polls the status and results of a prediction job.

Input: { jobId: string }

Returns: Full PredictionJob record including status, results, and errorMessage.
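
Since this procedure is meant to be polled, a caller will typically loop until the job reaches `completed` or `failed`. The helper below is a minimal sketch: `waitForJob`, the `fetchJob` callback (which would wrap the actual tRPC call), and the default interval and attempt limit are all assumptions. The `PredictionJob` interface lists only the fields named above.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

// Reduced PredictionJob shape, limited to the fields named on this page.
interface PredictionJob {
  id: string;
  status: JobStatus;
  results: unknown;
  errorMessage: string | null;
}

// Hypothetical helper: polls until the job reaches a terminal status.
// `fetchJob` is supplied by the caller, e.g. a wrapper around getPredictionJob.
async function waitForJob(
  fetchJob: (jobId: string) => Promise<PredictionJob>,
  jobId: string,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<PredictionJob> {
  const intervalMs = options.intervalMs ?? 2000; // assumed default
  const maxAttempts = options.maxAttempts ?? 30; // assumed default
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (job.status === "completed" || job.status === "failed") {
      return job;
    }
    if (attempt < maxAttempts) {
      // Wait before the next poll.
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

Injecting `fetchJob` keeps the helper independent of any particular client setup and makes it easy to exercise with a fake in tests.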

`aiAnalytics.getBenchmarks`

Queries industry benchmark data. Results are always anonymized aggregate statistics.

Input: { metrics: string[], industry?: string, companySizeRange?: string }

`aiAnalytics.listModels`

Admin-only. Returns all entries in the model registry with accuracy metrics and status.

Background jobs

Daily metric calculation

Schedule: 0 3 * * * (03:00 UTC every day)

Runs across all organizations. For each org, computes:

Turnover rate — flags high-risk employees and emits hr category insights
Payroll cost spikes — detects anomalous payroll movements and emits finance insights
Expiring contracts — identifies contracts approaching renewal dates and emits legal insights

Each insight is generated with a model-derived confidence score and a set of recommended actions.

Weekly model retraining

Schedule: 0 2 * * 1 (02:00 UTC every Monday)

Refreshes all four model types (turnover_prediction, spend_forecasting, contract_risk, performance_prediction). The job:

Marks all current active versions of each model type as deprecated.
Inserts a new model row with an incremented version and updated training data cutoff.
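
The two steps above can be sketched over an in-memory registry. Several details are assumptions because this page does not specify them: which version component is incremented (a minor bump is assumed here), and the status of the newly inserted row (passed in by the caller). `retrainModelType` and its helpers are hypothetical names.

```typescript
type ModelStatus = "training" | "active" | "deprecated" | "failed";

// Reduced view of an ai_models row.
interface AiModel {
  type: string;
  version: string;
  status: ModelStatus;
  trainingDataCutoff: Date;
}

// Assumption: "incremented version" means a minor bump (1.4.0 -> 1.5.0).
function bumpMinor(version: string): string {
  const parts = version.split(".").map(Number);
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;
  return `${major}.${minor + 1}.0`;
}

// Numeric comparison of "major.minor.patch" strings.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Step 1: deprecate active rows of `type`. Step 2: append the next version.
// Returns a new array and leaves the input registry untouched.
function retrainModelType(
  registry: AiModel[],
  type: string,
  cutoff: Date,
  newStatus: ModelStatus, // not specified by the docs, so the caller decides
): AiModel[] {
  const versions = registry.filter((m) => m.type === type).map((m) => m.version);
  const latest = versions.sort(compareVersions).pop() ?? "0.0.0";
  const updated = registry.map((m): AiModel =>
    m.type === type && m.status === "active" ? { ...m, status: "deprecated" } : m,
  );
  return [
    ...updated,
    { type, version: bumpMinor(latest), status: newStatus, trainingDataCutoff: cutoff },
  ];
}
```

Bumping from the highest existing version, rather than from the active one, keeps the `(type, version)` unique constraint satisfied even if an earlier run left a `failed` row behind.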

Monthly benchmark update

Schedule: 0 1 1 * * (01:00 UTC on the 1st of each month)

Upserts benchmark percentile values for 6 metrics × 6 industries × 5 company size ranges = up to 180 rows per run. All values are fully anonymized aggregates.

Prediction job processor

Trigger: Inngest event ai/prediction.requested

Executes queued prediction jobs as they arrive.

Concurrency: Max 2 simultaneous jobs per org
Throttle: Max 50 jobs per minute globally
Permanent failures: Uses NonRetriableError to halt retries on unrecoverable errors (e.g. model not found, malformed input)
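
The distinction between retryable and permanent failures can be illustrated without the Inngest runtime. In the sketch below, `PermanentJobError` is a local stand-in that plays the role `NonRetriableError` plays in the real processor. `runWithRetries` is a hypothetical helper and does not reflect how Inngest schedules retries internally.

```typescript
// Local stand-in for a non-retriable error type (not the Inngest class).
class PermanentJobError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PermanentJobError";
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: retries ordinary errors up to maxAttempts, but
// rethrows permanent errors immediately without further attempts.
function runWithRetries<T>(task: (attempt: number) => T, maxAttempts: number): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return task(attempt);
    } catch (err) {
      if (err instanceof PermanentJobError) {
        throw err; // e.g. model not found, malformed input
      }
      lastError = err; // transient failure: try again
    }
  }
  throw lastError;
}
```

Failing fast on errors such as a missing model avoids spending retry attempts, and the per-org concurrency slots described above, on a job that cannot succeed.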

Privacy and security

Tenant isolation: prediction_jobs and insights are always scoped to orgId. Cross-tenant queries are not possible through the router.
Benchmark anonymization: benchmark_data contains no org identifiers or PII — only aggregate percentile statistics.
Input deduplication: Prediction inputs are hashed before storage; raw input payloads are not persisted.
Insight expiry: All insights carry an expiresAt timestamp. Expired insights are filtered out at the query layer, so they no longer surface to users once expiresAt has passed.
Audit trail: All dismissals and prediction requests produce audit log entries.
