All Docs
Getting StartedNurtureHubUpdated March 20, 2026

Under the Hood: Building the Data Foundation for AI Nurture Sequences

Under the Hood: Building the Data Foundation for AI Nurture Sequences

NurtureHub Engineering · v1.0.2


Every feature you'll eventually use in NurtureHub — the AI-generated email sequences, the one-click approval flow, the real-time intent scoring, the hot lead alerts — rests on a reliable data layer. Version 1.0.2 lays exactly that foundation.

This post walks through the schema decisions made in this release and explains why each table is structured the way it is.


The Core Problem We're Modelling

When a property agent assigns a contact to one of NurtureHub's twelve categories (Seller, Landlord, Buy-to-Let Investor, Active Tenant, etc.), the platform needs to:

  1. Generate a personalised, three-email nurture sequence in the agency's brand voice.
  2. Present each email for agent review and approval before anything is sent.
  3. Schedule and deliver approved emails.
  4. Record every engagement event — opens, clicks, bounces, unsubscribes — against the right email and contact.
  5. Score intent in real time and fire alerts when a lead goes hot.
  6. Respect UK GDPR and PECR at every step.

That's a well-defined workflow, and it maps cleanly onto three primary tables plus two compliance tables.


nurture_sequences — The Top-Level Record

The nurture_sequences table is created the moment an agent assigns a contact to a category. It anchors everything else.

nurture_sequences (
  id                   UUID PRIMARY KEY,
  orgId                FK → organisations,
  contactId            FK → contacts,
  categoryId           FK → categories,
  status               ENUM,
  brand_voice_snapshot TEXT / JSON,
  createdAt            TIMESTAMP
)

One column worth calling out: brand_voice_snapshot.

Agencies configure a brand voice in NurtureHub — tone, vocabulary, style guidelines. That configuration can change over time. By snapshotting it at sequence creation, we guarantee that the three emails in any given sequence are coherent with each other and reflect how the agency sounded when the sequence was generated, not how they sound six months later after a rebrand. Immutability here is a feature.


sequence_emails — Three Emails, One Sequence

Each nurture_sequences record owns exactly three sequence_emails rows, identified by a position column (1, 2, or 3).

sequence_emails (
  id           UUID PRIMARY KEY,
  sequenceId   FK → nurture_sequences,
  position     INTEGER CHECK (position BETWEEN 1 AND 3),
  subject      TEXT,
  body_html    TEXT,
  body_text    TEXT,
  status       ENUM ('draft', 'approved', 'sent'),
  scheduled_at TIMESTAMP,
  sent_at      TIMESTAMP
)

The status enum tells the full story of an email's journey:

  • draft — AI has generated the email; agent has not yet reviewed it.
  • approved — Agent has reviewed and approved; email is queued for sending.
  • sent — Email has been dispatched; sent_at is populated.

We store both body_html and body_text for every email. HTML is rendered in modern email clients; plain text is the fallback and is also required by several anti-spam standards. Generating both at sequence creation time avoids doing it at send time under load.


email_events — Engagement at the Row Level

Every interaction a recipient has with a sent email becomes a row in email_events.

email_events (
  id                UUID PRIMARY KEY,
  sequenceEmailId   FK → sequence_emails,
  event_type        ENUM ('open', 'click', 'bounce', 'unsubscribe'),
  occurred_at       TIMESTAMP,
  metadata          JSON
)

The metadata JSON column is intentionally flexible. For a click event it carries the URL that was clicked. For a bounce event it carries the SMTP diagnostic code. For an open event it may carry the email client or approximate geography. Keeping this in JSON rather than typed columns means we can extend what we capture without schema migrations as the product matures.

This table is the raw feed for intent scoring. A contact who opens all three emails and clicks a property valuation link is scored very differently from one who opens email one and goes quiet. The scoring logic sits above this table; the table just faithfully records facts.


UK GDPR & PECR Compliance Tables

Two additional tables are introduced specifically to meet UK data protection obligations.

Unsubscribe Tokens

Every outgoing email includes a unique, tokenised unsubscribe link. The token maps to a record in the unsubscribe tokens table that identifies the recipient and the sequence without exposing any internal IDs in the URL. Tokens are single-use and time-bound.

This approach means a recipient can unsubscribe in one click, with no login required — a requirement under PECR.

Suppression List

When a contact unsubscribes or generates a hard bounce, their address is written to the suppression list. The sending pipeline consults this list before every single dispatch. If an address appears there, the email is not sent — regardless of what the sequence_emails.status column says.

The suppression list is organisation-scoped: an unsubscribe from one agency does not suppress the contact across all agencies on the platform.


What Comes Next

With this schema in place, the following features are unblocked:

  • AI sequence generation — the generation pipeline can write sequence_emails rows in draft status.
  • Agent review and approval UI — the status transition from draftapproved is now persisted.
  • Email dispatch — the scheduler can query approved, scheduled emails and move them to sent.
  • Engagement tracking webhooks — inbound open/click/bounce events can be written to email_events.
  • Intent scoring — aggregate queries over email_events can feed the scoring model.

None of that is in this release. But none of it is possible without it.