Under the Hood: Building the Data Foundation for AI Nurture Sequences
Under the Hood: Building the Data Foundation for AI Nurture Sequences
NurtureHub Engineering · v1.0.2
Every feature you'll eventually use in NurtureHub — the AI-generated email sequences, the one-click approval flow, the real-time intent scoring, the hot lead alerts — rests on a reliable data layer. Version 1.0.2 lays exactly that foundation.
This post walks through the schema decisions made in this release and explains why each table is structured the way it is.
The Core Problem We're Modelling
When a property agent assigns a contact to one of NurtureHub's twelve categories (Seller, Landlord, Buy-to-Let Investor, Active Tenant, etc.), the platform needs to:
- Generate a personalised, three-email nurture sequence in the agency's brand voice.
- Present each email for agent review and approval before anything is sent.
- Schedule and deliver approved emails.
- Record every engagement event — opens, clicks, bounces, unsubscribes — against the right email and contact.
- Score intent in real time and fire alerts when a lead goes hot.
- Respect UK GDPR and PECR at every step.
That's a well-defined workflow, and it maps cleanly onto three primary tables plus two compliance tables.
nurture_sequences — The Top-Level Record
The nurture_sequences table is created the moment an agent assigns a contact to a category. It anchors everything else.
nurture_sequences (
id UUID PRIMARY KEY,
orgId FK → organisations,
contactId FK → contacts,
categoryId FK → categories,
status ENUM,
brand_voice_snapshot TEXT / JSON,
createdAt TIMESTAMP
)
One column worth calling out: brand_voice_snapshot.
Agencies configure a brand voice in NurtureHub — tone, vocabulary, style guidelines. That configuration can change over time. By snapshotting it at sequence creation, we guarantee that the three emails in any given sequence are coherent with each other and reflect how the agency sounded when the sequence was generated, not how they sound six months later after a rebrand. Immutability here is a feature.
sequence_emails — Three Emails, One Sequence
Each nurture_sequences record owns exactly three sequence_emails rows, identified by a position column (1, 2, or 3).
sequence_emails (
id UUID PRIMARY KEY,
sequenceId FK → nurture_sequences,
position INTEGER CHECK (position BETWEEN 1 AND 3),
subject TEXT,
body_html TEXT,
body_text TEXT,
status ENUM ('draft', 'approved', 'sent'),
scheduled_at TIMESTAMP,
sent_at TIMESTAMP
)
The status enum tells the full story of an email's journey:
draft— AI has generated the email; agent has not yet reviewed it.approved— Agent has reviewed and approved; email is queued for sending.sent— Email has been dispatched;sent_atis populated.
We store both body_html and body_text for every email. HTML is rendered in modern email clients; plain text is the fallback and is also required by several anti-spam standards. Generating both at sequence creation time avoids doing it at send time under load.
email_events — Engagement at the Row Level
Every interaction a recipient has with a sent email becomes a row in email_events.
email_events (
id UUID PRIMARY KEY,
sequenceEmailId FK → sequence_emails,
event_type ENUM ('open', 'click', 'bounce', 'unsubscribe'),
occurred_at TIMESTAMP,
metadata JSON
)
The metadata JSON column is intentionally flexible. For a click event it carries the URL that was clicked. For a bounce event it carries the SMTP diagnostic code. For an open event it may carry the email client or approximate geography. Keeping this in JSON rather than typed columns means we can extend what we capture without schema migrations as the product matures.
This table is the raw feed for intent scoring. A contact who opens all three emails and clicks a property valuation link is scored very differently from one who opens email one and goes quiet. The scoring logic sits above this table; the table just faithfully records facts.
UK GDPR & PECR Compliance Tables
Two additional tables are introduced specifically to meet UK data protection obligations.
Unsubscribe Tokens
Every outgoing email includes a unique, tokenised unsubscribe link. The token maps to a record in the unsubscribe tokens table that identifies the recipient and the sequence without exposing any internal IDs in the URL. Tokens are single-use and time-bound.
This approach means a recipient can unsubscribe in one click, with no login required — a requirement under PECR.
Suppression List
When a contact unsubscribes or generates a hard bounce, their address is written to the suppression list. The sending pipeline consults this list before every single dispatch. If an address appears there, the email is not sent — regardless of what the sequence_emails.status column says.
The suppression list is organisation-scoped: an unsubscribe from one agency does not suppress the contact across all agencies on the platform.
What Comes Next
With this schema in place, the following features are unblocked:
- AI sequence generation — the generation pipeline can write
sequence_emailsrows indraftstatus. - Agent review and approval UI — the status transition from
draft→approvedis now persisted. - Email dispatch — the scheduler can query approved, scheduled emails and move them to
sent. - Engagement tracking webhooks — inbound open/click/bounce events can be written to
email_events. - Intent scoring — aggregate queries over
email_eventscan feed the scoring model.
None of that is in this release. But none of it is possible without it.