Introducing the Local Authority Data Source Registry
Version: 1.0.1
Severity: Critical — foundational infrastructure
Overview
The Local Authority Data Source Registry is the core data layer that enables HMOwatch to monitor UK council licensing information at scale. Every automated scraping bot, compliance alert, and regulatory change notification in the platform traces back to a registered source in this registry.
Before this release, HMOwatch had no structured way to record which councils exist, where their licensing data lives, how often to check it, or what format to expect. This release closes that gap.
Data Model
local_authorities Table
Represents each of the 400+ UK local authority councils.
| Column | Type | Description |
|---|---|---|
| id | UUID / Integer | Primary key |
| name | String | Full council name (e.g. Leeds City Council) |
| region | String | Geographic region (e.g. Yorkshire and the Humber) |
| slug | String | URL-safe identifier |
| active | Boolean | Whether this authority is currently being monitored |
| created_at | Timestamp | Record creation time |
| updated_at | Timestamp | Last modification time |
sources Table
Represents individual scrape targets associated with a local authority.
| Column | Type | Description |
|---|---|---|
| id | UUID / Integer | Primary key |
| local_authority_id | Foreign Key | Reference to the parent council |
| url | String | The target URL to scrape |
| data_schema | JSON / Text | Expected structure of the licensing data at this URL |
| scrape_frequency | String / Enum | How often this source should be checked (e.g. daily, weekly) |
| last_scraped_at | Timestamp | When this source was last successfully scraped |
| next_scheduled_at | Timestamp | When this source is next due to be scraped |
| active | Boolean | Whether scraping is enabled for this source |
| created_at | Timestamp | Record creation time |
| updated_at | Timestamp | Last modification time |
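As a rough illustration of how the two tables relate, the columns above could be mirrored as TypeScript types. This is a sketch, not the platform's actual model code: the type names, the nullable timestamps (for sources that have never been scraped), and the `toSlug` helper with its slug rule are all assumptions made for the example.

```typescript
// Hypothetical mirror of the two tables; field names follow the column names above.
// Only the two example frequencies from the table are modelled here.
type ScrapeFrequency = "daily" | "weekly";

interface LocalAuthority {
  id: string;
  name: string;
  region: string;
  slug: string;
  active: boolean;
  created_at: Date;
  updated_at: Date;
}

interface Source {
  id: string;
  local_authority_id: string; // references LocalAuthority.id
  url: string;
  data_schema: unknown; // stored as JSON / text; its structure is not specified here
  scrape_frequency: ScrapeFrequency;
  last_scraped_at: Date | null; // assumption: null until the first successful scrape
  next_scheduled_at: Date | null; // assumption: null until first scheduled
  active: boolean;
  created_at: Date;
  updated_at: Date;
}

// Assumed slug rule: lowercase, runs of non-alphanumerics collapsed to one hyphen,
// leading and trailing hyphens removed.
function toSlug(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}
```

Under that assumed rule, `toSlug("Leeds City Council")` yields `leeds-city-council`.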
Seed Data
An initial seed file is provided to bootstrap the registry with known UK local authority records. This allows the scraping pipeline to begin operating immediately after migration without manual data entry.
To apply the seed:
```shell
# Run migrations first (use the command that matches your stack)
npx prisma migrate deploy
# or
rails db:migrate

# Then seed the registry
npx prisma db seed
# or
rails db:seed
```
Note: The seed file covers the initial set of known sources. Additional councils and sources can be added via the Admin UI or directly via the API.
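The seed file's actual format is not shown in these notes. As a minimal sketch, a seed entry might pair a council with its sources, and the seed step might skip councils that are already registered so it can be re-run safely. The `SeedEntry` shape, the `entriesToInsert` helper, and the placeholder URL below are all hypothetical.

```typescript
// Hypothetical seed entry shape; the real seed file may be structured differently.
interface SeedEntry {
  name: string;
  region: string;
  slug: string;
  sources: { url: string; scrape_frequency: "daily" | "weekly" }[];
}

const seedEntries: SeedEntry[] = [
  {
    name: "Leeds City Council",
    region: "Yorkshire and the Humber",
    slug: "leeds-city-council",
    // Placeholder URL for illustration only, not a real council endpoint.
    sources: [{ url: "https://example.org/hmo-register", scrape_frequency: "weekly" }],
  },
];

// Keep only entries whose slug is not already in the registry,
// so re-running the seed does not create duplicate councils.
function entriesToInsert(existingSlugs: Set<string>, entries: SeedEntry[]): SeedEntry[] {
  return entries.filter((entry) => !existingSlugs.has(entry.slug));
}
```

Matching on `slug` is a design choice for the sketch; any unique, stable key on the council record would serve the same purpose.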
Admin UI
A built-in admin interface is available to manage the registry without requiring direct database access.
Capabilities
- View all registered local authorities and their associated sources
- Add new councils and scrape targets
- Edit source URLs, data schemas, and scrape schedules
- Enable or disable individual sources without deleting them
- Inspect last scraped and next scheduled timestamps
Access
The admin UI is accessible at /admin/sources (requires admin role).
Relationship to the Scraping Pipeline
The registry is consumed by the scraping bots at runtime:
- The scheduler queries `sources` for all active records where `next_scheduled_at` is due.
- Each bot receives a source record containing the `url` and `data_schema` it needs.
- After a successful scrape, the bot updates `last_scraped_at` and sets the next `next_scheduled_at`.
- If a change is detected against the stored schema, the alerting system is triggered.
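The selection and rescheduling steps in this list can be sketched in memory as follows. This is an illustration under stated assumptions rather than the scheduler's real code: `dueSources`, `markScraped`, and the interval table are hypothetical names, only the `daily` and `weekly` frequencies are modelled, and a source with no `next_scheduled_at` is assumed to be due immediately.

```typescript
type ScrapeFrequency = "daily" | "weekly";

// Subset of the sources columns that the scheduler needs in this sketch.
interface SourceRecord {
  id: string;
  url: string;
  data_schema: unknown;
  scrape_frequency: ScrapeFrequency;
  last_scraped_at: Date | null;
  next_scheduled_at: Date | null;
  active: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Assumed mapping from frequency to interval length.
const INTERVAL_MS: Record<ScrapeFrequency, number> = { daily: DAY_MS, weekly: 7 * DAY_MS };

// Select active sources whose next_scheduled_at has passed.
// Assumption: a null next_scheduled_at (never scheduled) counts as due.
function dueSources(sources: SourceRecord[], now: Date): SourceRecord[] {
  return sources.filter(
    (s) => s.active && (s.next_scheduled_at === null || s.next_scheduled_at.getTime() <= now.getTime())
  );
}

// After a successful scrape, record the time and schedule the next run
// one interval later. Returns a new record and leaves the input unchanged.
function markScraped(source: SourceRecord, scrapedAt: Date): SourceRecord {
  return {
    ...source,
    last_scraped_at: scrapedAt,
    next_scheduled_at: new Date(scrapedAt.getTime() + INTERVAL_MS[source.scrape_frequency]),
  };
}
```

In a real deployment the filter would more likely be a database query against the `sources` table than an in-memory pass, but the condition would be the same: active, and due at or before the current time.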
Without a populated registry, no scraping occurs and no alerts are generated.
Setup Checklist
- Run database migrations to create the `local_authorities` and `sources` tables
- Run the seed file to populate initial UK council records
- Verify records appear in the Admin UI at `/admin/sources`
- Confirm the scrape scheduler is reading from the `sources` table
- Add or update any sources not covered by the seed data