Teachable Scraper — Course Structure Extraction

Available since: v1.0.16

Overview

The Teachable Scraper's course structure extraction phase crawls the Teachable curriculum editor for a selected course and reconstructs its full section and lesson hierarchy inside the platform. This is the first phase of the Teachable import pipeline and runs before any media assets are fetched.

What Gets Extracted

For each selected course, the scraper captures:

Field	Description
Section titles	The top-level grouping labels for the curriculum
Section descriptions	Optional descriptive text attached to each section
Lesson titles	The display name of every lesson within each section
Lesson descriptions	Optional summary text for each lesson
Lesson types	One of `video`, `text`, or `quiz` — used to route each lesson to the correct content handler in later pipeline stages
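The fields above map naturally onto a small data model. The sketch below assumes Python dataclasses; the class names, field names, and the `parse_lesson_type` helper are hypothetical illustrations, not the platform's actual schema. It constrains lesson types to the three supported values and rejects anything else at extraction time.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LessonType(str, Enum):
    """The three lesson types recorded at extraction time."""
    VIDEO = "video"
    TEXT = "text"
    QUIZ = "quiz"


@dataclass
class Lesson:
    # Hypothetical record shape; not the platform's real schema.
    title: str
    lesson_type: LessonType
    description: Optional[str] = None  # optional summary text


@dataclass
class Section:
    # Hypothetical record shape; not the platform's real schema.
    title: str
    description: Optional[str] = None  # optional descriptive text
    lessons: list[Lesson] = field(default_factory=list)


def parse_lesson_type(raw: str) -> LessonType:
    """Map a raw type string to LessonType, rejecting unsupported values."""
    try:
        return LessonType(raw.strip().lower())
    except ValueError:
        raise ValueError(f"unsupported lesson type: {raw!r}") from None
```

For example, `Section("Getting Started", lessons=[Lesson("Welcome", parse_lesson_type("video"))])` builds a one-lesson section, while `parse_lesson_type("audio")` raises `ValueError`.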

How It Works

The extraction process runs in four strictly ordered steps:

Select a course — On the import configuration screen, choose the Teachable course to import.
Crawl the curriculum editor — The scraper authenticates with Teachable and traverses the curriculum editor page to read the full section/lesson tree.
Build the structural skeleton — Sections and lessons are written to the platform database in order, preserving hierarchy and sequence.
Hand off to media fetching — Once the skeleton is complete, subsequent pipeline stages use it as the manifest for fetching video links, images, embedded content, and document attachments.
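Step 3 can be pictured as flattening the crawled tree into ordered records. This is a minimal sketch that assumes the crawl yields nested dictionaries with `title`, `description`, `lessons`, and `type` keys; that shape, the `build_skeleton` function, the generated IDs, and the `media_fetched` flag are all hypothetical, and the returned list stands in for the actual database writes. Each record keeps a parent reference and a position so hierarchy and sequence survive.

```python
from typing import Any


def build_skeleton(crawled_sections: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten a crawled section/lesson tree into ordered records.

    Each record stores its parent ID and zero-based position so the original
    hierarchy and ordering can be reconstructed. The input shape is assumed.
    """
    records: list[dict[str, Any]] = []
    for s_pos, section in enumerate(crawled_sections):
        section_id = f"section-{s_pos}"  # hypothetical ID scheme
        records.append({
            "id": section_id,
            "kind": "section",
            "parent_id": None,
            "position": s_pos,
            "title": section["title"],
            "description": section.get("description"),
        })
        for l_pos, lesson in enumerate(section.get("lessons", [])):
            records.append({
                "id": f"{section_id}-lesson-{l_pos}",
                "kind": "lesson",
                "parent_id": section_id,
                "position": l_pos,
                "title": lesson["title"],
                "description": lesson.get("description"),
                "lesson_type": lesson["type"],
                "media_fetched": False,  # flipped later by media stages
            })
    return records
```

Filtering the result with `[r for r in records if r["kind"] == "lesson"]` yields the manifest that step 4 would iterate over.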

Why Skeleton-First?

Extracting structure before media provides several benefits:

Resumability — If a media fetch fails, the skeleton already exists and the import can resume without re-crawling the curriculum.
Auditability — You can review the extracted section/lesson outline and verify it matches the source course before committing to a full media import.
Parallelism — Individual lessons can be dispatched for media fetching independently once their records exist in the system.
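Resumability and parallelism follow from the same property: once lesson records exist, each one can be fetched independently, and a rerun only needs to touch the lessons that have not finished yet. The sketch below illustrates that idea with a thread pool. The `resume_media_fetch` function, the `fetch_media` callback, and the `media_fetched` flag are hypothetical names, not part of the platform's documented API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable


def resume_media_fetch(
    lessons: list[dict[str, Any]],
    fetch_media: Callable[[dict[str, Any]], None],  # hypothetical callback
    max_workers: int = 4,
) -> list[str]:
    """Fetch media for unfinished lessons in parallel; return failed IDs.

    Only lessons whose `media_fetched` flag is still False are dispatched,
    so a rerun after a failure skips completed work and never re-crawls.
    """
    pending = [lesson for lesson in lessons if not lesson["media_fetched"]]
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_media, lesson): lesson for lesson in pending}
        for future in as_completed(futures):
            lesson = futures[future]
            try:
                future.result()  # re-raises any exception from fetch_media
            except Exception:
                failed.append(lesson["id"])
            else:
                lesson["media_fetched"] = True
    return sorted(failed)
```

Calling the function again with the same list retries only the IDs it returned, which mirrors the resume behavior described above.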

Supported Lesson Types

Type	Description
`video`	Lessons whose primary content is an embedded or hosted video
`text`	Lessons composed of rich text, HTML, or document content
`quiz`	Lessons that present learners with assessment questions

Lesson type is recorded on each lesson record at extraction time and determines how the content import handler processes that lesson in subsequent phases.
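A dispatch table is one straightforward way to route a lesson by its recorded type. In this sketch the handler functions are placeholders that only return a label, and `HANDLERS` and `route_lesson` are hypothetical names; the real content import handlers are not described in this section.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


def handle_video(lesson: dict[str, Any]) -> str:
    return f"video handler: {lesson['title']}"  # placeholder


def handle_text(lesson: dict[str, Any]) -> str:
    return f"text handler: {lesson['title']}"  # placeholder


def handle_quiz(lesson: dict[str, Any]) -> str:
    # Only the lesson record and its type exist; question content
    # is not extracted in this release.
    return f"quiz handler: {lesson['title']}"  # placeholder


HANDLERS: dict[str, Handler] = {
    "video": handle_video,
    "text": handle_text,
    "quiz": handle_quiz,
}


def route_lesson(lesson: dict[str, Any]) -> str:
    """Select the content handler from the type recorded at extraction time."""
    handler = HANDLERS.get(lesson["lesson_type"])
    if handler is None:
        raise ValueError(f"no handler for lesson type {lesson['lesson_type']!r}")
    return handler(lesson)
```

Keeping the mapping in one table means an unsupported type fails loudly instead of being silently processed by the wrong handler.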

Limitations

This phase does not download or transfer any media assets. Videos, images, and attachments are fetched in later pipeline stages.
Only courses accessible to the authenticated Teachable account can be extracted.
Quiz question content is not extracted in this release — only the lesson record and its type are captured.