Features / CSI Teachable Replacement App / Updated March 13, 2026
Teachable Scraper: Lesson Content Page Extraction
Available since: v1.0.17
The Teachable import engine fetches each lesson's full content page and extracts its content elements: body text, embedded video iframes, images, and downloadable attachment links. This lets the platform reconstruct lesson content faithfully without manual migration effort.
What Gets Extracted
When the scraper processes a Teachable school, it visits every individual lesson page and pulls out the following:
| Content Type | Description |
|---|---|
| HTML body text | The full prose and formatted text content of the lesson |
| Embedded video iframes | Video players embedded via Vimeo, YouTube, Wistia, or Teachable's native video host |
| Image tags | Inline images appearing within lesson content |
| Downloadable attachment links | Links to PDFs, documents, and other files learners can download |
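As an illustration of how an extracted iframe reference could be tagged by video host, here is a minimal sketch. The function name `classify_embed`, the `PROVIDER_HOSTS` mapping, and the provider labels are assumptions made for this example, not part of the scraper's documented behaviour. The hostname Teachable uses for its native video host is not stated here, so such embeds fall through to `"other"` in this sketch.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a host suffix to a provider label.
PROVIDER_HOSTS = {
    "vimeo.com": "vimeo",
    "youtube.com": "youtube",
    "youtube-nocookie.com": "youtube",
    "youtu.be": "youtube",
    "wistia.com": "wistia",
    "wistia.net": "wistia",
}


def classify_embed(src: str) -> str:
    """Return a provider label for an iframe src, or "other" if unrecognised."""
    host = (urlparse(src).hostname or "").lower()
    for suffix, provider in PROVIDER_HOSTS.items():
        # Match the bare domain or any subdomain of it, but not lookalikes
        # such as "notvimeo.com".
        if host == suffix or host.endswith("." + suffix):
            return provider
    return "other"


label = classify_embed("https://player.vimeo.com/video/123")
```

Matching on the host suffix rather than a substring avoids misclassifying unrelated domains that merely contain a provider's name.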
Content Structure Preservation
The scraper preserves the original content order as it appears in the Teachable lesson page. Text blocks, video embeds, images, and attachment links are extracted in sequence, so the reconstructed lesson inside the platform mirrors the source layout.
This means:
- No manual reordering of content elements is needed after import.
- Media and attachments are associated with the correct lesson automatically.
- Text content retains its structural context (paragraphs, headings, etc.) from the original HTML.
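Order-preserving extraction can be sketched with a single pass over the lesson HTML that appends each element to one list as it is encountered. This is a minimal sketch using Python's standard `html.parser`, not the import engine's actual code. The class name, the `(kind, value)` tuple shape, and the `ATTACHMENT_EXTENSIONS` list are assumptions for illustration. For brevity it records plain text only and does not keep heading or paragraph structure.

```python
from html.parser import HTMLParser

# Hypothetical set of file extensions treated as downloadable attachments.
ATTACHMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".zip", ".xlsx", ".pptx")


class OrderedContentExtractor(HTMLParser):
    """Collects text, iframes, images, and attachment links in source order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.elements = []  # list of (kind, value) tuples, in document order

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "iframe" and attr.get("src"):
            self.elements.append(("video", attr["src"]))
        elif tag == "img" and attr.get("src"):
            self.elements.append(("image", attr["src"]))
        elif tag == "a":
            href = attr.get("href") or ""
            # Ignore any query string when checking the file extension.
            if href.split("?")[0].lower().endswith(ATTACHMENT_EXTENSIONS):
                self.elements.append(("attachment", href))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.elements.append(("text", text))


def extract_in_order(html: str):
    parser = OrderedContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


sample = (
    "<h2>Intro</h2><p>Welcome.</p>"
    '<iframe src="https://player.vimeo.com/video/1"></iframe>'
    '<img src="/img/a.png">'
    '<a href="/files/worksheet.pdf">Worksheet</a>'
)
elements = extract_in_order(sample)
```

Because every element type goes into the same list, the position of each entry reflects where it appeared in the source page, which is what allows reconstruction without reordering.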
How It Fits Into the Import Pipeline
1. Course structure is traversed: sections and lessons are enumerated.
2. Each lesson page is fetched individually.
3. Content elements (text, video iframes, images, attachments) are extracted and stored.
4. Reconstruction uses the extracted data to rebuild the lesson inside the platform.
This release covers step 2 and step 3, delivering the raw extracted content that the reconstruction layer consumes.
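The fetch and extract steps can be sketched as a loop over the enumerated course structure. Everything named here (`Section`, `Lesson`, `import_lessons`, and the injected `fetch` and `extract` callables) is hypothetical and chosen for illustration. The real engine's interfaces and storage layer are not described in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Element = Tuple[str, str]  # (kind, value), e.g. ("image", "/img/a.png")


@dataclass
class Lesson:
    title: str
    url: str


@dataclass
class Section:
    title: str
    lessons: List[Lesson] = field(default_factory=list)


def import_lessons(
    sections: List[Section],
    fetch: Callable[[str], str],
    extract: Callable[[str], List[Element]],
) -> Dict[str, List[Element]]:
    """Fetch every lesson page, extract its elements, and store them by URL."""
    store: Dict[str, List[Element]] = {}
    for section in sections:  # output of the structure traversal step
        for lesson in section.lessons:
            html = fetch(lesson.url)           # one fetch per lesson page
            store[lesson.url] = extract(html)  # extracted elements, in order
    return store


# Usage with an in-memory stand-in for the network and a trivial extractor.
pages = {"/l/1": "<p>Hi</p>", "/l/2": "<p>Bye</p>"}
sections = [Section("Intro", [Lesson("One", "/l/1"), Lesson("Two", "/l/2")])]
result = import_lessons(
    sections,
    fetch=pages.__getitem__,
    extract=lambda html: [("text", html)],
)
```

Passing `fetch` and `extract` in as parameters keeps the loop testable without network access and leaves the returned mapping as the raw data a reconstruction layer would consume.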
Notes
- Extraction targets content rendered in the Teachable lesson body. Content outside the lesson body (e.g. navigation chrome, course sidebar) is not captured.
- Video extraction identifies `<iframe>` elements; the actual video files are not downloaded, and only the embed references are preserved.
- Image tags are extracted as references; binary image assets are handled separately by the asset copy pipeline.
- Attachment links are extracted as URLs pointing to Teachable-hosted downloadable files.
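To make the scoping note concrete, here is a minimal sketch that records iframe and image references only while the parser is inside a lesson body container and ignores everything else on the page. The container class name `lesson-body` is an assumption for this example; the actual markup Teachable uses is not specified here. The sketch relies on well-formed nesting, so unclosed tags in real pages would need more robust handling.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LessonBodyScraper(HTMLParser):
    """Records iframe and image references found inside the lesson body only."""

    def __init__(self, body_class="lesson-body"):  # hypothetical class name
        super().__init__()
        self.body_class = body_class
        self.depth = 0        # greater than 0 while inside the container
        self.references = []  # (kind, url) pairs; nothing is downloaded

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.depth == 0:
            # Outside the lesson body: only look for the container itself.
            classes = (attr.get("class") or "").split()
            if self.body_class in classes and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag in ("iframe", "img") and attr.get("src"):
            kind = "video" if tag == "iframe" else "image"
            self.references.append((kind, attr["src"]))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def scrape_lesson_body(html: str):
    scraper = LessonBodyScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.references


page = (
    '<nav><img src="/logo.png"></nav>'
    '<div class="lesson-body"><p>Text<img src="/a.png"></p>'
    '<iframe src="https://player.vimeo.com/video/1"></iframe></div>'
    '<aside><iframe src="https://example.com/ad"></iframe></aside>'
)
refs = scrape_lesson_body(page)
```

The result holds only URLs, mirroring the notes above: video and image files themselves are left for other parts of the pipeline.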