Features | CSI Teachable Replacement App | Updated March 13, 2026

HTML-to-TipTap JSON Content Transformer

Version 1.0.27 introduces an automated transformer that converts raw HTML scraped from Teachable lesson pages into the structured TipTap / ProseMirror JSON document format used by the platform's lesson content schema.

Why This Exists

When the Teachable import engine scrapes a lesson page, the lesson body is returned as raw HTML. The platform's lesson editor and renderer do not operate on raw HTML; they use the TipTap document model, which is built on ProseMirror's JSON document format. The transformer bridges this gap, so imported content is usable immediately, without manual editing or reformatting.

How It Works

The transformer receives an HTML string (the scraped lesson body) and outputs a ProseMirror-compatible JSON document object. The process is:

  1. Parse — The raw HTML string is parsed into a DOM tree.
  2. Traverse — Each HTML node is visited recursively.
  3. Map — Recognised elements and inline marks are converted to their TipTap equivalents.
  4. Output — A valid TipTap doc JSON object is returned and stored in the lesson content schema.
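
The four steps above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the platform's actual implementation: the names `TreeBuilder`, `unwrap`, `inline`, `blocks`, and `html_to_tiptap` are hypothetical. It also assumes that unrecognised wrappers such as <div> are unwrapped and that whitespace between blocks is dropped, which this page does not specify.

```python
from html.parser import HTMLParser

# Mapping tables taken from the "Supported Elements" section of this page.
HEADINGS = {f"h{n}": n for n in range(1, 7)}            # <h1>..<h6> -> level
LISTS = {"ul": "bulletList", "ol": "orderedList"}
MARKS = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}
BLOCKS = set(HEADINGS) | set(LISTS) | {"p"}
VOID = {"br", "hr", "img", "input", "meta", "link"}      # elements with no closing tag


class TreeBuilder(HTMLParser):
    """Step 1 (Parse): turn the HTML string into a small tree of dicts and strings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def unwrap(children):
    """Step 2 (Traverse): splice children of unrecognised elements into their parent."""
    for child in children:
        if isinstance(child, dict) and child["tag"] not in BLOCKS and child["tag"] not in MARKS:
            yield from unwrap(child["children"])
        else:
            yield child


def inline(children, marks=()):
    """Step 3 (Map, inline): emit text nodes carrying the marks of enclosing elements."""
    out = []
    for child in unwrap(children):
        if isinstance(child, str):
            node = {"type": "text", "text": child}
            if marks:
                node["marks"] = [{"type": m} for m in marks]
            out.append(node)
        else:
            mark = MARKS.get(child["tag"])
            nested = marks + (mark,) if mark and mark not in marks else marks
            out.extend(inline(child["children"], nested))
    return out


def blocks(children):
    """Step 3 (Map, block): convert block elements; wrap loose inline runs in paragraphs."""
    out, run = [], []

    def flush():
        content = inline(run)
        if any(n["text"].strip() for n in content):      # drop whitespace-only runs
            out.append({"type": "paragraph", "content": content})
        run.clear()

    for child in unwrap(children):
        if isinstance(child, str) or child["tag"] in MARKS:
            run.append(child)
            continue
        flush()
        tag = child["tag"]
        if tag in HEADINGS:
            out.append({"type": "heading", "attrs": {"level": HEADINGS[tag]},
                        "content": inline(child["children"])})
        elif tag == "p":
            out.append({"type": "paragraph", "content": inline(child["children"])})
        else:  # <ul> / <ol>: each <li> becomes a listItem holding block content
            items = [c for c in child["children"] if isinstance(c, dict) and c["tag"] == "li"]
            out.append({"type": LISTS[tag],
                        "content": [{"type": "listItem", "content": blocks(li["children"])}
                                    for li in items]})
    flush()
    return out


def html_to_tiptap(html):
    """Step 4 (Output): return a TipTap/ProseMirror doc dict for the given HTML string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return {"type": "doc", "content": blocks(builder.root["children"])}
```

Calling `html_to_tiptap(scraped_html)` returns a plain dict that can be serialised with `json.dumps`. Text inside an <li> is wrapped in a paragraph because a listItem holds block content, which matches the example output later on this page.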

Supported Elements

Block Nodes

HTML Element  | TipTap Node Type | Notes
<h1> to <h6>  | heading          | The level attribute is preserved (1–6)
<p>           | paragraph        | Standard paragraph node
<ul>          | bulletList       | Contains listItem nodes
<ol>          | orderedList      | Contains listItem nodes
<li>          | listItem         | Child of bulletList or orderedList

Inline Marks

HTML Element  | TipTap Mark | Notes
<strong>, <b> | bold        | Applied as an inline mark on text
<em>, <i>     | italic      | Applied as an inline mark on text
<code>        | code        | Inline code only (not fenced code blocks)
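
To illustrate what "applied as an inline mark on text" means when elements are nested, the sketch below builds a single text node from the chain of elements that enclose it. The helper name `text_node` is hypothetical. The ordering (outermost first) and the de-duplication of equivalent tags such as <b> inside <strong> are assumptions, since this page does not describe how the transformer treats nesting or which mark combinations the schema permits.

```python
# Tag-to-mark mapping from the Inline Marks table above.
MARK_FOR_TAG = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}


def text_node(text, enclosing_tags):
    """Build a TipTap text node for `text` nested inside `enclosing_tags` (outermost first)."""
    marks = []
    for tag in enclosing_tags:
        mark = MARK_FOR_TAG.get(tag)          # unrecognised tags contribute no mark
        if mark and mark not in marks:        # assumption: equivalent tags collapse to one mark
            marks.append(mark)
    node = {"type": "text", "text": text}
    if marks:                                 # plain text carries no "marks" key at all
        node["marks"] = [{"type": m} for m in marks]
    return node


# Text inside <strong><em>...</em></strong> carries both marks on one text node.
both = text_node("key idea", ["strong", "em"])
```

The point of the design is that marks never create extra nesting in the output: the text stays a flat list of text nodes, and each node lists the marks that apply to it.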

Output Format

The transformer produces a standard TipTap/ProseMirror top-level document node. Example:

Input HTML:

<h2>Welcome to the course</h2>
<p>In this lesson you will learn <strong>core concepts</strong> and <em>best practices</em>.</p>
<ul>
  <li>Concept one</li>
  <li>Concept two</li>
</ul>

Output TipTap JSON:

{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": { "level": 2 },
      "content": [
        { "type": "text", "text": "Welcome to the course" }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        { "type": "text", "text": "In this lesson you will learn " },
        { "type": "text", "text": "core concepts", "marks": [{ "type": "bold" }] },
        { "type": "text", "text": " and " },
        { "type": "text", "text": "best practices", "marks": [{ "type": "italic" }] },
        { "type": "text", "text": "." }
      ]
    },
    {
      "type": "bulletList",
      "content": [
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept one" }] }
          ]
        },
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept two" }] }
          ]
        }
      ]
    }
  ]
}
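
A document of this shape can be sanity-checked before it is stored. The following is a rough sketch, not part of the transformer as documented: `find_problems` is a hypothetical helper, and the allowed node and mark sets are simply the ones listed under Supported Elements. A production schema check would normally be done by TipTap/ProseMirror itself.

```python
ALLOWED_NODES = {"doc", "heading", "paragraph", "bulletList", "orderedList", "listItem", "text"}
ALLOWED_MARKS = {"bold", "italic", "code"}


def find_problems(node, path="doc"):
    """Return a list of human-readable problems found in a TipTap JSON node tree."""
    problems = []
    kind = node.get("type")
    if kind not in ALLOWED_NODES:
        problems.append(f"{path}: unexpected node type {kind!r}")
    if kind == "heading":
        level = node.get("attrs", {}).get("level")
        if not (isinstance(level, int) and 1 <= level <= 6):
            problems.append(f"{path}: heading level must be an integer from 1 to 6")
    if kind == "text":
        # ProseMirror does not allow empty text nodes.
        if not isinstance(node.get("text"), str) or not node["text"]:
            problems.append(f"{path}: text node needs a non-empty string")
        for mark in node.get("marks", []):
            if mark.get("type") not in ALLOWED_MARKS:
                problems.append(f"{path}: unexpected mark {mark.get('type')!r}")
    if kind in ("bulletList", "orderedList"):
        for child in node.get("content", []):
            if child.get("type") != "listItem":
                problems.append(f"{path}: list children must be listItem nodes")
    # Recurse into child nodes, extending the path so problems are easy to locate.
    for i, child in enumerate(node.get("content", [])):
        problems.extend(find_problems(child, f"{path}.content[{i}]"))
    return problems
```

An empty list means the document uses only the node types, marks, and heading levels described on this page.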

Integration with the Teachable Import Engine

The transformer runs automatically as part of the Teachable scrape-and-import pipeline:

  1. The scraper extracts the lesson page HTML from Teachable.
  2. The HTML is passed through the transformer.
  3. The resulting TipTap JSON document is saved to the lesson's content field in the platform database.
  4. The lesson is immediately renderable and editable in the platform's lesson editor — no further migration steps required.
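
The ordering of these stages can be pictured with a small wiring sketch. Everything here is hypothetical: `import_lesson`, its `scrape`, `transform`, and `save` parameters, and the choice to store the document as a JSON string are stand-ins for illustration only and say nothing about the platform's real interfaces or database layer.

```python
import json


def import_lesson(lesson_id, scrape, transform, save):
    """Run one lesson through the scrape -> transform -> save sequence."""
    html = scrape(lesson_id)                 # 1. extract the lesson HTML
    doc = transform(html)                    # 2. HTML -> TipTap JSON document
    save(lesson_id, json.dumps(doc))         # 3. persist to the lesson's content field
    return doc                               # 4. ready for the editor and renderer


# In-memory stand-ins so the sketch runs without a scraper or a database.
fake_db = {}
result = import_lesson(
    "lesson-1",
    scrape=lambda _id: "<p>Hello</p>",
    transform=lambda html: {"type": "doc", "content": []},   # placeholder transformer
    save=fake_db.__setitem__,
)
```

Passing the stages in as functions keeps the transformer independent of how scraping and storage are done, which is one way to test it in isolation.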

Limitations

  • Fenced / block-level code: Only inline <code> elements are supported in this release. Block-level <pre><code> constructs are not yet mapped to a codeBlock node.
  • Tables: <table> elements are not transformed in this release.
  • Embedded media: <img>, <iframe>, and video embeds are handled by separate pipeline stages and are not in scope for this transformer.
  • Custom Teachable widgets: Proprietary Teachable UI components that render as non-standard HTML may not be preserved.
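
Because these constructs are not transformed, it can help to know ahead of an import which lessons contain them. The sketch below counts the relevant tags in a scraped HTML string using Python's standard `html.parser`. The function name `scan_unsupported` is hypothetical, and treating <video> as the tag for video embeds is an assumption; custom Teachable widgets are not detected because their markup is not described here.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags named under Limitations; <video> is an assumed stand-in for "video embeds".
OUT_OF_SCOPE = {
    "pre": "block-level code",
    "table": "table",
    "img": "embedded media",
    "iframe": "embedded media",
    "video": "embedded media",
}


class UnsupportedScanner(HTMLParser):
    """Count opening tags that the transformer does not map in this release."""

    def __init__(self):
        super().__init__()
        self.found = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing forms such as <img ... />.
        if tag in OUT_OF_SCOPE:
            self.found[tag] += 1


def scan_unsupported(html):
    """Return {tag: count} for out-of-scope elements found in `html`."""
    scanner = UnsupportedScanner()
    scanner.feed(html)
    scanner.close()
    return dict(scanner.found)
```

A non-empty result flags a lesson for manual review after import; an inline <code> element on its own is not reported, since it is supported.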