HTML-to-TipTap JSON Content Transformer

Version 1.0.27 introduces an automated transformer that converts raw HTML scraped from Teachable lesson pages into the structured TipTap / ProseMirror JSON document format used by the platform's lesson content schema.

Why This Exists

When the Teachable import engine scrapes a lesson page, the lesson body is returned as raw HTML. The platform's lesson editor and renderer do not operate on raw HTML; they use the TipTap document model, which is built on ProseMirror's JSON document format. The transformer bridges this gap so that imported content is usable immediately, without manual editing or reformatting.

How It Works

The transformer receives an HTML string (the scraped lesson body) and outputs a ProseMirror-compatible JSON document object. The process is:

1. Parse: The raw HTML string is parsed into a DOM tree.
2. Traverse: Each HTML node is visited recursively.
3. Map: Recognised elements and inline marks are converted to their TipTap equivalents.
4. Output: A valid TipTap `doc` JSON object is returned and stored in the lesson content schema.
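
The steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not the platform's implementation: `html_to_tiptap` and `TipTapBuilder` are hypothetical names, and because `html.parser` is event-driven, the parse and traverse steps are folded into a single pass instead of building a DOM tree first. It assumes reasonably well-formed input and wraps bare text inside `<li>` in a paragraph, as the example output in this document does.

```python
from html.parser import HTMLParser

# Element-to-node and element-to-mark tables, taken from "Supported Elements".
BLOCKS = {"p": "paragraph", "ul": "bulletList", "ol": "orderedList", "li": "listItem"}
BLOCKS.update({f"h{n}": "heading" for n in range(1, 7)})
MARKS = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}
TEXT_BLOCKS = {"paragraph", "heading"}  # nodes that hold text directly


class TipTapBuilder(HTMLParser):
    """Hypothetical helper: builds a TipTap-style doc as the parser emits events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = {"type": "doc", "content": []}
        self.nodes = [self.doc]  # stack of currently open block nodes
        self.marks = []          # stack of currently active mark names

    def _open(self, node):
        self.nodes[-1]["content"].append(node)
        self.nodes.append(node)
        return node

    def handle_starttag(self, tag, attrs):
        if tag in MARKS:
            self.marks.append(MARKS[tag])
        elif tag in BLOCKS:
            # A paragraph or heading cannot contain blocks, so close it first.
            while self.nodes[-1]["type"] in TEXT_BLOCKS:
                self.nodes.pop()
            node = {"type": BLOCKS[tag]}
            if node["type"] == "heading":
                node["attrs"] = {"level": int(tag[1])}  # "h2" -> 2
            node["content"] = []
            self._open(node)

    def handle_endtag(self, tag):
        if tag in MARKS and MARKS[tag] in self.marks:
            self.marks.remove(MARKS[tag])
        elif tag in BLOCKS:
            kind = BLOCKS[tag]
            if any(n["type"] == kind for n in self.nodes[1:]):
                # Pop implicit children too, up to and including the match.
                while self.nodes.pop()["type"] != kind:
                    pass

    def handle_data(self, data):
        top = self.nodes[-1]
        if top["type"] not in TEXT_BLOCKS:
            # Skip whitespace between blocks and stray text directly in a list.
            if not data.strip() or top["type"] not in ("doc", "listItem"):
                return
            top = self._open({"type": "paragraph", "content": []})
        text = {"type": "text", "text": data}
        if self.marks:
            text["marks"] = [{"type": m} for m in dict.fromkeys(self.marks)]
        top["content"].append(text)


def html_to_tiptap(html: str) -> dict:
    """Convert an HTML fragment into a TipTap-style doc dictionary."""
    builder = TipTapBuilder()
    builder.feed(html)
    builder.close()  # flush any trailing text
    return builder.doc
```

Under these assumptions, `html_to_tiptap("<p>Hi <b>there</b></p>")` returns a `doc` containing one paragraph with two text nodes, the second carrying a `bold` mark. Unrecognised tags are ignored and their text is kept, which is a choice of this sketch and not a documented behaviour of the transformer.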

Supported Elements

Block Nodes

HTML Element	TipTap Node Type	Notes
`<h1>` – `<h6>`	`heading`	Heading level (1–6) is taken from the tag name and stored in `attrs.level`
`<p>`	`paragraph`	Standard paragraph node
`<ul>`	`bulletList`	Contains `listItem` nodes
`<ol>`	`orderedList`	Contains `listItem` nodes
`<li>`	`listItem`	Child of `bulletList` or `orderedList`; its text is wrapped in a `paragraph` node

Inline Marks

HTML Element	TipTap Mark	Notes
`<strong>`, `<b>`	`bold`	Applied as an inline mark on text
`<em>`, `<i>`	`italic`	Applied as an inline mark on text
`<code>`	`code`	Inline code only; block-level `<pre><code>` is not mapped
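
The table does not say how nested inline elements combine. One plausible reading, sketched below as an assumption, is that a text node carries one mark for every supported element that encloses it, with duplicates collapsed. `marks_for` is a hypothetical helper written for illustration.

```python
# Mark table copied from the "Inline Marks" table above.
MARKS = {"strong": "bold", "b": "bold", "em": "italic", "i": "italic", "code": "code"}


def marks_for(open_tags: list[str]) -> list[dict]:
    """Return TipTap marks for a text node, given its enclosing tags (outermost first)."""
    # dict.fromkeys keeps first-seen order and drops repeated mark names.
    names = dict.fromkeys(MARKS[tag] for tag in open_tags if tag in MARKS)
    return [{"type": name} for name in names]


# Text inside <p><strong><em>...</em></strong></p> gets both marks.
print(marks_for(["p", "strong", "em"]))
# <b> nested in <strong> still yields a single bold mark.
print(marks_for(["b", "strong"]))
```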

Output Format

The transformer produces a standard TipTap/ProseMirror top-level document node. Example:

Input HTML:

<h2>Welcome to the course</h2>
<p>In this lesson you will learn <strong>core concepts</strong> and <em>best practices</em>.</p>
<ul>
  <li>Concept one</li>
  <li>Concept two</li>
</ul>

Output TipTap JSON:

{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": { "level": 2 },
      "content": [
        { "type": "text", "text": "Welcome to the course" }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        { "type": "text", "text": "In this lesson you will learn " },
        { "type": "text", "text": "core concepts", "marks": [{ "type": "bold" }] },
        { "type": "text", "text": " and " },
        { "type": "text", "text": "best practices", "marks": [{ "type": "italic" }] },
        { "type": "text", "text": "." }
      ]
    },
    {
      "type": "bulletList",
      "content": [
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept one" }] }
          ]
        },
        {
          "type": "listItem",
          "content": [
            { "type": "paragraph", "content": [{ "type": "text", "text": "Concept two" }] }
          ]
        }
      ]
    }
  ]
}
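
A structural check can catch malformed output before it is stored. The sketch below is hypothetical and is not part of TipTap or the transformer. It assumes that only the node and mark types listed under Supported Elements are allowed, and the nesting rules in `ALLOWED_CHILDREN` are inferred from the example above, not taken from a published schema.

```python
# Assumed nesting rules, inferred from the supported elements and the example.
ALLOWED_CHILDREN = {
    "doc": {"heading", "paragraph", "bulletList", "orderedList"},
    "heading": {"text"},
    "paragraph": {"text"},
    "bulletList": {"listItem"},
    "orderedList": {"listItem"},
    "listItem": {"paragraph", "bulletList", "orderedList"},
}
ALLOWED_MARKS = {"bold", "italic", "code"}


def check_node(node: dict, path: str = "doc") -> list[str]:
    """Return a list of problems found in node and its descendants."""
    kind = node.get("type")
    problems = []
    if kind == "text":
        # ProseMirror does not allow empty text nodes.
        if not isinstance(node.get("text"), str) or not node["text"]:
            problems.append(f"{path}: text node needs a non-empty 'text' string")
        for mark in node.get("marks", []):
            if mark.get("type") not in ALLOWED_MARKS:
                problems.append(f"{path}: unsupported mark {mark.get('type')!r}")
        return problems
    if kind not in ALLOWED_CHILDREN:
        return [f"{path}: unknown node type {kind!r}"]
    if kind == "heading" and node.get("attrs", {}).get("level") not in range(1, 7):
        problems.append(f"{path}: heading level must be 1-6")
    for i, child in enumerate(node.get("content", [])):
        child_path = f"{path}.content[{i}]"
        if child.get("type") not in ALLOWED_CHILDREN[kind]:
            problems.append(f"{child_path}: {child.get('type')!r} not allowed in {kind}")
        else:
            problems.extend(check_node(child, child_path))
    return problems
```

An empty list from `check_node(doc)` means the document fits these assumed rules; each entry otherwise names the path of the offending node.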

Integration with the Teachable Import Engine

The transformer runs automatically as part of the Teachable scrape-and-import pipeline:

1. The scraper extracts the lesson page HTML from Teachable.
2. The HTML is passed through the transformer.
3. The resulting TipTap JSON document is saved to the lesson's content field in the platform database.
4. The lesson is immediately renderable and editable in the platform's lesson editor, with no further migration steps required.

Limitations

Block-level code: Only inline <code> elements are supported in this release. Block-level <pre><code> constructs are not yet mapped to a codeBlock node.
Tables: <table> elements are not transformed in this release.
Embedded media: <img>, <iframe>, and video embeds are handled by separate pipeline stages and are not in scope for this transformer.
Custom Teachable widgets: Proprietary Teachable UI components that render as non-standard HTML may not be preserved.
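
Because these elements are outside the transformer's scope, it can be useful to flag lessons that contain them before import. The sketch below is a hypothetical pre-check, not part of the pipeline. It counts the tags named in the limitations above, with `<video>` included as an assumption about how video embeds appear, and it makes no attempt to detect custom Teachable widgets.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags named under Limitations; "video" is an assumed form of a video embed.
UNSUPPORTED = {"pre", "table", "img", "iframe", "video"}


class UnsupportedTagCounter(HTMLParser):
    """Counts opening tags that this release of the transformer does not map."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img />.
        if tag in UNSUPPORTED:
            self.counts[tag] += 1


def find_unsupported(html: str) -> dict[str, int]:
    """Return a mapping of unsupported tag name to occurrence count."""
    counter = UnsupportedTagCounter()
    counter.feed(html)
    counter.close()
    return dict(counter.counts)
```

A lesson for which `find_unsupported` returns a non-empty result could be queued for manual review after import.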