Architecture Overview
mdx-formatter uses a hybrid formatter approach: it parses markdown/MDX into an AST for structural analysis, then applies targeted line-based operations to the original source text. This design preserves formatting details that AST round-trip serialization would destroy.
Why Not AST Round-Trip?
Most formatters follow a parse → transform → serialize pipeline. The problem with this approach for markdown/MDX is that serialization (AST back to text) is lossy:
- Inline formatting choices (e.g., `*italic*` vs `_italic_`) may be normalized
- Whitespace inside code blocks, template literals, and JSX expressions can be altered
- Japanese text line breaks and spacing get mangled
- Docusaurus admonitions (`:::note`) aren't part of the standard AST
By keeping the original source lines and only modifying specific locations, mdx-formatter avoids these problems entirely.
The Hybrid Approach
The formatter works in three phases:
- Phase 1 (Parse): source text → remark/mdx parser → mdast AST
- Phase 2 (Analyze): walk the AST → collect line-based operations (insert, replace, indent, fix)
- Phase 3 (Apply): sort operations (reverse order) → apply to source lines → normalize empty lines
The key insight: the AST is used only for analysis (finding positions, understanding structure), never for output generation. All modifications happen on the original source lines.
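To make the split concrete, here is a minimal TypeScript sketch of the three phases for one toy rule (an empty line after each heading). It is an illustration under stated assumptions, not the engine's code: the hypothetical `findHeadingLines()` uses a regex as a stand-in for the AST walk (a real AST would, for example, skip `#` lines inside code blocks), and the `InsertLine` shape is invented here. Lines the rule does not touch, such as `_italic_`, pass through byte-for-byte because output is rebuilt from the original lines.

```typescript
// Hypothetical shape for a single operation (field names are assumptions).
interface InsertLine {
  type: "insertLine";
  line: number; // 0-based index; the new line is inserted before this index
  content: string;
}

// Stand-in for the parse/analyze phases (assumption): a regex reports heading
// positions where a real implementation would walk the mdast AST.
// It only reports positions and never edits the text.
function findHeadingLines(lines: string[]): number[] {
  const result: number[] = [];
  lines.forEach((text, i) => {
    if (/^#{1,6}\s/.test(text)) result.push(i);
  });
  return result;
}

function formatHeadingsOnce(source: string): string {
  const lines = source.split("\n");
  // Phase 2: collect operations; the source is not modified yet.
  const ops: InsertLine[] = findHeadingLines(lines)
    .filter((i) => i + 1 < lines.length && lines[i + 1].trim() !== "")
    .map((i): InsertLine => ({ type: "insertLine", line: i + 1, content: "" }));
  // Phase 3: apply bottom-up so indices of pending operations above stay valid.
  ops.sort((a, b) => b.line - a.line);
  for (const op of ops) lines.splice(op.line, 0, op.content);
  return lines.join("\n");
}
```

For example, `formatHeadingsOnce("# Title\nSome _italic_ text\n## Sub\nMore")` yields `"# Title\n\nSome _italic_ text\n## Sub\n\nMore"`, and running it again on that result changes nothing.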
The Convergence Loop
Some formatting operations can create new formatting issues. For example, inserting an empty line after a heading might create three consecutive empty lines that need to be collapsed.
The formatter handles this with a convergence loop:
Input → format_once() → Output₁ → format_once() → Output₂ → format_once() → Output₃ (max)
The loop runs format_once() up to three times, stopping early once the output stabilizes (i.e., format_once(x) === x). When the output converges within these passes, the result is a fixed point, so formatting an already-formatted file produces identical output (idempotency).
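The loop can be sketched in TypeScript as follows. `formatToFixpoint` and its `formatOnce` parameter are hypothetical names (the real loop lives in the Rust engine), and `collapseOnce` is a toy pass, assumed for illustration, that removes only the first run of two empty lines per call, so it needs several passes to stabilize.

```typescript
// Hypothetical sketch of the convergence loop; the real loop is in the Rust engine.
const MAX_ITERATIONS = 3;

function formatToFixpoint(input: string, formatOnce: (s: string) => string): string {
  let current = input;
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const next = formatOnce(current);
    if (next === current) break; // stable: formatOnce(x) === x
    current = next;
  }
  return current;
}

// Toy pass (assumption): String.prototype.replace with a string pattern
// replaces only the FIRST "\n\n\n" (two empty lines) per call.
const collapseOnce = (s: string): string => s.replace("\n\n\n", "\n\n");

formatToFixpoint("a\n\n\n\nb", collapseOnce); // "a\n\nb" after two changing passes
```

With six consecutive newlines, the same toy pass would need a fourth pass, so the three-iteration cap returns `"a\n\n\nb"`. In this sketch, idempotency holds only for inputs whose formatting converges within the cap.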
Formatting Operations
Instead of mutating the AST or directly editing strings, the formatter collects a list of operations — structured instructions for modifying specific lines:
| Operation | Description |
|---|---|
| insertLine | Insert a new line at a given position |
| replaceLines | Replace a range of lines with new content |
| indentLine | Add indentation to a specific line |
| fixListIndent | Correct list item indentation |
| replaceHtmlBlock | Replace an HTML block with Prettier-formatted output |
Operation names use camelCase in TypeScript (insertLine, replaceLines) and PascalCase in the Rust port (InsertLine, ReplaceLines) — both implement identical semantics.
Operations are collected from all rules, then applied in reverse line order (bottom-up) so that earlier line numbers remain valid as modifications are made.
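A minimal TypeScript sketch of bottom-up application for three of the five operation kinds follows. The `Operation` field names (`line`, `start`, `end`, `content`, `indent`) and the `applyOperations` function are hypothetical, modeled on the operations table rather than taken from the real definitions.

```typescript
// Hypothetical operation shapes; field names are assumptions.
type Operation =
  | { type: "insertLine"; line: number; content: string }
  | { type: "replaceLines"; start: number; end: number; content: string[] }
  | { type: "indentLine"; line: number; indent: string };

// First line an operation touches, used for ordering.
function startLine(op: Operation): number {
  return op.type === "replaceLines" ? op.start : op.line;
}

function applyOperations(source: string, ops: Operation[]): string {
  const lines = source.split("\n");
  // Bottom-up: an edit at line N only shifts lines after N,
  // so operations still waiting above N keep valid indices.
  const sorted = [...ops].sort((a, b) => startLine(b) - startLine(a));
  for (const op of sorted) {
    switch (op.type) {
      case "insertLine":
        lines.splice(op.line, 0, op.content); // insert before 0-based index op.line
        break;
      case "replaceLines":
        lines.splice(op.start, op.end - op.start + 1, ...op.content); // inclusive range
        break;
      case "indentLine":
        lines[op.line] = op.indent + lines[op.line];
        break;
    }
  }
  return lines.join("\n");
}
```

Because every index refers to the original source, rules can compute their operations independently without tracking each other's shifts.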
Conflict Resolution
When multiple operations target overlapping line ranges, the formatter resolves conflicts:
- Overlapping replacements: If a child JSX element and its parent both produce ReplaceLines operations, only the wider (parent) replacement is kept
- Operations inside replaced ranges: Any insertLine, indentLine, or fixListIndent that falls inside an already-replaced range is dropped
- Deduplication: Operations with identical type and line range are deduplicated via a string key
The 10 Formatting Rules
Each rule is independently toggleable via .mdx-formatter.json configuration:
| # | Rule | Default | Description |
|---|---|---|---|
| 1 | addEmptyLineBetweenElements | on | Add single empty line between markdown elements |
| 2 | formatMultiLineJsx | on | Format multi-line JSX with proper indentation |
| 3 | formatHtmlBlocksInMdx | on | Format HTML blocks using Prettier |
| 4 | expandSingleLineJsx | off | Expand single-line JSX with multiple props |
| 5 | indentJsxContent | off | Indent content inside JSX container components |
| 6 | addEmptyLinesInBlockJsx | on | Add empty lines after opening/before closing tags |
| 7 | formatYamlFrontmatter | on | Format YAML frontmatter |
| 8 | preserveAdmonitions | on | Preserve Docusaurus ::: syntax |
| 9 | errorHandling | off | Throw on parse errors instead of returning the original input |
| 10 | autoDetectIndent | off | Auto-detect indentation style from file content |
Rules are applied in order during the operation collection phase. After all operations are collected, conflict resolution and deduplication happen before the operations are applied to the source lines.
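How toggles and ordering might combine during collection is sketched below in TypeScript. The rule names and defaults come from the table; everything else is an assumption: the flat boolean settings shape, `mergeSettings`, `collectOperations`, and the `Rule` signature (which returns plain strings instead of real operations for brevity) are hypothetical and do not describe the actual .mdx-formatter.json schema.

```typescript
// Defaults from the rules table; the flat boolean shape is an assumption.
const DEFAULTS = {
  addEmptyLineBetweenElements: true,
  formatMultiLineJsx: true,
  formatHtmlBlocksInMdx: true,
  expandSingleLineJsx: false,
  indentJsxContent: false,
  addEmptyLinesInBlockJsx: true,
  formatYamlFrontmatter: true,
  preserveAdmonitions: true,
  errorHandling: false,
  autoDetectIndent: false,
};
type Settings = typeof DEFAULTS;
type RuleName = keyof Settings;

// User overrides (e.g. parsed from a config file) win over defaults.
function mergeSettings(user: Partial<Settings>): Settings {
  return { ...DEFAULTS, ...user };
}

// Hypothetical rule signature: inspect lines, return operation descriptions.
type Rule = (lines: string[]) => string[];

// Enabled rules run in list order; their operations are concatenated in that order.
function collectOperations(lines: string[], settings: Settings, rules: [RuleName, Rule][]): string[] {
  const ops: string[] = [];
  for (const [name, rule] of rules) {
    if (settings[name]) ops.push(...rule(lines));
  }
  return ops;
}
```

In this sketch, a disabled rule contributes nothing, and conflict resolution would then run over the combined list.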
No Plugin System Needed
The TypeScript formatter originally used 10 remark plugins to work around AST round-trip issues (preserve-jsx.ts, preserve-image-alt.ts, fix-autolink-output.ts, etc.). Because the hybrid approach uses the AST only for analysis and never for output, these workarounds are largely unnecessary: the Rust implementation confirmed that 9 of the 10 plugins are not needed (see Rust Rewrite: Plugin Validation).
The formatting logic lives in the Rust engine (crates/mdx-formatter-core/src/formatter.rs). The TypeScript code in src/ is a thin wrapper that loads the native module.
Key Source Files
| File | Role |
|---|---|
| crates/mdx-formatter-core/src/formatter.rs | Core Rust formatter: convergence loop, all formatting rules |
| crates/mdx-formatter-core/src/parser.rs | markdown-rs AST parsing (MDX, GFM, frontmatter) |
| crates/mdx-formatter-core/src/html_formatter.rs | HTML block indentation formatter |
| crates/mdx-formatter-core/src/config.rs | Config file loading (3-layer merge) |
| src/index.ts | Public npm API (format(), formatFile(), checkFile()) |
| src/rust-formatter.ts | Loads the Rust napi module |
| src/settings.ts | Default settings for all 10 rules |
| src/cli.ts | CLI entry point |
| src/load-config.ts | Loads .mdx-formatter.json and merges with defaults |