mdx-formatter


You are viewing documentation for an older version.

Architecture Overview

mdx-formatter uses a hybrid formatter approach: it parses markdown/MDX into an AST for structural analysis, then applies targeted line-based operations to the original source text. This design preserves formatting details that AST round-trip serialization would destroy.

Why Not AST Round-Trip?

Most formatters follow a parse → transform → serialize pipeline. The problem with this approach for markdown/MDX is that serialization (AST back to text) is lossy:

  • Inline formatting choices (e.g., *italic* vs _italic_) may be normalized
  • Whitespace inside code blocks, template literals, and JSX expressions can be altered
  • Japanese text line breaks and spacing get mangled
  • Docusaurus admonitions (:::note) aren’t part of the standard AST

By keeping the original source lines and only modifying specific locations, mdx-formatter avoids these problems entirely.

The Hybrid Approach

The formatter works in three phases:

Phase 1: Parse          Phase 2: Analyze         Phase 3: Apply
┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ Source text     │    │ Walk AST         │    │ Sort operations  │
│       ↓         │    │       ↓          │    │ (reverse order)  │
│ remark/mdx      │───>│ Collect line-    │───>│       ↓          │
│ parser          │    │ based operations │    │ Apply to source  │
│       ↓         │    │ (insert, replace,│    │ lines            │
│ mdast AST       │    │  indent, fix)    │    │       ↓          │
└─────────────────┘    └──────────────────┘    │ Normalize empty  │
                                               │ lines            │
                                               └──────────────────┘

The key insight: the AST is used only for analysis (finding positions, understanding structure), never for output generation. All modifications happen on the original source lines.
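To make the split concrete, here is a minimal Rust sketch of the analyze/apply division for one rule (an empty line between elements). It assumes the parse phase has already reduced the AST to a list of top-level block spans with 1-based start and end lines; `BlockSpan`, `missing_blank_lines`, and `insert_blank_lines` are hypothetical names, not the engine's API. The span data only decides where to edit, and the output is built from the original lines, so `*emphasis*` is never rewritten as `_emphasis_`.

```rust
/// Hypothetical, simplified view of what the parse phase provides:
/// one top-level block element with its 1-based start and end line.
#[derive(Debug, Clone, Copy)]
struct BlockSpan {
    start_line: usize,
    end_line: usize,
}

/// Analysis only: find adjacent elements with no empty line between them.
/// Returns 1-based line numbers *after which* an empty line is needed.
/// The source text is not touched here.
fn missing_blank_lines(spans: &[BlockSpan]) -> Vec<usize> {
    spans
        .windows(2)
        .filter(|pair| pair[1].start_line == pair[0].end_line + 1)
        .map(|pair| pair[0].end_line)
        .collect()
}

/// Apply phase: insert empty lines into the ORIGINAL source lines,
/// bottom-up so that earlier line numbers stay valid.
fn insert_blank_lines(source: &str, after_lines: &[usize]) -> String {
    let mut lines: Vec<&str> = source.lines().collect();
    let mut targets = after_lines.to_vec();
    targets.sort_unstable_by(|a, b| b.cmp(a)); // highest line first
    for line in targets {
        // 0-based index `line` is the slot just after 1-based line `line`.
        lines.insert(line, "");
    }
    lines.join("\n")
}

fn main() {
    let source = "# Title\nSome *emphasis* kept as written.\n- item";
    // Spans a parser would report for the heading, paragraph, and list.
    let spans = [
        BlockSpan { start_line: 1, end_line: 1 },
        BlockSpan { start_line: 2, end_line: 2 },
        BlockSpan { start_line: 3, end_line: 3 },
    ];
    let after = missing_blank_lines(&spans);
    println!("{}", insert_blank_lines(source, &after));
}
```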

The Convergence Loop

Some formatting operations can create new formatting issues. For example, inserting an empty line after a heading might create three consecutive empty lines that need to be collapsed.

The formatter handles this with a convergence loop:

Input   → format_once() → Output₁
Output₁ → format_once() → Output₂
Output₂ → format_once() → Output₃ (max)

The loop runs format_once() at most 3 times and stops early once a pass leaves the text unchanged (i.e., format_once(x) === x). Reaching that fixed point is what makes the formatter idempotent: formatting an already-formatted file produces identical output.
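A minimal Rust sketch of the loop follows. `format_until_stable` and the toy single pass `collapse_empty_lines` are hypothetical stand-ins: the real format_once() applies every rule, whereas this one only collapses runs of empty lines. If the limit is reached before the text stabilizes, the sketch simply returns the last pass's output.

```rust
/// Upper bound on passes, matching the documented limit of 3 iterations.
const MAX_ITERATIONS: usize = 3;

/// Run `format_once` until a pass leaves the text unchanged (a fixed
/// point) or the iteration limit is reached.
fn format_until_stable(input: &str, format_once: impl Fn(&str) -> String) -> String {
    let mut current = input.to_string();
    for _ in 0..MAX_ITERATIONS {
        let next = format_once(&current);
        if next == current {
            break; // stabilized: format_once(x) == x
        }
        current = next;
    }
    current
}

/// Toy stand-in for a single pass: collapse runs of empty lines into one.
fn collapse_empty_lines(text: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in text.lines() {
        let is_empty = line.trim().is_empty();
        let prev_empty = out.last().map_or(false, |prev| prev.trim().is_empty());
        if is_empty && prev_empty {
            continue; // skip the extra empty line
        }
        out.push(line);
    }
    out.join("\n")
}

fn main() {
    let input = "# Title\n\n\n\nBody";
    let once = format_until_stable(input, collapse_empty_lines);
    let twice = format_until_stable(&once, collapse_empty_lines);
    assert_eq!(once, twice); // formatting formatted output changes nothing
    println!("{once}");
}
```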

Formatting Operations

Instead of mutating the AST or directly editing strings, the formatter collects a list of operations — structured instructions for modifying specific lines:

| Operation | Description |
|---|---|
| insertLine | Insert a new line at a given position |
| replaceLines | Replace a range of lines with new content |
| indentLine | Add indentation to a specific line |
| fixListIndent | Correct list item indentation |
| replaceHtmlBlock | Replace an HTML block with Prettier-formatted output |

Operation names use camelCase in TypeScript (insertLine, replaceLines) and PascalCase in the Rust port (InsertLine, ReplaceLines) — both implement identical semantics.

Operations are collected from all rules, then applied in reverse line order (bottom-up) so that earlier line numbers remain valid as modifications are made.
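Reverse-order application can be sketched as follows. This assumes a simplified, hypothetical `Operation` enum that covers three of the five operation types and uses 0-based line indices; the actual types in formatter.rs may differ. Because operations run from the highest anchor line down, every index still refers to a position in the original source.

```rust
/// Hypothetical operation type (PascalCase variants, as in the Rust port).
/// All line numbers are 0-based indices into the ORIGINAL source lines.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Operation {
    InsertLine { at: usize, text: String },
    ReplaceLines { start: usize, end: usize, text: Vec<String> }, // end inclusive
    IndentLine { line: usize, indent: String },
}

impl Operation {
    /// The line an operation is anchored at, used for ordering.
    fn anchor(&self) -> usize {
        match self {
            Operation::InsertLine { at, .. } => *at,
            Operation::ReplaceLines { start, .. } => *start,
            Operation::IndentLine { line, .. } => *line,
        }
    }
}

/// Apply all operations bottom-up: sorting by anchor in descending order
/// means an edit never shifts the lines that later edits refer to.
fn apply_operations(source: &str, mut ops: Vec<Operation>) -> String {
    let mut lines: Vec<String> = source.lines().map(String::from).collect();
    ops.sort_by(|a, b| b.anchor().cmp(&a.anchor()));
    for op in ops {
        match op {
            Operation::InsertLine { at, text } => lines.insert(at, text),
            Operation::ReplaceLines { start, end, text } => {
                lines.splice(start..=end, text);
            }
            Operation::IndentLine { line, indent } => {
                lines[line] = format!("{}{}", indent, lines[line]);
            }
        }
    }
    lines.join("\n")
}

fn main() {
    let source = "<Tabs>\n<Tab>\nText\n</Tab>\n</Tabs>";
    let ops = vec![
        Operation::IndentLine { line: 1, indent: "  ".into() },
        Operation::IndentLine { line: 3, indent: "  ".into() },
        Operation::InsertLine { at: 2, text: String::new() },
    ];
    println!("{}", apply_operations(source, ops));
}
```

Operations anchored at the same line keep their collection order here (the sort is stable); a real implementation has to define that tie-breaking explicitly.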

Conflict Resolution

When multiple operations target overlapping line ranges, the formatter resolves conflicts:

  1. Overlapping replacements: If a child JSX element and its parent both produce replaceLines operations, only the wider (parent) replacement is kept
  2. Operations inside replaced ranges: Any insertLine, indentLine, or fixListIndent that falls inside an already-replaced range is dropped
  3. Deduplication: Operations with identical type and line range are deduplicated via a string key
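The three rules can be sketched as a single filtering pass. This is a hypothetical simplification: each operation is reduced to a kind name and an inclusive line range, replacement ranges are assumed to nest (parent/child JSX) rather than partially overlap, and the dedup key format is made up for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified operation: a kind plus an inclusive line range.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Op {
    kind: &'static str, // e.g. "replaceLines", "insertLine", "indentLine"
    start: usize,
    end: usize,
}

impl Op {
    fn is_replace(&self) -> bool {
        self.kind == "replaceLines"
    }
    /// True if `self` covers the whole of `other`'s line range.
    fn covers(&self, other: &Op) -> bool {
        self.start <= other.start && other.end <= self.end
    }
}

fn resolve_conflicts(ops: &[Op]) -> Vec<Op> {
    let replaces: Vec<&Op> = ops.iter().filter(|o| o.is_replace()).collect();
    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    for op in ops {
        // Rule 2: a non-replace op inside any replaced range is dropped.
        // Rule 1: a replacement inside a strictly wider one is dropped.
        let dropped = replaces.iter().any(|r| {
            r.covers(op) && (!op.is_replace() || (r.start, r.end) != (op.start, op.end))
        });
        // Rule 3: deduplicate by a string key built from type and range.
        let key = format!("{}:{}-{}", op.kind, op.start, op.end);
        if !dropped && seen.insert(key) {
            kept.push(op.clone());
        }
    }
    kept
}

fn main() {
    let ops = vec![
        Op { kind: "replaceLines", start: 2, end: 10 }, // parent JSX element
        Op { kind: "replaceLines", start: 4, end: 6 },  // child: dropped
        Op { kind: "indentLine", start: 5, end: 5 },    // inside parent: dropped
        Op { kind: "insertLine", start: 12, end: 12 },
        Op { kind: "insertLine", start: 12, end: 12 },  // duplicate: dropped
    ];
    for op in resolve_conflicts(&ops) {
        println!("{} {}-{}", op.kind, op.start, op.end);
    }
}
```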

The 10 Formatting Rules

Each rule is independently toggleable via .mdx-formatter.json configuration:

| # | Rule | Default | Description |
|---|---|---|---|
| 1 | addEmptyLineBetweenElements | on | Add single empty line between markdown elements |
| 2 | formatMultiLineJsx | on | Format multi-line JSX with proper indentation |
| 3 | formatHtmlBlocksInMdx | on | Format HTML blocks using Prettier |
| 4 | expandSingleLineJsx | off | Expand single-line JSX with multiple props |
| 5 | indentJsxContent | off | Indent content inside JSX container components |
| 6 | addEmptyLinesInBlockJsx | on | Add empty lines after opening/before closing tags |
| 7 | formatYamlFrontmatter | on | Format YAML frontmatter |
| 8 | preserveAdmonitions | on | Preserve Docusaurus ::: syntax |
| 9 | errorHandling | off | Throw on parse errors instead of returning the original input |
| 10 | autoDetectIndent | off | Auto-detect indentation style from file content |
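One way to model these toggles is a Rust struct whose defaults mirror the Default column above. The struct, its field names, and `enabled_rules` are illustrative assumptions, not the engine's actual configuration types.

```rust
/// Hypothetical settings struct; defaults follow the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Settings {
    add_empty_line_between_elements: bool,
    format_multi_line_jsx: bool,
    format_html_blocks_in_mdx: bool,
    expand_single_line_jsx: bool,
    indent_jsx_content: bool,
    add_empty_lines_in_block_jsx: bool,
    format_yaml_frontmatter: bool,
    preserve_admonitions: bool,
    error_handling: bool,
    auto_detect_indent: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            add_empty_line_between_elements: true,
            format_multi_line_jsx: true,
            format_html_blocks_in_mdx: true,
            expand_single_line_jsx: false,
            indent_jsx_content: false,
            add_empty_lines_in_block_jsx: true,
            format_yaml_frontmatter: true,
            preserve_admonitions: true,
            error_handling: false,
            auto_detect_indent: false,
        }
    }
}

impl Settings {
    /// Names of the enabled rules, in the order listed in the table.
    fn enabled_rules(&self) -> Vec<&'static str> {
        let table = [
            ("addEmptyLineBetweenElements", self.add_empty_line_between_elements),
            ("formatMultiLineJsx", self.format_multi_line_jsx),
            ("formatHtmlBlocksInMdx", self.format_html_blocks_in_mdx),
            ("expandSingleLineJsx", self.expand_single_line_jsx),
            ("indentJsxContent", self.indent_jsx_content),
            ("addEmptyLinesInBlockJsx", self.add_empty_lines_in_block_jsx),
            ("formatYamlFrontmatter", self.format_yaml_frontmatter),
            ("preserveAdmonitions", self.preserve_admonitions),
            ("errorHandling", self.error_handling),
            ("autoDetectIndent", self.auto_detect_indent),
        ];
        table.iter().filter(|(_, on)| *on).map(|(name, _)| *name).collect()
    }
}

fn main() {
    println!("{:?}", Settings::default().enabled_rules());
    // Turn on one off-by-default rule, keeping every other default.
    let custom = Settings { indent_jsx_content: true, ..Settings::default() };
    assert!(custom.enabled_rules().contains(&"indentJsxContent"));
}
```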

Rules are applied in order during the operation collection phase. After all operations are collected, conflict resolution and deduplication happen before the operations are applied to the source lines.

No Plugin System Needed

The TypeScript formatter originally used 10 remark plugins to work around AST round-trip issues (preserve-jsx.ts, preserve-image-alt.ts, fix-autolink-output.ts, etc.). Because the hybrid approach uses the AST only for analysis and never for output, it removes the need for nearly all of these workarounds: the Rust implementation confirms that 9 of the 10 plugins are unnecessary (see Rust Rewrite: Plugin Validation).

The formatting logic lives in the Rust engine (crates/mdx-formatter-core/src/formatter.rs). The TypeScript code in src/ is a thin wrapper that loads the native module.

Key Source Files

| File | Role |
|---|---|
| crates/mdx-formatter-core/src/formatter.rs | Core Rust formatter: convergence loop, all formatting rules |
| crates/mdx-formatter-core/src/parser.rs | markdown-rs AST parsing (MDX, GFM, frontmatter) |
| crates/mdx-formatter-core/src/html_formatter.rs | HTML block indentation formatter |
| crates/mdx-formatter-core/src/config.rs | Config file loading (3-layer merge) |
| src/index.ts | Public npm API (format(), formatFile(), checkFile()) |
| src/rust-formatter.ts | Loads the Rust napi module |
| src/settings.ts | Default settings for all 10 rules |
| src/cli.ts | CLI entry point |
| src/load-config.ts | Loads .mdx-formatter.json and merges with defaults |
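This page does not spell out the three layers of the merge in config.rs. A common pattern, sketched here under the assumption that the layers are built-in defaults, `.mdx-formatter.json`, and options passed explicitly by the caller, is to represent each upper layer as optional fields so that later layers override only what they actually set. `PartialSettings`, `Resolved`, and `merge` are hypothetical and cover just two rules.

```rust
/// Hypothetical partial config layer: `None` means "not set in this layer".
#[derive(Debug, Default, Clone, Copy)]
struct PartialSettings {
    indent_jsx_content: Option<bool>,
    format_yaml_frontmatter: Option<bool>,
}

/// Fully resolved values after merging.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Resolved {
    indent_jsx_content: bool,
    format_yaml_frontmatter: bool,
}

/// Merge layers from lowest to highest precedence; each later layer
/// wins only for the fields it sets.
fn merge(defaults: Resolved, layers: &[PartialSettings]) -> Resolved {
    let mut out = defaults;
    for layer in layers {
        if let Some(v) = layer.indent_jsx_content {
            out.indent_jsx_content = v;
        }
        if let Some(v) = layer.format_yaml_frontmatter {
            out.format_yaml_frontmatter = v;
        }
    }
    out
}

fn main() {
    let defaults = Resolved { indent_jsx_content: false, format_yaml_frontmatter: true };
    // Assumed layer 2: values read from .mdx-formatter.json.
    let file = PartialSettings { indent_jsx_content: Some(true), ..Default::default() };
    // Assumed layer 3: options passed explicitly by the caller.
    let explicit = PartialSettings { format_yaml_frontmatter: Some(false), ..Default::default() };
    let resolved = merge(defaults, &[file, explicit]);
    println!("{resolved:?}");
}
```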
