Skip to content

Document model reference

The Document is the format-agnostic shape every backend produces. This page is the strict reference for its fields, variants, and overlays — the counterpart to the prose in Architecture / The 5-layer document model.

Shape

Document
├── metadata          DocumentMetadata
├── content           Vec<Block>            -- the content spine
├── presentation      Presentation          -- overlay (geometry / fonts / colour)
├── relationships     Relationships         -- overlay (links / footnotes / bookmarks / TOC)
├── interactions      Interactions          -- overlay (forms / comments / revisions)
└── images            AssetStore<ImageAsset> -- shared image bytes referenced by ImageRef

The content spine is the canonical text-bearing tree, always present. The three overlays carry independently optional layers keyed off the same node_ids the spine assigns. Disabling an overlay via Config(layers=...) skips its work; the spine is untouched.

The shared image asset store lives in doc.images. Every Block::Image and Inline::InlineImage carries an ImageRef index into the store; the same image referenced N times is stored once.

Every node carries a typed NodeId allocated from the document's arena. Overlay payloads (presentation, relationships, interactions) key off these ids; consumers walk the spine and look up overlay data per node when they need it.

Block

A block-level content element. The Rust enum is #[non_exhaustive]; new variants may be added without a major-version bump.

Variant Fields Notes
Heading id, level: u8, content: Vec<Inline> level is the semantic heading depth (typically 1–6; clamp values outside the range).
Paragraph id, content: Vec<Inline> The default text container.
Table id, table: TableData See Table below.
List id, items: Vec<ListItem>, kind: ListKind, start: u64 kind is Ordered or Unordered; start is the first ordinal for ordered lists.
CodeBlock id, text: String, language: Option<String> Preformatted, language-tagged when known.
Image id, image_ref: ImageRef, alt_text: Option<String> Block-level image. The bytes live in doc.images[image_ref].
PageBreak id A page / slide / sheet boundary.
ThematicBreak id A horizontal rule.
Section id, role: Option<SectionRole>, children: Vec<Block> A semantic container (HTML section/article/nav/aside, PDF tagged structure, DOCX section).
Shape id, kind: ShapeKind, children: Vec<Block>, alt_text: Option<String> A non-textual visual element (PPTX shape, SVG primitive). Shapes can nest and contain text.

ListItem

ListItem { content: Vec<Block> }

A list item is a sequence of blocks (paragraphs, nested lists, images, etc). The flatten-friendly shape is intentional: an item can carry multi-block content without forcing list-specific node types into the spine.

Block::text

Every block exposes a recursive text() that walks its content and produces a plain-text reconstruction. The convention used across formats:

  • Headings, paragraphs, code blocks: text concatenated in source order.
  • Lists: items separated by \n; multi-block items separated by \n between blocks within the item.
  • Tables: rows separated by \n; cells separated by \t; cell blocks separated by (single space).
  • Sections / Shapes: child blocks separated by \n.
  • Images / breaks: empty (image alt text is not included; pattern match on Block::Image to access it).

Block discriminant strings

In Python (and JSON output), the variant is exposed as a lowercase-snake-case discriminant string in the kind field:

"paragraph", "heading", "list", "table", "code_block", "image",
"page_break", "thematic_break", "section", "shape"

Inline

An inline content element within a block.

Variant Fields Notes
Text id, text: String, style: SpanStyle The default inline. style carries semantic markup.
Code id, text: String Inline <code>-equivalent.
Link id, url: String, content: Vec<Inline> Hyperlink. URL is content (it is part of what the document says).
FootnoteRef id, label: String Marker for a footnote / endnote. The definition lives in the relationships overlay (doc.relationships.footnotes[label]).
InlineImage id, image_ref: ImageRef, alt_text: Option<String> Inline image. Bytes in doc.images[image_ref].
SoftBreak id Reflowable break (collapse to a space when wrapping).
LineBreak id Hard newline (preserve in plain-text output).

SpanStyle

SpanStyle {
    bold:          bool,
    italic:        bool,
    underline:     bool,
    strikethrough: bool,
    superscript:   bool,
    subscript:     bool,
}

These flags carry semantic weight (bold = emphasis; italic = citation, in the prose sense). Extended visual styling (font name, font size, colour) lives in the presentation overlay, not on SpanStyle. Custom serde: only the fields that are true serialise — empty SpanStyle becomes {}.

Inline discriminant strings

"text", "code", "link", "footnote_ref", "inline_image",
"soft_break", "line_break"

Table

TableData {
    rows:                       Vec<TableRow>,
    num_columns:                u32,
    header_row_count:           u32,
    has_header_row:             bool,
    may_continue_from_previous: bool,
    may_continue_to_next:       bool,
}

TableRow {
    cells: Vec<TableCell>,
}

TableCell {
    content:  Vec<Block>,
    col_span: u32,
    row_span: u32,
    value:    Option<String>,    -- typed string for spreadsheet cells
}

CellValue {
    String(String),
    Number(f64),
    Boolean(bool),
    Date(String),
    Empty,
    Error(String),
}

num_columns is the logical column count after merge resolution. header_row_count records how many leading rows are headers (commonly 1; 0 for table-of-contents-style or borderless tables; >1 for spreadsheets with grouped headers).

may_continue_from_previous and may_continue_to_next are hints, not promises: PDF detects continuations from page-break adjacency and matching column shape; DOCX / OOXML use explicit markers. False negatives are common on producers that do not flag continuations explicitly.

TableCell.value is a normalised string for spreadsheet cells (XLSX, ODS, XLS) — the underlying typed value formatted with the backend's number-format engine. For document tables (PDF, DOCX, RTF) it is None and cell.text is the only meaningful surface.

TableCell.content lets cells nest arbitrary blocks. A cell can contain paragraphs, lists, even nested tables; spreadsheet backends always emit a single Paragraph per cell, but document backends preserve real structure where the source has it.

Presentation overlay

The geometry / styling layer. Disabled with Config(layers=LayerConfig(presentation=False)).

Presentation carries:

  • Bounding boxes per block (where on the page is this).
  • Positioned spans (PDF) — per-glyph or per-run placement with font reference, font size, baseline, advance.
  • Page geometryPageDef per page: rotation, media box, crop box, page size.
  • Extended text styling (ExtendedTextStyle) — font name, font size, fill colour, stroke colour, alignment.
  • Layout info (LayoutInfo) — block-level layout mode, flow direction, alignment, padding.
  • Paint paths (PaintPath) — stroke / fill paths the renderer emits to reproduce the page.
  • Colour models (Color) — RGB, CMYK, gray, ICC, plus alpha.
  • Patterns and shadings (PaintPattern, PaintShading) — PDF tiling patterns and gradient fills (Type 1–7; Type 4–7 function-based shadings are partially supported).
  • Soft masks and clip regions — alpha-channel masks and page clipping.
  • Image placement (ImagePlacement) — geometry per image reference.

The PDF renderer reads from this overlay; layout-detection hooks consume bounding boxes for region-of-interest crops; downstream viewers reconstruct page appearance from it.

The full type list with field-level documentation lives in crates/udoc-core/src/document/presentation.rs.

Relationships overlay

The link / cross-reference layer. Disabled with Config(layers=LayerConfig(relationships=False)).

Relationships {
    footnotes:      HashMap<String, FootnoteDef>,
    bookmarks:      HashMap<String, BookmarkTarget>,
    hyperlinks:     Vec<String>,
    captions:       SparseOverlay<NodeId>,
    toc_entries:    Vec<TocEntry>,
    component_refs: SparseOverlay<ComponentRef>,
}
Type Notes
FootnoteDef { label, content: Vec<Block> }. Pair with Inline::FootnoteRef::label.
BookmarkTarget Resolved(NodeId) for bookmarks resolved to a node; Positional when the source marks a position between elements (DOCX).
TocEntry { level, text, target: Option<NodeId> }. Level 1 = top-level entry.
ComponentRef { component_id, overrides }. Reserved for component / template references.

Per-node members (captions, component_refs) participate in relationships.has_node(node_id); document-wide members (footnotes, bookmarks, hyperlinks, toc_entries) do not.

Resource caps (defending against pathological inputs): MAX_FOOTNOTES, MAX_BOOKMARKS, MAX_HYPERLINKS, MAX_CAPTIONS, MAX_TOC_ENTRIES, MAX_COMPONENT_REFS — defined in the relationships module. Add operations that hit a cap return a *AddResult::LimitReached rather than silently truncating.

Interactions overlay

The actionable layer. Disabled with Config(layers=LayerConfig(interactions=False)).

Interactions {
    form_fields:     Vec<FormField>,
    comments:        Vec<Comment>,
    tracked_changes: Vec<TrackedChange>,
}

FormField {
    anchor:     Option<NodeId>,
    name:       String,
    field_type: FormFieldType,    -- Text | Checkbox | Radio | Dropdown | Signature | Button
    value:      Option<String>,
    bbox:       Option<BoundingBox>,
    page_index: Option<usize>,
}

Comment {
    anchor:     NodeId,
    author:     Option<String>,
    date:       Option<String>,
    text:       String,
    replies:    Vec<Comment>,         -- nested up to MAX_COMMENT_DEPTH (64)
    bbox:       Option<BoundingBox>,  -- for PDF annotations spanning a region
    page_index: Option<usize>,
}

TrackedChange {
    anchor:      NodeId,
    change_type: ChangeType,           -- Insertion | Deletion | FormatChange
    author:      Option<String>,
    date:        Option<String>,
    old_content: Option<Vec<Inline>>,  -- for Deletion / FormatChange
}

FormField.anchor is None for fields that were not associated with a node (free-floating PDF AcroForm widgets). bbox and page_index carry the geometric placement when the field has one.

Comment threading recurses through replies. Deserialisation caps nesting at MAX_COMMENT_DEPTH (64) to defend against adversarial JSON.

TrackedChange.old_content carries the inline content that was removed for Deletion and FormatChange; Insertion leaves it None (the new content lives on the spine).

DocumentMetadata

DocumentMetadata {
    title:             Option<String>,
    author:            Option<String>,
    subject:           Option<String>,
    creator:           Option<String>,
    producer:          Option<String>,
    creation_date:     Option<String>,
    modification_date: Option<String>,
    page_count:        usize,
    properties:        HashMap<String, String>,
}

Always present, even when every structured field is None. The properties map is the open-ended bag for format-specific extended fields; common keys:

  • OOXML core / ODF meta: dc:creator, dc:subject, dc:description, dc:language, dcterms:created, dcterms:modified.
  • OOXML extended (app.xml): app:Application, app:AppVersion, app:Company, app:Pages, app:Words, app:Characters, app:Lines, app:Paragraphs.
  • PDF Info dict: pdf:Producer, pdf:Creator, pdf:Trapped, plus any custom Info-dict entries the producer wrote.

creation_date and modification_date are ISO 8601 strings when the source supplied a parseable date; otherwise the original raw string from the source is preserved.

NodeId and arena

Every node in the spine is allocated a typed NodeId from a monotonically increasing arena. Overlay payloads key off these ids. Consumers should treat ids as opaque handles — equality and hashing are well-defined; arithmetic is not.

SparseOverlay<T> and Overlay<T> are the per-node payload container types in udoc_core::document::overlay. SparseOverlay is a hash-map-backed map for sparsely-populated layers; Overlay is a vec-backed dense store for layers that approach one entry per node.

Document::walk() is the canonical recursive traversal; pattern matching on the Block variant is the right call when you want variant-specific access (children, tables, lists).

Images and the asset store

ImageAsset {
    width:               u32,
    height:              u32,
    bits_per_component:  u8,
    filter:              ImageFilter,    -- Flate | Dct | Ccitt | Jbig2 | Jpx | Raw
    mime_type:           &'static str,   -- "image/jpeg", "image/png", ...
    data:                Vec<u8>,
}

ImageRef = AssetRef<ImageAsset>

doc.images is an AssetStore<ImageAsset>: a Vec<ImageAsset> indexed by ImageRef. The same bitmap referenced from many places in the spine (slide deck logos, repeating headers) is stored once.

Setting Config(assets=AssetConfig(images=False)) skips bitmap loading entirely; references in the spine become empty placeholders rather than valid ImageRefs.

Cross-references

Spine site Overlay site
Inline::FootnoteRef { label } relationships.footnotes[label] -> FootnoteDef
Block::Image / Inline::InlineImage { image_ref } doc.images[image_ref] -> ImageAsset
Any Block / Inline with id presentation per-node bbox / extended style / layout info
Any Block / Inline with id relationships.captions[id] -> NodeId (caption association)
Any Block with id interactions.{form_fields, comments, tracked_changes} keyed via anchor == id

The convention: the spine is the source of truth for what the document says; overlays carry where it lives, what it links to, and what a viewer can do with it. A consumer that only needs text reads the spine and ignores overlays; a consumer building a viewer reads both.

Source

Definitive types live in crates/udoc-core/src/document/:

  • mod.rsDocument, NodeId, re-exports.
  • content.rsBlock, Inline, SpanStyle, ListItem, ListKind, SectionRole, ShapeKind.
  • table.rsTableData, TableRow, TableCell, CellValue.
  • presentation.rs — every presentation-overlay type listed above plus the paint, pattern, shading, mask, and colour primitives.
  • relationships.rsRelationships, FootnoteDef, BookmarkTarget, TocEntry, ComponentRef, plus the *AddResult enums and resource caps.
  • interactions.rsInteractions, FormField, FormFieldType, Comment, TrackedChange, ChangeType.
  • assets.rsAssetStore, AssetRef, ImageAsset, FontAsset, FontProgramType.
  • overlay.rsOverlay, SparseOverlay.

The Python types in Python API reference mirror these one-for-one with kind-discriminant pyclasses.