
Turning Web pages into beautiful print: The architecture of Paper-Muncher

Duration: 36:57


PART 1 — Analytical Summary 🚀

Context 💼

This talk, led by Clen (software developer at Odoo and lead of the Paper‑Muncher project), explains why Odoo is moving away from wkhtmltopdf and how a new, purpose-built document rendering engine—Paper‑Muncher—has been architected to turn web pages into high‑quality PDFs and other outputs. The session is both a diagnosis of long‑standing printing pain points in Odoo and a deep dive into the technical foundations, performance goals, and roadmap for a reliable, modern replacement.

Core ideas & innovations 🧠

The story starts with wkhtmltopdf, an aging tool built on a 2014 WebKit that has been deprecated since 2020. It often yields broken fonts, missing text or images, and poor reliability, which is unacceptable for critical business documents. Worse, its flow mirrors a full browser: it opens a hidden window, loads the page, enters print preview, and re-renders, doing roughly 3× the necessary work. That design inflates CPU and memory usage and creates operational pitfalls. For example, with a single Odoo worker, the worker that is blocked waiting for the PDF cannot also serve the asset requests the renderer sends back to it, so the job deadlocks.

Clen evaluated alternatives. WeasyPrint (Python) is simple and OSS-friendly, but the Python object model leads to cache‑unfriendly memory access patterns and frequent CPU cache misses. That is fine for small jobs but not at Odoo scale. Chromium/Chrome is very fast but memory‑hungry: like any interactive browser engine, it is optimized for responsiveness, not for one-shot, low‑memory document generation.

Enter Paper‑Muncher: a new C++ engine that aims to blend Chromium‑like speed with WeasyPrint‑like simplicity, while remaining faithful to web standards so Odoo developers don’t have to relearn or rewrite their templates. Rather than inventing a new DSL for layout, Paper‑Muncher embraces HTML/CSS, then builds a fit‑for‑purpose pipeline to produce PDFs and more.

The architecture mirrors that of a modern browser engine, but is designed specifically for printing:

  • Document loading and streaming: Odoo communicates with Paper‑Muncher through Unix pipes, allowing streaming from stdin to stdout and avoiding temporary file roundtrips. Asset fetching (fonts, images, CSS) happens via lightweight HTTP-over-pipes back to Odoo, so permissions, auth, and caching remain under Odoo’s control.
  • Security: the renderer runs in a hardened sandbox using Linux namespaces and seccomp, keeping failures contained.
  • Parsing and modeling: HTML is tokenized and built into a DOM; CSS is parsed into an intermediate SST (Skeleton Syntax Tree) to preserve forward/backward compatibility, then into a stylesheet object model. Unknown features can be safely ignored without breaking rendering.
  • Style computation: CSS rule matching uses selector indexing so that each element is tested only against the rules that could apply to it, rather than against all “N elements × M rules” pairs. Computed styles are stored in cache‑friendly structures that group related properties, minimizing memory churn and speeding up inheritance and recalculation.
  • Box tree and layout: The DOM produces a CSS box tree based on display properties. A layout algorithm computes sizes and positions under parent constraints, handling block and inline flow, floats, tables, and more.
  • Pagination via fragments: For page breaks, Paper‑Muncher creates fragment trees that split boxes across pages and track resumable breakpoints—even for parallel flows—ensuring clean pagination and consistent continuation.
  • Painting and z-order: Fragments emit visual primitives (rectangles, images, text) with proper z-index ordering.
  • Output abstraction: A Canvas layer exposes primitives (draw, transform, text) and maps them to different backends—PDF, SVG, and raster images today—with room for specialized outputs (e.g., future ZPL for labels).
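
The selector indexing mentioned in the style computation bullet can be sketched as follows. This is a minimal illustration in Python, not Paper‑Muncher's C++ implementation: the `Element`, `Rule`, and `RuleIndex` names are hypothetical, and only single-key selectors (`#id`, `.class`, `tag`) are handled. Rules are bucketed by key, so an element only visits the buckets that match its own tag, classes, and id.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical, simplified DOM element."""
    tag: str
    id: Optional[str] = None
    classes: frozenset = frozenset()


@dataclass
class Rule:
    """Hypothetical CSS rule; selector is "#id", ".class", or "tag" only."""
    selector: str
    declarations: dict
    order: int = 0  # source order, filled in by RuleIndex


class RuleIndex:
    """Buckets rules by selector key so lookups skip unrelated rules."""

    def __init__(self, rules):
        self.by_id = defaultdict(list)
        self.by_class = defaultdict(list)
        self.by_tag = defaultdict(list)
        for order, rule in enumerate(rules):
            rule.order = order
            sel = rule.selector
            if sel.startswith("#"):
                self.by_id[sel[1:]].append(rule)
            elif sel.startswith("."):
                self.by_class[sel[1:]].append(rule)
            else:
                self.by_tag[sel].append(rule)

    def candidates(self, el):
        """Return only the rules whose key matches the element."""
        found = list(self.by_tag.get(el.tag, []))
        for cls in el.classes:
            found.extend(self.by_class.get(cls, []))
        if el.id is not None:
            found.extend(self.by_id.get(el.id, []))
        return found

    def computed(self, el):
        """Apply candidates in cascade order: tag < class < id, then source order."""
        def cascade_key(rule):
            if rule.selector.startswith("#"):
                specificity = 2
            elif rule.selector.startswith("."):
                specificity = 1
            else:
                specificity = 0
            return (specificity, rule.order)

        style = {}
        for rule in sorted(self.candidates(el), key=cascade_key):
            style.update(rule.declarations)
        return style


if __name__ == "__main__":
    index = RuleIndex([
        Rule("p", {"color": "black", "margin": "0"}),
        Rule(".total", {"font-weight": "bold"}),
        Rule("#grand-total", {"color": "red"}),
        Rule("table", {"border-collapse": "collapse"}),
    ])
    el = Element("p", id="grand-total", classes=frozenset({"total"}))
    print(index.computed(el))
```

A real engine also has to handle compound and descendant selectors, typically by indexing on the rightmost part of the selector and verifying the rest afterwards. The bucketing idea stays the same.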
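
The output abstraction in the last bullet can be pictured as a small interface that painting code targets, with one implementation per output format. The sketch below is an illustration in Python under stated assumptions: the `Canvas`, `SvgCanvas`, and `RecordingCanvas` classes, their method names, and `paint_invoice_header` are all hypothetical and do not reflect Paper‑Muncher's actual C++ API.

```python
from abc import ABC, abstractmethod
from xml.sax.saxutils import escape


class Canvas(ABC):
    """Hypothetical drawing interface shared by all output backends."""

    @abstractmethod
    def fill_rect(self, x, y, w, h, color):
        ...

    @abstractmethod
    def draw_text(self, x, y, text, size):
        ...


class SvgCanvas(Canvas):
    """Backend that serializes primitives as SVG elements."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.parts = []

    def fill_rect(self, x, y, w, h, color):
        self.parts.append(
            f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{color}"/>'
        )

    def draw_text(self, x, y, text, size):
        # Escape &, <, > so arbitrary text stays valid XML.
        self.parts.append(
            f'<text x="{x}" y="{y}" font-size="{size}">{escape(text)}</text>'
        )

    def finish(self):
        body = "".join(self.parts)
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{self.width}" height="{self.height}">{body}</svg>'
        )


class RecordingCanvas(Canvas):
    """Backend that only records calls, e.g. for tests or a display list."""

    def __init__(self):
        self.ops = []

    def fill_rect(self, x, y, w, h, color):
        self.ops.append(("rect", x, y, w, h, color))

    def draw_text(self, x, y, text, size):
        self.ops.append(("text", x, y, text, size))


def paint_invoice_header(canvas):
    """Painting code depends only on Canvas, never on a concrete backend."""
    canvas.fill_rect(0, 0, 200, 30, "#eeeeee")
    canvas.draw_text(10, 20, "Invoice <001>", 12)


if __name__ == "__main__":
    svg = SvgCanvas(200, 30)
    paint_invoice_header(svg)
    print(svg.finish())
```

Because the painting function only sees the interface, adding a new output format means writing one more backend class and leaving layout and painting untouched.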

The CLI, exposed as “pepper,” supports print-to-PDF and render-to-image, can convert SVG and Markdown to PDF, and runs in multiple modes: piped, file-based, or microservice style.
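
Piped mode, as described above, lets a caller hand HTML to the renderer on stdin and read the result from stdout without touching temporary files. The helper below is a sketch in Python using only the standard `subprocess` module. The renderer's real command-line arguments are not given in this summary, so `argv` is a caller-supplied parameter, and the demonstration uses the Python interpreter itself as a stand-in filter rather than a real renderer invocation. The `render_via_pipes` and `STAND_IN` names are hypothetical.

```python
import subprocess
import sys


def render_via_pipes(argv, html, timeout=30.0):
    """Send `html` (bytes) to a child process on stdin, return its stdout bytes.

    `argv` is whatever command starts the renderer in piped mode; it is an
    assumption of this sketch and must be supplied by the caller.
    """
    result = subprocess.run(
        argv,
        input=html,               # written to the child's stdin, then closed
        stdout=subprocess.PIPE,   # captured in memory, no temp file involved
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,               # raise CalledProcessError on non-zero exit
    )
    return result.stdout


# Stand-in "renderer": a tiny filter that uppercases whatever arrives on stdin.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]

if __name__ == "__main__":
    print(render_via_pipes(STAND_IN, b"<p>hello</p>"))
```

This sketch buffers the whole input and output for brevity. It also leaves out the asset-fetching channel back to Odoo, which would need a longer-lived `subprocess.Popen` with both ends of the pipes serviced while the renderer runs.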

Impact & takeaways ⚙️

Paper‑Muncher targets the specific needs of business printing: fast, deterministic, low‑memory rendering from standard HTML/CSS without the baggage of a full interactive browser. Early benchmarks show significant CPU gains over wkhtmltopdf and even over Chromium on the same tasks; memory usage is already better than wkhtmltopdf for small documents, with optimization ongoing.

Operationally, the design eliminates fragile temp‑file choreography, avoids asset-fetch deadlocks, and improves security through sandboxing. Developers keep using familiar HTML/CSS templates, preserving existing investments. In demos, PDFs generate nearly instantly compared to wkhtmltopdf, with some alpha‑stage layout quirks still being ironed out.

Current status: alpha quality. Coverage on Web Platform Tests (WPT) is ~20% (focusing on features like flexbox, tables, SVG, z-index). Right‑to‑left (RTL) language support is in progress; page breaks already work; CSS Grid and PDF form fields are planned; JavaScript is under consideration (lower priority); Windows support and better dev tooling are on the roadmap. Tentative Odoo availability is targeted around Odoo 20–21 (not guaranteed for 19). The project welcomes community feedback and contributions via Discord and GitHub. 💬

PART 2 — Viewpoint: Odoo Perspective

⚠️ Disclaimer: AI-generated creative perspective inspired by Odoo’s vision.

Printing has been a thorn in the side of otherwise smooth flows. Our principle is to keep things simple for users and powerful for developers. With Paper‑Muncher, we keep HTML/CSS, remove the hacks, and deliver speed and predictability—without asking the community to rewrite their reports.

The elegance here is not just performance; it’s the pipeline. Streaming over pipes, sandboxing by default, and a canvas abstraction mean we can improve iteratively and stay integrated with the rest of Odoo. It’s the kind of pragmatic innovation we like: invisible when it works, and transformative at scale.

PART 3 — Viewpoint: Competitors (SAP / Microsoft / Others)

⚠️ Disclaimer: AI-generated fictional commentary. Not an official corporate statement.

Building a bespoke print engine is ambitious. Odoo’s approach aligns well with SMB and mid-market expectations for speed and developer friendliness. The adherence to web standards minimizes retraining and migration costs—smart given the volume of HTML/CSS templates in the ecosystem. The sandboxing story is also prudent.

For large enterprises, long-term questions remain: comprehensive standards compliance (e.g., accessibility tagging, PDF/A), digital signatures and forms, deterministic behavior across versions, and Windows parity. Adding JavaScript could challenge determinism and resource control. If Odoo rounds out RTL, Grid, forms, and governance features while keeping the UX advantage, Paper‑Muncher will narrow the gap with established enterprise print stacks.

Disclaimer: This article contains AI-generated summaries and fictionalized commentaries for illustrative purposes. Viewpoints labeled as "Odoo Perspective" or "Competitors" are simulated and do not represent any real statements or positions. All product names and trademarks belong to their respective owners.
