Unveiling Paper-Muncher: The secret of web engines and understanding large document generation

Duration: 27:16


PART 1 — Analytical Summary 🚀

Context 💼
This talk introduces Paper‑Muncher, an in‑house HTML/CSS‑to‑PDF engine developed at Odoo. The speaker, Lou (software engineer on Odoo’s Paper‑Muncher team), explains why Odoo invested two years with a very small team (2–4 engineers) to build a dedicated web engine: generating extremely large, legally required PDFs (e.g., Mexico’s full general ledger) where volumes can reach 20,000 pages. Existing tools struggled with performance, reliability, and modern styling, turning a routine back‑office need into an operational bottleneck.

Core ideas & innovations ⚙️
Odoo evaluated the mainstream options. Headless Chrome produces beautiful output and is secure, but it is not optimized for static mega‑documents and tends to fail past a few hundred pages. wkhtmltopdf (Odoo’s long‑time choice) is slow, crashes on large files, and is stuck on an old Safari/WebKit stack with outdated CSS support. WeasyPrint is modern and easy to use, but too slow at this scale. PrinceXML is both fast and visually strong, but it is proprietary and expensive enough that spending a few years building a replacement was judged worthwhile.

The team chose to build a dedicated rendering engine, optimized purely for static document generation, rather than repurpose a general‑purpose browser engine. Lou demystifies how a web engine works: fetch the HTML and CSS, parse them into structured representations, match CSS rules to elements, compute layout (pagination, line breaks, positions and sizes), then “paint” to a target output (PDF, image, or a dev‑tools window). The emphasis is on a clean, maintainable pipeline rather than browser‑level interactivity. Four technical choices stand out: writing in a language the team has already mastered (C++ rather than Rust) to move fast; following web standards meticulously instead of inventing a proprietary spec; drawing on open‑source engine code to learn from prior art; and designing a CSS subsystem that builds an additional pre‑syntax tree to better support both legacy and future CSS versions. Memory is handled pragmatically: early pipeline stages stream content, while later stages hold the structural representation needed for correct styling and layout.
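
To make the parse, style, layout, and paint stages concrete, here is a minimal Python sketch of such a pipeline. It is an illustration only and not Paper‑Muncher's code (the engine itself is written in C++). Every name in it (`Box`, `BlockParser`, `apply_styles`, `paginate`, `paint`, `render`) is hypothetical, rules match on tag name alone, and each block is assumed to have a fixed `height` instead of one derived from real text measurement.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Box:
    """One block-level element carried through the pipeline (hypothetical)."""
    tag: str
    text: str = ""
    style: dict = field(default_factory=dict)


class BlockParser(HTMLParser):
    """Parse stage: collect each <h1> or <p> element into a flat list of boxes."""

    def __init__(self):
        super().__init__()
        self.boxes = []
        self._open = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._open = Box(tag)

    def handle_data(self, data):
        if self._open is not None:
            self._open.text += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open.tag:
            self.boxes.append(self._open)
            self._open = None


def apply_styles(boxes, rules):
    """Style stage: attach the declarations of the rule matching each tag."""
    for box in boxes:
        box.style = dict(rules.get(box.tag, {}))
    return boxes


def paginate(boxes, page_height):
    """Layout stage: stack boxes vertically and break the page on overflow."""
    pages, current, used = [], [], 0
    for box in boxes:
        height = box.style.get("height", 10)  # assumed fixed block height
        if current and used + height > page_height:
            pages.append(current)
            current, used = [], 0
        current.append(box)
        used += height
    if current:
        pages.append(current)
    return pages


def paint(pages):
    """Paint stage: emit one text line per box in place of real PDF output."""
    lines = []
    for number, page in enumerate(pages, start=1):
        for box in page:
            lines.append(f"page {number}: <{box.tag}> {box.text.strip()}")
    return lines


def render(html, rules, page_height):
    """Run the four stages in order and return the painted lines."""
    parser = BlockParser()
    parser.feed(html)
    parser.close()
    return paint(paginate(apply_styles(parser.boxes, rules), page_height))
```

Calling `render("<h1>Ledger</h1><p>Row 1</p><p>Row 2</p><p>Row 3</p>", {"h1": {"height": 30}, "p": {"height": 20}}, 60)` places the heading and first row on page 1 and the remaining two rows on page 2. A real engine replaces each stage with something far more elaborate (a full CSS cascade, line breaking, fragmentation rules), but the hand-off between stages has the same shape.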

Impact & takeaways 🧠
The result is a renderer significantly faster than wkhtmltopdf (the one chart shown in the talk indicated a large gain), designed to stay stable on multi‑thousand‑page documents. It promises modern CSS, predictable performance, and fewer crashes, plus a path to richer outputs (images, on‑screen dev tools, and later interactive PDF forms and standards like PDF/A). The delivery strategy is staged: first prove the architecture, then optimize speed, then integrate with Odoo documents for real‑world polish, and finally fill in edge‑case CSS features. Tooling matters: by wiring Web Platform Tests (WPT) into their workflow, the team measures CSS compliance, diagnoses regressions quickly, and compares behavior against other engines. Current status: an open‑source alpha on GitHub; a limited rollout is targeted for Odoo 19.1; “near‑perfect” maturity is expected in about two years. The practical takeaway is clear: by focusing on the essential subset (static PDFs), respecting standards, and planning ahead, a small team can ship a specialized engine that outperforms generalist giants on a critical business job. 💬
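
The talk does not show how the team's WPT tooling works, so the following is only a sketch of the general idea: compare two test runs to get a compliance score and a list of regressions. It assumes each run is a plain mapping from test name to a status string such as `"PASS"` or `"FAIL"`. The function names and the test paths in the usage note are made up for illustration.

```python
def pass_rate(results):
    """Return the fraction of tests whose status is 'PASS' (0.0 if empty)."""
    if not results:
        return 0.0
    passed = sum(1 for status in results.values() if status == "PASS")
    return passed / len(results)


def regressions(before, after):
    """Return tests that passed in the baseline run but do not pass now.

    A test missing from the newer run is also counted as a regression.
    """
    return sorted(
        name
        for name, status in before.items()
        if status == "PASS" and after.get(name) != "PASS"
    )
```

With a baseline of `{"css/a.html": "PASS", "css/b.html": "PASS", "css/c.html": "FAIL"}` and a newer run in which `css/b.html` fails and `css/c.html` passes, `regressions` reports only `css/b.html` while the pass rate stays at two thirds. This is why tracking per-test changes is more informative than tracking the aggregate score alone.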

PART 2 — Viewpoint: Odoo Perspective

Disclaimer: AI-generated creative perspective inspired by Odoo's vision.

When a foundational piece of your platform slows customers down, you have to own it. Paper‑Muncher is about reclaiming speed and reliability for an everyday workflow—printing invoices, ledgers, and reports—without asking users to compromise on design. Simplicity here means a focused engine that does one thing incredibly well, integrated seamlessly across Odoo.

We chose standards, not shortcuts. By aligning with WPT and building a future‑proof CSS pipeline, we give our community a transparent way to measure progress and contribute. The long game is to lower complexity and cost for everyone—open tools, tight integration, and performance you can trust for 20,000 pages or two.

PART 3 — Viewpoint: Competitors (SAP / Microsoft / Others)

Disclaimer: AI-generated fictional commentary. Not an official corporate statement.

Odoo’s decision to build a dedicated HTML‑to‑PDF engine addresses a real pain point for high‑volume, compliance‑driven output. Specialized rendering can deliver deterministic performance and modern styling without the overhead of a full browser stack. For customers with huge statutory reports and batch runs, that’s compelling.

The challenge will be breadth and depth: comprehensive CSS coverage; PDF standards such as PDF/A, forms, signatures, and accessibility; and the operational rigor enterprises expect (security hardening, telemetry, HA workflows). As the product matures, differentiation will hinge on predictable scalability, compliance assurance, and a streamlined UX that reduces document‑design time. If Odoo executes, this could meaningfully raise the bar for integrated document generation.

Disclaimer: This article contains AI-generated summaries and fictionalized commentaries for illustrative purposes. Viewpoints labeled as "Odoo Perspective" or "Competitors" are simulated and do not represent any real statements or positions. All product names and trademarks belong to their respective owners.
