Transcription across Odoo apps: Exploring speech-to-text features

Duration: 36:48

PART 1 — Analytical Summary 💼

Context: Who’s speaking and why it matters

In this Odoo Experience session, two Odoo engineers, Dylan and Andre, unveil new speech-to-text capabilities shipping with Odoo 19. The talk walks through live dictation across apps, the real-time transcription architecture, automatic meeting summaries, and how VoIP call recordings are now transcribed and summarized in the background. This matters because it turns spoken interactions (calls, meetings, demos) into structured, searchable knowledge directly inside Odoo, without third-party workflow glue.

Core ideas and innovations 🚀

At the heart of the release is a reusable Voice Transcription component that works in any HTML field in Odoo. You can dictate in real time during meetings, phone calls, or while writing documentation, add context notes for better accuracy, then instantly generate an AI summary with one-click rewrite actions. The summary can be sent via the built‑in mail composer and stored in the chatter on the related record. The component supports multiple languages available in your database, and team members can co-edit the text and notes during the session.

Under the hood, Odoo evolved from a “pseudo” real-time approach to a truly real-time streaming implementation. The earlier method chunked audio in the browser, sent it via the Odoo server to OpenAI Whisper, and stitched responses back in the UI—clever but complex, with ordering and latency pitfalls. The new design establishes a direct WebSocket session to OpenAI’s real-time API using short-lived ephemeral tokens (fetched via Odoo for security), sends audio continuously, and renders live events from the stream. Locally, the client performs audio filtering and converts input to compliant PCM16 mono; Odoo also sets turn-detection (e.g., ~500 ms silence) and noise reduction preferences to improve results. The UI reacts to streaming events (listening, delta/word updates, completion, stop), providing a reliable, ordered feed of text.
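The client-side conversion to PCM16 mono mentioned above can be illustrated with a minimal sketch. It is written in Python for readability, whereas the real conversion runs in the browser; the function name `to_pcm16_mono` and the frame layout (one tuple of float channel samples per frame, in the range -1.0 to 1.0) are assumptions made for this example, not Odoo's code.

```python
import struct
from typing import Sequence


def to_pcm16_mono(frames: Sequence[Sequence[float]]) -> bytes:
    """Downmix float audio frames to little-endian signed 16-bit mono PCM.

    Each frame holds one float sample per channel, nominally in [-1.0, 1.0].
    (Hypothetical helper for illustration only.)
    """
    out = bytearray()
    for frame in frames:
        # Downmix: average all channels of this frame into one sample.
        mono = sum(frame) / len(frame)
        # Clip so that out-of-range input cannot overflow the 16-bit range.
        mono = max(-1.0, min(1.0, mono))
        # Scale to the signed 16-bit range and pack as little-endian.
        out += struct.pack("<h", int(round(mono * 32767)))
    return bytes(out)


if __name__ == "__main__":
    stereo = [(1.0, 1.0), (0.5, -0.5), (-1.0, -1.0)]
    pcm = to_pcm16_mono(stereo)
    print(len(pcm), "bytes for", len(stereo), "frames")
```

Each input frame becomes exactly two bytes, so the output can be sliced into fixed-size buffers before being sent over a streaming connection.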

For VoIP calls, the pattern is different by design. Odoo captures both incoming and outgoing streams, merges and buffers the entire call locally, and only uploads after hang-up. An asynchronous background worker (implemented via cron) then requests a full transcript followed by a full summary, which keeps the UI responsive. The pipeline uses simple, defensive states to scale and avoid race conditions: no audio, pending (awaiting processing), queued (locked for a worker), error (e.g., provider unavailable), and “too long to process” (a safeguard for extreme durations). Because recording happens in the browser, the architecture is provider-agnostic today and works regardless of the telephony provider, Aircall being one example. Longer-term, Odoo may centralize this with its own provider to increase robustness.
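The state handling described above can be sketched as a small state machine. This is a simplified, single-process illustration rather than Odoo's implementation: the `Call` class, the `DONE` state, the `MAX_SECONDS` cap, and the `transcribe` / `summarize` callables are all hypothetical names introduced for the example. In a real database-backed worker, the pending-to-queued claim would have to be an atomic update so that two workers cannot take the same call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class State(Enum):
    NO_AUDIO = "no_audio"
    PENDING = "pending"    # awaiting processing
    QUEUED = "queued"      # locked for a worker
    DONE = "done"          # assumed terminal state, not named in the talk
    ERROR = "error"        # e.g. provider unavailable
    TOO_LONG = "too_long"  # safeguard for extreme durations


MAX_SECONDS = 2 * 60 * 60  # hypothetical cap; the real limit is not stated


@dataclass
class Call:
    duration_s: float
    has_audio: bool
    state: State = State.NO_AUDIO
    transcript: Optional[str] = None
    summary: Optional[str] = None


def on_hangup(call: Call) -> None:
    """Classify a finished call before any worker sees it."""
    if not call.has_audio:
        call.state = State.NO_AUDIO
    elif call.duration_s > MAX_SECONDS:
        call.state = State.TOO_LONG
    else:
        call.state = State.PENDING


def claim(call: Call) -> bool:
    """Move a pending call to queued; only pending calls can be claimed."""
    if call.state is not State.PENDING:
        return False
    call.state = State.QUEUED
    return True


def run_worker(
    calls: Iterable[Call],
    transcribe: Callable[[Call], str],
    summarize: Callable[[str], str],
) -> None:
    """One cron-style pass: transcript first, then summary, per claimed call."""
    for call in calls:
        if not claim(call):
            continue
        try:
            call.transcript = transcribe(call)
            call.summary = summarize(call.transcript)
            call.state = State.DONE
        except Exception:
            # Any provider failure parks the call in the error state.
            call.state = State.ERROR


if __name__ == "__main__":
    call = Call(duration_s=95, has_audio=True)
    on_hangup(call)
    run_worker([call], lambda c: "hello from the call", lambda t: t.upper())
    print(call.state, call.summary)
```

Keeping the claim step separate from the processing step is what lets several workers run side by side: a call that is already queued, done, or in error is simply skipped.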

Real-time vs. pseudo real-time 🧠

The shift to real-time streaming trades “constant partial text flicker” for stability and correctness. With the new approach, you typically see finalized chunks after a pause, not every micro-update as you speak. In return, Odoo gets ordered events, fewer asynchronous glitches, and no server hop for audio data. The cost is a slightly more complex session setup and a less “hyper-reactive” UI; the payoff is fewer disconnections and significantly cleaner results.
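To see why ordering was a pitfall in the earlier chunked approach, consider what the UI must do when per-chunk responses come back out of order. The sketch below is an illustrative reorder buffer, not Odoo's code; the `ChunkStitcher` class and its sequence-number scheme are assumptions made for the example.

```python
from typing import Dict, List


class ChunkStitcher:
    """Release transcribed chunks strictly in sequence order.

    Hypothetical helper: each audio chunk is assumed to carry a sequence
    number starting at 0, and responses may arrive in any order.
    """

    def __init__(self) -> None:
        self._next_seq = 0
        self._waiting: Dict[int, str] = {}
        self._parts: List[str] = []

    def add(self, seq: int, text: str) -> List[str]:
        """Store one response and return the chunks that became releasable."""
        self._waiting[seq] = text
        released: List[str] = []
        # Drain every chunk that is now contiguous with what was shown.
        while self._next_seq in self._waiting:
            released.append(self._waiting.pop(self._next_seq))
            self._next_seq += 1
        self._parts.extend(released)
        return released

    @property
    def text(self) -> str:
        return " ".join(self._parts)


if __name__ == "__main__":
    stitcher = ChunkStitcher()
    print(stitcher.add(1, "world"))  # held back: chunk 0 has not arrived
    print(stitcher.add(0, "hello"))  # releases both chunks in order
    print(stitcher.text)
```

A single slow response holds back everything behind it, which is one source of the latency described for the chunked design. With a single ordered event stream, this bookkeeping is no longer needed on the client.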

Limitations, privacy, and roadmap 💬

A few practical notes emerged in Q&A. Live dictation sessions are created with a specific language; mid‑session switching is not supported, though the model often still recognizes or translates content. Speaker diarization (e.g., “Person 1 / Person 2”) isn’t present in live dictation; it’s more feasible in VoIP because there are two clear streams. Uploading standalone MP3s for offline transcription is not supported by the dictation component today. Recordings are currently stored indefinitely; Odoo plans to add retention controls and welcomes feedback on privacy options such as on‑prem models or customer-managed providers. Recordings can be manually deleted in attachments. The team is exploring semantic search across transcripts, “moments” style call analysis (targeted highlights instead of blunt sentiment tallies), integration with Discuss, and even voice commands to trigger actions or natural-language queries.

Impact & takeaways ⚙️

This release removes friction between conversations and knowledge capture inside Odoo. Meetings can be dictated directly into Calendar events or Knowledge articles; calls are transcribed and summarized automatically; summaries can be customized with default prompts from the AI app and shared in one click. The engineering shift to real-time streaming delivers a more robust experience, while VoIP’s asynchronous, cron‑backed pipeline scales cleanly without blocking users. It’s a pragmatic balance of simplicity, accuracy, and performance, with a clear path to deeper analytics and enterprise controls as customers weigh in.

PART 2 — Viewpoint: Odoo Perspective

Disclaimer: AI-generated creative perspective inspired by Odoo's vision.

When we talk about simplicity at Odoo, we mean reducing the number of tools you need to get work done. Speech-to-text inside every HTML field is a small UI decision with a big impact: your meetings, calls, and notes become structured content that flows through the same integrated stack.

Real-time streaming and background workers are not just engineering feats; they’re choices to keep the experience smooth and predictable. As the community shares feedback on privacy, on‑prem models, and new use cases like voice commands, we’ll keep pushing the boundaries—always prioritizing integration and ease of use.

PART 3 — Viewpoint: Competitors (SAP / Microsoft / Others)

Disclaimer: AI-generated fictional commentary. Not an official corporate statement.

Odoo’s speech-to-text strategy is impressively integrated and fast-moving. The decision to embed dictation into any field and to summarize VoIP calls as part of core workflows is a strong UX differentiator. The real-time pipeline and ephemeral tokens reflect solid technical hygiene.

Challenges remain where large enterprises scrutinize scalability, data-retention, and sovereignty. Robust speaker attribution, domain-specific accuracy, and compliance frameworks (e.g., model choice, residency, audit trails) will determine adoption in regulated sectors. The direction is right; the depth of governance and analytics will be the next battleground.

Disclaimer: This article contains AI-generated summaries and fictionalized commentaries for illustrative purposes. Viewpoints labeled as "Odoo Perspective" or "Competitors" are simulated and do not represent any real statements or positions. All product names and trademarks belong to their respective owners.
