Mimi Codec

live

Audio codec that splits speech into semantic and acoustic streams

web • April 20, 2026

AI • Audio Processing • Voice Technology

What It Does

Details

Mimi takes a 24 kHz audio waveform and converts it into 32 token streams. The first stream captures phonetic content (what is being said), while the remaining 31 streams carry acoustic details such as timbre and texture. Because the two kinds of information live in separate streams, users can manipulate what is said independently of how it sounds.
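
To make the stream layout concrete, here is a minimal sketch of splitting and recombining token streams. It does not call Mimi itself: fake_encode is a hypothetical stand-in that returns random token ids, and the vocabulary size, frame count, and function names are assumptions made for illustration. Only the count of 32 streams and the role of the first stream come from the description above.

```python
import random

NUM_STREAMS = 32      # stream count stated in the description above
CODEBOOK_SIZE = 2048  # assumed vocabulary size per stream (illustrative)


def fake_encode(num_frames, seed):
    """Hypothetical stand-in for a codec encoder.

    Returns NUM_STREAMS lists, each holding num_frames random token ids.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(CODEBOOK_SIZE) for _ in range(num_frames)]
        for _ in range(NUM_STREAMS)
    ]


def split_streams(tokens):
    """Separate stream 0 (semantic) from streams 1..N-1 (acoustic)."""
    return tokens[0], tokens[1:]


def recombine(semantic, acoustic):
    """Stack one semantic stream back on top of a set of acoustic streams."""
    if any(len(stream) != len(semantic) for stream in acoustic):
        raise ValueError("all streams must cover the same number of frames")
    return [semantic] + acoustic


tokens_a = fake_encode(num_frames=25, seed=1)
tokens_b = fake_encode(num_frames=25, seed=2)

semantic_a, acoustic_a = split_streams(tokens_a)
_, acoustic_b = split_streams(tokens_b)

# Content stream of utterance A paired with the acoustic streams of utterance B.
mixed = recombine(semantic_a, acoustic_b)
print(len(mixed), len(mixed[0]))
```

With a real encoder and decoder in place of fake_encode, the same indexing would be the starting point for editing one kind of information while leaving the other untouched. Whether a mixed token set decodes to natural-sounding audio depends on the model and is not shown here.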

Who It's For

Best fit users

•Audio researchers
•AI model developers
•Voice processing professionals

Why It Matters

Why this one made the cut

Separating semantic from acoustic information makes it possible to work on what is said independently of how it sounds. That supports selective encoding and decoding of speech characteristics, which is directly useful for voice synthesis and real-time voice processing.

Differentiator

What makes it different

Mimi learns to separate semantic and acoustic components during training rather than relying on hand-coded rules, and that learned split is what enables fine-grained audio manipulation.

Sources

Where we found it

Hacker News (Global, EN) · Apr 20, 2026

First discovered Apr 20, 2026 · Hacker News