Version 2 of Higgs Audio Generation

  • Higgs Audio V2 is an advanced, open-source audio generation model developed by Boson AI, designed to produce highly expressive and lifelike speech with robust multi-speaker dialogue capabilities.

    Some Highlights:

    * Trained on 10M hours of diverse audio — speech, music, sound events, and natural conversations

    * Built on top of Llama 3.2 3B for deep language and acoustic understanding

    * Runs in real time and supports edge deployment; the smallest versions run on Jetson Orin Nano

    * Outperforms GPT-4o-mini-tts and ElevenLabs v2 in prosody, emotional expressiveness, and multi-speaker dialogue

    * Zero-shot natural multi-speaker dialogues — voices adapt tone, energy, and emotion automatically

    * Zero-shot voice cloning with melodic humming and expressive intonation — no fine-tuning needed

    * Multilingual support with automatic prosody adaptation for narration and dialogue

    * Simultaneous speech and background music generation — a first for open audio foundation models

    * High-fidelity 24 kHz audio output for studio-quality sound on any device

    * Open source and commercially usable — no barriers to experimentation or deployment

    Model on Hugging Face: https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-...
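
Two of the highlights above, 24 kHz output and real-time generation, can be made concrete with a little arithmetic. The sketch below is illustrative only and does not use the model or any Boson AI API: the helper names are hypothetical, and the 16-bit mono PCM format is an assumption, since the post does not state bit depth or channel count. "Real time" is taken here in the usual sense of a real-time factor (generation time divided by audio duration) at or below 1.0.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate stated in the highlights


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples per channel for a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


def pcm_bytes(duration_s: float, channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Raw PCM size in bytes. 16-bit mono is an assumption, not from the post."""
    return num_samples(duration_s) * channels * bytes_per_sample


def real_time_factor(generation_s: float, audio_s: float) -> float:
    """Generation time divided by audio duration; <= 1.0 keeps up with playback."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return generation_s / audio_s


if __name__ == "__main__":
    clip_s = 10.0
    print(num_samples(clip_s))            # samples in a 10 s clip
    print(pcm_bytes(clip_s))              # bytes as 16-bit mono PCM (assumed)
    print(real_time_factor(5.0, clip_s))  # hypothetical 5 s generation time
```

For example, a 10-second clip at 24 kHz holds 240,000 samples per channel, and a system that produces it in 5 seconds has a real-time factor of 0.5. The 5-second figure is made up for the example and is not a measurement of Higgs Audio V2.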