Meta's Audiobox: AI Sound with Voice and Text

By Rahul Bhagat

Meta has launched Audiobox, an AI sound generation model that accepts simultaneous voice and text input.

Built on Meta's Voicebox AI model, Audiobox can generate a variety of environmental sounds as well as natural conversational speech.

The model integrates audio generation and editing features, allowing users to generate customized audio quickly.

Meta aims to lower the barrier to sound generation by offering a publicly accessible tool for producing audio for videos, games, and more.

Audiobox uses Voicebox's "guided sound" mechanism together with a "flow-matching" diffusion model for multi-layered audio generation.

In tests, Audiobox outperformed AudioLDM2, VoiceLDM, and TANGO in sound quality and the "accuracy of generated content," according to Meta.