Some time ago, Meta announced its newest generative AI model, which it calls Voicebox. The model is designed to help creators with speech generation tasks such as audio editing, sampling, and stylizing.
According to Meta, this new AI model could benefit many people around the world, for example by letting blind people hear text messages from friends read aloud in those friends' voices, and by allowing users to speak foreign languages in their own voice.
Interestingly, the model can generate high-quality audio clips and edit pre-recorded audio to remove unwanted distractions such as car horns and other noise, while preserving the content and style of the audio.
Meta also compared Voicebox with other audio AI models, specifically naming VALL-E and YourTTS as competitors, and reported that Voicebox outperforms both on error rate and style similarity.
In addition, according to Meta, Voicebox is built on Flow Matching, the company's latest non-autoregressive generative modeling approach. Flow Matching can learn a highly non-deterministic mapping between text and speech, which lets Voicebox learn from varied speech data without those variations having to be labeled. As a result, the training data can be more diverse and much larger in scale.
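To make the idea of Flow Matching a little more concrete, here is a minimal sketch of how one training example can be built for a single data point. It follows the commonly used straight-line (optimal transport) formulation of flow matching and is not Meta's Voicebox code. The names `sample_training_pair`, `flow_matching_loss`, and the value of `SIGMA_MIN` are assumptions made for illustration, and a short list of floats stands in for real speech features.

```python
import random

# Assumed small constant for illustration; not a value taken from the article.
SIGMA_MIN = 1e-4


def sample_training_pair(x1, t, rng):
    """Build one flow-matching training example for a data point x1.

    x0 is Gaussian noise. x_t lies on a straight path that starts at x0
    (t = 0) and ends very close to x1 (t = 1). target is the velocity
    d(x_t)/dt that a model would be trained to predict from (x_t, t).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    scale = 1.0 - (1.0 - SIGMA_MIN) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    target = [b - (1.0 - SIGMA_MIN) * a for a, b in zip(x0, x1)]
    return x0, x_t, target


def flow_matching_loss(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [1.0, -2.0, 0.5]  # stand-in for a real speech feature vector
    x0, x_t, target = sample_training_pair(x1, t=0.3, rng=rng)
    # A real model would predict the velocity; zeros are used as a placeholder.
    print(flow_matching_loss([0.0] * len(target), target))
```

In a real system the placeholder prediction would come from a neural network that also receives the text, and generation would start from noise and follow the learned velocities step by step.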
Voicebox is said to have been trained on more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese so far. It is trained to predict a speech segment when given the surrounding speech and the transcript of that segment.
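The infilling task described above can be pictured with a small sketch: part of a recording is hidden, and the hidden part becomes the thing to predict. This is only an illustration under simple assumptions. The function name `make_infilling_example` is hypothetical, each frame is reduced to a single float, and the exact masking scheme Meta uses is not described in this article.

```python
def make_infilling_example(frames, start, end, mask_value=0.0):
    """Hide frames[start:end] and return (context, mask, target).

    context: the frames with the hidden span replaced by mask_value
    mask:    True where a frame is hidden
    target:  the hidden frames, which a model would be asked to predict
             from the context and the transcript
    """
    mask = [start <= i < end for i in range(len(frames))]
    context = [mask_value if hidden else f for f, hidden in zip(frames, mask)]
    target = [f for f, hidden in zip(frames, mask) if hidden]
    return context, mask, target


if __name__ == "__main__":
    frames = [0.1, 0.2, 0.3, 0.4, 0.5]  # stand-in for real audio features
    context, mask, target = make_infilling_example(frames, start=1, end=3)
    print(context)
    print(target)
```

Framing the task this way is what makes editing possible: a noisy stretch of audio can be treated as the hidden span and regenerated from its surroundings.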
Unfortunately, Meta currently does not plan to make this AI program available to all users, and will not release the source code either. For more details, you can check the following page. Meta will also provide additional announcements regarding Voicebox in the future.
"There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time. While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it’s also necessary to strike the right balance between openness with responsibility. With these considerations, today we are sharing audio samples and a research paper detailing the approach and results we have achieved. In the paper, we also detail how we built a highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox." ungkap Meta.
So what do you think about this service? Are you interested in trying it?