On Thursday, Google Cloud announced many improvements to the platform’s AI-powered speech tools.
Google Cloud updated its Text-to-Speech product with additional voices and languages. Beta support now covers seven new languages or variants: Danish, Norwegian Bokmål, Polish, Portuguese (Portugal), Russian, Slovak, and Ukrainian. That brings the total to 21 supported languages.
The product now offers 106 voices after the addition of 31 new WaveNet voices and 24 new standard voices. By comparison, Amazon Web Services' Polly, the primary competitor to Google's Text-to-Speech service, supports 58 voices.
"Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and easier than is typical in the industry," said Dan Aharon, Google product manager.
The update also brings the Text-to-Speech Device Profiles feature to general availability. Device Profiles help users enhance audio playback on different kinds of hardware, such as headphones for podcasts.
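For developers, a device profile is selected per request. The sketch below builds a JSON body for the Text-to-Speech REST method text:synthesize using plain Python. The field name effectsProfileId and the profile value headphone-class-device reflect the v1 API as we understand it and should be checked against the current documentation; the helper name build_synthesize_request and the sample text are hypothetical.

```python
import json


def build_synthesize_request(text, language_code="en-US",
                             profile="headphone-class-device"):
    """Return a request body for text:synthesize (hypothetical helper).

    Assumption: audioConfig.effectsProfileId takes a list of device
    profile IDs, such as "headphone-class-device".
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Ask the service to tune the audio for headphone playback.
            "effectsProfileId": [profile],
        },
    }


body = build_synthesize_request("Welcome back to the podcast.")
print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST to the synthesize endpoint; authentication and the HTTP call are left out here to keep the sketch self-contained.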
Google Cloud also moved more of its Speech-to-Text transcription features to general availability and improved their quality.
Multi-channel recognition is now generally available. It lets the Speech-to-Text API distinguish between multiple audio channels, which comes in handy for recordings that involve several people.
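As a rough illustration of how multi-channel recognition is requested, the sketch below builds a body for the Speech-to-Text REST method speech:recognize and groups the returned transcripts by channel. The field names audioChannelCount, enableSeparateRecognitionPerChannel, and channelTag reflect the v1 API as we understand it; the helper names, the encoding, and the sample rate are assumptions made for the example.

```python
from collections import defaultdict


def build_multichannel_request(gcs_uri, channels=2, language_code="en-US"):
    """Return a speech:recognize body for multi-channel audio (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",        # assumed audio format
            "sampleRateHertz": 8000,       # assumed sample rate
            "languageCode": language_code,
            "audioChannelCount": channels,
            # Transcribe each channel separately instead of mixing them.
            "enableSeparateRecognitionPerChannel": True,
        },
        "audio": {"uri": gcs_uri},
    }


def transcripts_by_channel(response):
    """Group top transcripts by the channelTag attached to each result."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            grouped[result.get("channelTag")].append(alternatives[0]["transcript"])
    return dict(grouped)


request_body = build_multichannel_request("gs://example-bucket/call.wav")
```

With a two-channel call recording, each speaker's words would end up under a separate channel key, assuming each speaker was captured on their own channel.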
The premium video and enhanced phone models, which Google released in beta last year, are now generally available. Google used the usage data shared by premium customers who opted in to data logging to improve both models.
According to Google, the improved video model makes 64 percent fewer transcription errors and the phone model makes 62 percent fewer.
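For readers who want to try the premium models, selection happens in the recognition config. The sketch below assumes the v1 REST field names model and useEnhanced and the model identifiers video and phone_call; these should be verified against the current Speech-to-Text documentation, and the helper name build_premium_config is hypothetical.

```python
# Assumed identifiers for the premium models discussed in the article.
PREMIUM_MODELS = {"video", "phone_call"}


def build_premium_config(model, language_code="en-US"):
    """Return a recognition config that selects a premium model (hypothetical helper)."""
    if model not in PREMIUM_MODELS:
        raise ValueError(f"unknown premium model: {model!r}")
    return {
        "languageCode": language_code,
        "model": model,
        # Request the enhanced variant of the selected model.
        "useEnhanced": True,
    }


video_config = build_premium_config("video")
phone_config = build_premium_config("phone_call")
```

The resulting dictionary would go in the config field of a speech:recognize request body, next to the audio field.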
Google has also cut prices for the premium phone and video models. Customers can use the upgraded models without opting in to data logging, but those who do opt in pay less.
These updates should help developers build intelligent voice applications that reach a wider audience and deliver greater efficiency and functionality.