Mozilla announced today, June 7, that its Common Voice project has gone multilingual and is now available in German, Welsh and French. More than 40 additional languages are in the works.
Common Voice, under the Deep Speech project, was launched in July 2017. Since then, Mozilla has collected about a million English voice samples through its iOS app and website. The first version of the crowdsourced dataset was released last November. Downloaded thousands of times, it has been used in commercial voice products, the open-source speech recognition toolkit Kaldi, and Deep Speech, Mozilla’s speech recognition engine.
The company aims to make Common Voice available to everyone, which means covering many languages. To that end, it has spent the last few months growing and supporting individual language communities around the world.
These communities contribute to Mozilla’s initiative by providing copyright-free sentences for Common Voice, promoting the site in their countries, and building communities of contributors. The goal is to increase the total number of hours of data available in each language.
Communities in many countries are already engaged in these activities, including Germany, Kenya, Indonesia, Spain, Brazil, Taiwan, Hungary, Macedonia, Slovenia, Serbia, Thailand and Nepal.
Alongside English, Mozilla has started collecting voice samples in the three new languages. The more than 40 other languages on the way range from widely spoken ones such as Spanish, Russian and Chinese to less widely spoken ones such as Norwegian, Frisian and Chuvash. Because smaller languages are under-served by today’s commercial speech recognition services, Mozilla is covering them through the project.
According to the company, making such a large dataset available can enable entrepreneurs and communities to address existing language discrimination on their own.
Mozilla invites people around the world to donate their voices on the website or through its iOS app. In addition, its page at https://voice.mozilla.org/languages helps anyone interested in bringing Common Voice and speech technology to their language.
Source: The Mozilla Blog