- Jul 24, 2021
Today, Google researchers open-sourced a speech recognition dataset to support DIY AI projects. The release gives AI-enthusiast makers more tools for building basic voice commands for a range of smart devices.
The Speech Commands dataset is a collection of 65,000 utterances of 30 words, intended for training AI models. It was created by Google's AIY and TensorFlow teams.
Google launched AIY Projects in May 2017 to support DIY makers who are interested in AI. The initiative began with a line of reference designs, the first being a speech-recognition smart speaker housed in a cardboard box.
In a related blog post, Pete Warden, a software engineer at Google Brain, revealed that the team also open-sourced the infrastructure used to create the data. The intent is for communities to use that infrastructure to build their own versions of the dataset, covering underserved applications and languages.
According to Warden, more accents and variations are continually being shared with the project. This broadens the dataset for DIY AI, so that it is not limited to the thousands of contributions gathered so far.
Unlike other datasets, Speech Commands lets you add your own voice. Visit the AIY Projects website and go to the speech section, where you will be invited to make short recordings of 135 simple words, along with a series of numbers and names.
DIY AI Exclusions in the dataset
The project does not yet represent all communities in the voice samples it has gathered, so some models may not understand every user's voice. Likewise, the local dialects and slang of some groups remain excluded, which affects those users when they give voice commands to a device.
On that point, Stanford AI researchers reported an interesting finding. Equilid, a language identifier for NLP (natural language processing), is trained on Urban Dictionary and Twitter. The researchers observed that Equilid is more accurate than identifiers trained on text that excludes users based on race, age, and way of talking.
Equilid was even found to be more precise than Google's CLD2. Further academic tests of speech recognition tools concluded that the most widely used NLP tools still struggle to understand African American users.