Mozilla has released a large set of voice data as part of its Common Voice program. The open-source collection of transcribed recordings and metadata is available to designers of voice apps and voice-enabled devices.
The collection has grown significantly over the last year and a half. Common Voice has gone from 1,400 hours in 18 languages in February 2019 to 7,226 hours in 54 languages, a total of 5.5 million clips. Along with the voice recordings, users can access information about each speaker’s gender, accent, and age to use in their designs. The data is intended for use with Mozilla’s DeepSpeech toolkit of voice and text models. DeepSpeech was itself updated recently to improve recognition speed and support for Google’s TensorFlow Lite framework. The new release also includes Mozilla’s first target segment, a set of voice clips recorded for specific use cases. Words such as “yes,” “no,” “hey,” and “Firefox,” along with every digit from zero to nine, have been recorded by 11,000 people in 18 languages, for 120 hours of clips.
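For designers curious how the speaker metadata might be put to use, here is a minimal sketch in Python that filters and tallies clips by demographic fields. It assumes the metadata arrives as a tab-separated file. The sample rows, the column names (`path`, `sentence`, `age`, `gender`, `accent`), and the helper functions are all hypothetical stand-ins for illustration, not a confirmed description of the Common Voice release.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: the column names and values are assumptions
# made for illustration, not a confirmed Common Voice schema.
SAMPLE_TSV = (
    "path\tsentence\tage\tgender\taccent\n"
    "clip_0001.mp3\tyes\ttwenties\tfemale\tus\n"
    "clip_0002.mp3\tno\tthirties\tmale\tengland\n"
    "clip_0003.mp3\they\t\t\t\n"
    "clip_0004.mp3\tfirefox\ttwenties\tmale\tus\n"
)


def load_clips(tsv_text):
    """Parse tab-separated clip metadata into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def filter_clips(clips, **criteria):
    """Keep clips whose metadata matches every given field exactly."""
    return [
        clip for clip in clips
        if all(clip.get(field) == value for field, value in criteria.items())
    ]


def count_by(clips, field):
    """Count clips per value of a field, grouping blanks as 'unspecified'."""
    return Counter(clip.get(field) or "unspecified" for clip in clips)


clips = load_clips(SAMPLE_TSV)
us_clips = filter_clips(clips, accent="us")
gender_counts = count_by(clips, "gender")

print([clip["path"] for clip in us_clips])
print(dict(gender_counts))
```

Blank fields are grouped under "unspecified" rather than dropped, on the assumption that contributors may leave demographic details empty. A tally like this is one way to check whether a training subset is skewed toward a particular group of speakers.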
“With contributions from all over the globe, you are helping us follow through on our goal to create a voice dataset that is publicly available to anyone and represents the world we live in,” Common Voice product and design lead Megan Branson wrote in the announcement. “This segment data will help Mozilla benchmark the accuracy of our open source voice recognition engine, Deep Speech, in multiple languages for a similar task and will enable more detailed feedback on how to continue improving the dataset.”
The words chosen for the targeted dataset will likely be used in Firefox Voice, the browser extension currently in beta that offers voice controls for the web browser. The voice assistant can handle only very basic questions and commands for now, but more data will help it reach full functionality. It is also restricted to the desktop version of the browser and works only in English. Firefox Voice currently uses Google Cloud Speech Service, but Mozilla may want to switch to an in-house service instead.
Mozilla is not alone in wanting to build a voice AI ecosystem, and this is another area of voice technology where companies will compete. For instance, Google offered voice search in its Chrome browser but is already replacing it with Google Assistant. Mozilla’s voice assistant is a defense against losing users to Google as well as a benefit to current users. Both Mozilla and Google will have to contend with voice assistants built into websites and mobile apps, such as the WordPress plugin designed by speak2web that lets people search and shop by voice within a site. Still, the rapid addition of new voice data, with more languages and types of speakers, could give Mozilla an edge as the competition grows.