Voice clips are audio recordings of what you say when you use your voice to interact with Microsoft products and services. When you do, Microsoft speech recognition technology automatically converts the audio of what you say into words so that Microsoft services can respond. For example, when you use Cortana to set a reminder, speech recognition converts your spoken command into text so that Cortana can set it for you. Microsoft's new opt-in voice settings allow you to give Microsoft permission to sample and listen to these voice clips to improve its speech recognition technology.
Microsoft uses voice clips to help train its speech recognition technology to be more accurate for you and everyone who speaks your language. For example, your everyday use of voice-enabled products helps the speech recognition models learn to recognize complex and nuanced aspects of how people talk, such as accents, regional dialects, and how sentences are structured in different languages. Sampling voice clips also helps Microsoft make its technology better at understanding speech in different acoustic settings, such as when there’s a lot of ambient noise versus when things are quiet. These improvements allow Microsoft to build better voice-enabled capabilities that benefit users across all Microsoft products and services.
Microsoft is rolling out updates to its user consent experience for voice data to give customers more meaningful control over whether their voice data is used to improve products, the company announced on Friday, Jan. 15, 2021. These updates let customers decide whether people can listen to recordings of what they said while speaking to Microsoft products and services that use speech recognition technology.
If customers choose to opt in, people may review these voice clips to improve the performance of Microsoft’s artificial intelligence systems across a diversity of people, speaking styles, accents, dialects and acoustic environments. The goal is to make Microsoft’s speech recognition technologies more inclusive by making them easier and more natural to interact with, the company said.
Customers who do not choose to contribute their voice clips for review by people will still be able to use all of Microsoft’s voice-enabled products and services.
Voice clips are audio recordings of what users said when they used their voice to interact with voice-enabled products and services, such as dictating a translation request or a web search.
Microsoft removes certain personal information from voice clips as they are processed in the cloud, including Microsoft account identifiers and strings of letters or numbers that could be telephone numbers, Social Security numbers and email addresses.
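As a rough illustration of the kind of scrubbing described above, here is a minimal Python sketch that masks strings that look like email addresses, Social Security numbers, or US-style telephone numbers in a piece of transcribed text. The patterns, placeholder labels, and the `redact` function name are all assumptions made for this example; the source does not describe how Microsoft actually detects or removes this information.

```python
import re

# Illustrative patterns only. These are assumptions for the sketch,
# not the rules Microsoft uses.
PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[PHONE]", re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    )),
]


def redact(text: str) -> str:
    """Replace each match of each pattern with its placeholder label."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact("call 425-555-0100 or mail jo@example.com"))
```

Simple patterns like these will miss some formats and can mask strings that are not personal information, so a real system would need considerably more care.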
The new settings for voice clips mean that customers must actively choose to allow people to listen to the recordings of what they said. If they do, Microsoft employees and people contracted to work for Microsoft may listen to these voice clips and manually transcribe what they hear as part of a process the company uses to improve AI systems.
“Their transcription is what we consider our ground truth of what was actually spoken inside that audio clip. We use that as a basis for comparison to identify where our AI needs improvement,” said Neeta Saran, a senior attorney at Microsoft in Redmond, Washington.
The more transcripts Microsoft has of how real people talk from contributed voice clips, the better these AI systems will perform.
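One common way to compare a human transcript with a speech recognizer's output is word error rate (WER): the word-level edit distance between the two, divided by the number of words in the human transcript. The sketch below is a generic Python implementation of that metric. The source does not say which metric Microsoft uses, so treat this only as an example of how a ground-truth comparison can work; the function name is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference  -- the human transcript, treated as ground truth
    hypothesis -- the text produced by the speech recognizer
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("set a reminder for nine",
                          "set the reminder for nine"))
```

A lower value means the recognizer's output is closer to what the reviewer heard. Clips with a high value point to accents, phrasing, or noise conditions where the model needs more training data.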
These new settings for voice clips are designed to give customers meaningful consent for people to listen to what they said while interacting with Microsoft products and services, including increased awareness of who their voice clips are being shared with and how they are being used.
“This new meaningful consent release is about making sure that we’re transparent with users about how we are using this audio data to improve our speech recognition technology,” Saran said.
Because Microsoft removes account identifiers from the voice clips as they are processed, the clips will no longer show up in the privacy dashboard of customers’ Microsoft accounts, the company said.
Microsoft does not use any human reviewers to listen to audio data collected from speech recognition features built into enterprise offerings, the company added.
Data retention and next steps
On Oct. 30, 2020, Microsoft stopped storing voice clips processed by its speech recognition technologies. Over the next few months, the company is rolling out the new settings for voice clips across products including Microsoft Translator, SwiftKey, Windows, Cortana, HoloLens, Mixed Reality and Skype voice translation.
If a customer chooses to let Microsoft employees or contractors listen to their voice recordings to improve AI technology, the company will retain all new audio data contributed for review for up to two years. If a contributed voice clip is sampled for transcription by people, the company may retain it for more than two years to continue training and improving the quality of speech recognition AI.
“The more diverse ground truth data that we are able to collect and use to update our speech models, the better and more inclusive our speech recognition technology is going to be for our users across many languages,” Saran said.
- Microsoft gives users control over their voice clips | Microsoft | The AI Blog
- How does Microsoft protect my privacy while improving its speech recognition technology? | Microsoft Support
This setting is only available starting with Windows 10 build 21292.
EXAMPLE: How does Microsoft protect my privacy while improving its speech recognition technology?
Start or Stop Contributing Voice Clips to Microsoft in Settings
Open Settings, and click/tap on the Privacy icon.
Click/tap on Speech on the left side, and click/tap on Start contributing my voice clips or Stop contributing my voice clips (default) on the right side under Help make online speech recognition better.
Online speech recognition must be turned on for the Help make online speech recognition better setting to be available.
You can now close Settings if you like.
Start or Stop Contributing Voice Clips to Microsoft using a REG file
The downloadable .reg files below will modify the DWORD value in the registry key below.
0 = stop
1 = start
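For readers curious what such a .reg file contains, the Python sketch below builds the text of a file that sets a single DWORD value to 0 or 1. The key path and value name passed in the usage example are placeholders invented for illustration; they are NOT the real key or value used by this setting, which is not given here. The `build_reg_file` function name is also an assumption.

```python
def build_reg_file(key_path: str, value_name: str, enabled: bool) -> str:
    """Return the text of a .reg file that sets one DWORD to 1 or 0."""
    dword = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",  # required header line
        "",
        f"[{key_path}]",
        # DWORD data is written as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Placeholder key and value name, for illustration only.
    print(build_reg_file(
        r"HKEY_CURRENT_USER\Software\ExampleVendor\ExampleKey",
        "ExampleValue",
        enabled=True,
    ))
```

Merging a file like this writes the value into the registry, which is what double-clicking a downloaded .reg file does.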
Do step 2 (start) or step 3 (stop) below for what you would like to do.
Step 2 (start): This step will also turn on online speech recognition.
Step 3 (stop): This is the default setting.
Save the .reg file to your desktop.
Double click/tap on the downloaded .reg file to merge it.
When prompted, click/tap on Run, Yes (UAC), Yes, and OK to approve the merge.
You can now delete the downloaded .reg file if you like.