Microsoft gives users control over their voice clips

Microsoft is rolling out updates to its user consent experience for voice data to give customers more meaningful control over whether their voice data is used to improve products, the company announced Friday. These updates let customers decide if people can listen to recordings of what they said while speaking to Microsoft products and services that use speech recognition technology.

If customers choose to opt in, people may review these voice clips to improve the performance of Microsoft’s artificial intelligence systems across a diversity of people, speaking styles, accents, dialects and acoustic environments. The goal is to make Microsoft’s speech recognition technologies more inclusive by making them easier and more natural to interact with, the company said.

Customers who do not choose to contribute their voice clips for review by people will still be able to use all of Microsoft’s voice-enabled products and services.

Voice clips are audio recordings of what users said when they used their voice to interact with voice-enabled products and services, such as dictating a translation request or a web search.

Microsoft removes certain personal information from voice clips as they are processed in the cloud, including Microsoft account identifiers and strings of letters or numbers that could be telephone numbers, Social Security numbers and email addresses.
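
As a rough illustration of this kind of scrubbing, the sketch below replaces strings that look like email addresses, Social Security numbers or U.S. telephone numbers in a text transcript with placeholder labels. It is not Microsoft's method, which the company has not detailed here and which operates on voice clips during cloud processing. The patterns, the `redact` function and the placeholder labels are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration only; real de-identification
# systems are considerably more thorough than three regular expressions.
# SSN is checked before PHONE so that 3-2-4 digit groups get the SSN label.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    )),
]


def redact(text):
    """Replace each substring matching a pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("call 425-555-0100 or mail jo@example.com, ssn 123-45-6789"))
```

Pattern matching of this sort errs toward over-removal: any string with the right shape is replaced whether or not it is actually a phone number, which fits the article's description of strings that "could be" personal information.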

The new settings for voice clips mean that customers must actively choose to allow people to listen to the recordings of what they said. If they do, Microsoft employees and people contracted to work for Microsoft may listen to these voice clips and manually transcribe what they hear as part of a process the company uses to improve AI systems.

“Their transcription is what we consider our ground truth of what was actually spoken inside that audio clip. We use that as a basis for comparison to identify where our AI needs improvement,” said Neeta Saran, a senior attorney at Microsoft in Redmond, Washington.

The more transcripts of real speech Microsoft obtains from contributed voice clips, the better these AI systems will perform.
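
One common way to compare a human transcript with an automatically generated one is word error rate: the number of word insertions, deletions and substitutions needed to turn the machine output into the human reference, divided by the length of the reference. The article does not say which metric Microsoft uses, so the sketch below is only an example of the general idea, and the function name `word_error_rate` and the sample sentences are invented for it.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference (the human "ground truth")."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                 # word missing from hypothesis
                curr[j - 1] + 1,             # extra word in hypothesis
                prev[j - 1] + substitution,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


human = "turn on the lights"   # what a reviewer transcribed
machine = "turn on the light"  # what the recognizer produced
print(word_error_rate(human, machine))  # 0.25
```

Scores computed this way can be grouped by accent, dialect or acoustic environment to show where a recognizer falls short.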

While Microsoft employees and contractors will only listen to voice clips with user permission, the company may continue to access information associated with user voice activity, such as the transcriptions automatically generated during user interactions with speech recognition AI. The details of how that works are described in the terms of use for individual Microsoft products and services, the company said.

A graphic illustrates how the new settings for voice data will appear to users. Text boxes explain why Microsoft asks users to contribute voice clips, how user identity is protected and the people who use the contributed data.

Microsoft’s new settings for voice data will roll out to voice typing, an updated version of the Windows dictation experience. Graphic courtesy of Microsoft.

Meaningful consent

These new settings for voice clips are designed to ensure that customers give meaningful consent before people listen to what they said while interacting with Microsoft products and services. That includes making customers more aware of who their voice clips are shared with and how the clips are used.

“This new meaningful consent release is about making sure that we’re transparent with users about how we are using this audio data to improve our speech recognition technology,” Saran said.

Because Microsoft removes account identifiers from the voice clips as they are processed, the clips will no longer show up in the privacy dashboard of customers’ Microsoft accounts, the company said.

Microsoft does not use any human reviewers to listen to audio data collected from speech recognition features built into enterprise offerings, the company added.

Data retention and next steps

On Oct. 30, 2020, Microsoft stopped storing voice clips processed by its speech recognition technologies. Over the next few months, the company is rolling out the new settings for voice clips across products including Microsoft Translator, SwiftKey, Windows, Cortana, HoloLens, Mixed Reality and Skype voice translation.

If a customer chooses to let Microsoft employees or contractors listen to their voice recordings to improve AI technology, the company will retain all new audio data contributed for review for up to two years. If a contributed voice clip is sampled for transcription by people, the company may retain it for more than two years to continue training and improving the quality of speech recognition AI.

“The more diverse ground truth data that we are able to collect and use to update our speech models, the better and more inclusive our speech recognition technology is going to be for our users across many languages,” Saran said.
