Matt Day, Giles Turner and Natalia Drozdiak* say a global team of reviewers listens to voice recordings captured by Amazon’s voice-activated assistant.

Tens of millions of people use smart speakers and voice software to play games, find music or trawl for trivia.
Millions more are reluctant to invite the devices and their powerful microphones into their homes out of concern that someone might be listening.
Sometimes, someone is. Amazon.com Inc. employs thousands of people around the world to help improve the Alexa digital assistant powering its line of Echo speakers.
The team listens to voice recordings captured in Echo owners’ homes and offices.
The recordings are transcribed, annotated and then fed back into the software as part of an effort to eliminate gaps in Alexa’s understanding of human speech and help it better respond to commands.
The Alexa voice review process, described by seven people who have worked on the program, highlights the often-overlooked human role in training software algorithms.
In marketing materials Amazon says Alexa “lives in the cloud and is always getting smarter.”
But as with many software tools built to learn from experience, humans are doing some of the teaching.
The team comprises a mix of contractors and full-time Amazon employees, according to the people, who signed nondisclosure agreements barring them from speaking publicly about the program.
They work nine hours a day, with each reviewer parsing as many as 1,000 audio clips per shift, according to two workers based at Amazon’s Bucharest office.
The work is mostly mundane.
One worker in Boston said he mined accumulated voice data for specific utterances such as “Taylor Swift” and annotated them to indicate the searcher meant the musical artist.
Occasionally the listeners pick up things Echo owners likely would rather stay private: a woman singing badly off key in the shower, say, or a child screaming for help.
The teams use internal chat rooms to share files when they need help parsing a muddled word — or come across an amusing recording.
Sometimes they hear recordings they find upsetting, or possibly criminal.
Two of the workers said they picked up what they believe was a sexual assault.
When something like that happens, they may share the experience in the internal chat room as a way of relieving stress.
Amazon says it has procedures in place for workers to follow when they hear something distressing, but two Romania-based employees said that, after requesting guidance for such cases, they were told it wasn’t Amazon’s job to interfere.
“We take the security and privacy of our customers’ personal information seriously,” an Amazon spokesman said in an emailed statement. “We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience.”
“We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system.”
“Employees do not have direct access to information that can identify the person or account as part of this workflow.”
Amazon, in its marketing and privacy policy materials, doesn’t explicitly say humans are listening to recordings of some conversations picked up by Alexa.
“We use your requests to Alexa to train our speech recognition and natural language understanding systems,” the company says.
In Alexa’s privacy settings, the company gives users the option of disabling the use of their voice recordings for the development of new features.
A screenshot reviewed by Bloomberg shows that the recordings sent to the Alexa auditors don’t provide a user’s full name and address but are associated with an account number, as well as the user’s first name and the device’s serial number.
The Intercept reported earlier this year that employees of Amazon-owned Ring manually identify vehicles and people in videos captured by the company’s doorbell cameras in an effort to better train the software to do that work itself.
“You don’t necessarily think of another human listening to what you’re telling your smart speaker in the intimacy of your home,” said Florian Schaub, a professor at the University of Michigan who has researched privacy issues related to smart speakers.
“We’ve been conditioned to the [assumption] that these machines are just doing magic machine learning.”
“But the fact is there is still manual processing involved.”
When the Echo debuted in 2014, Amazon’s cylindrical smart speaker quickly popularised the use of voice software in the home.
Before long, Alphabet Inc. launched its own version, called Google Home, followed by Apple Inc.’s HomePod.
Globally, consumers bought 78 million smart speakers last year, according to researcher Canalys.
Millions more use voice software to interact with digital assistants on their smartphones.
Most modern speech-recognition systems rely on neural networks patterned on the human brain.
The software learns as it goes, by spotting patterns amid vast amounts of data.
The algorithms powering the Echo and other smart speakers use models of probability to make educated guesses.
But sometimes Alexa gets it wrong — especially when grappling with new slang, regional colloquialisms or languages other than English.
That’s why Amazon recruited human helpers to fill in the gaps missed by the algorithms.
Apple’s Siri also has human helpers, who work to gauge whether the digital assistant’s interpretation of requests lines up with what the person said.
The recordings they review lack personally identifiable information and are stored for six months tied to a random identifier, according to an Apple security white paper.
At Google, some reviewers can access some audio snippets from its Assistant to help train and improve the product, but the snippets are not associated with any personally identifiable information and the audio is distorted, the company says.
Some Alexa reviewers note everything the speaker picks up, including background conversations — even when children are speaking.
Sometimes they hear users discussing private details such as names or bank details; in such cases, they’re supposed to tick a dialogue box denoting “critical data.”
They then move on to the next audio file.
According to Amazon’s website, no audio is stored unless Echo detects the wake word or is activated by pressing a button.
But sometimes Alexa appears to begin recording without any prompt at all, and the audio files start with a blaring television or unintelligible noise.
Either way, the reviewers are required to transcribe it.
One of the people said the auditors each transcribe as many as 100 recordings a day in which Alexa received no wake command or was triggered by accident.
In homes around the world, Echo owners frequently speculate about who might be listening, according to two of the reviewers.
“Do you work for the National Security Agency?” they ask.
“Alexa, is someone else listening to us?”
— With assistance by Gerrit De Vynck, Mark Gurman, and Irina Vilcu
* Matt Day is a reporter for Bloomberg Technology who tweets at @mattmday.
Giles Turner is European Editor for Bloomberg Technology and he tweets at @turnergs.
Natalia Drozdiak is Bloomberg’s European technology reporter. She tweets at @nat_droz.
This article first appeared at