Look who’s talking: Google sets sights on voices in the crowd

23 April 2018

Jeff Dunn* says Google researchers have developed a system to help computers identify and isolate individual voices within a noisy environment.

Google researchers have developed a deep-learning system designed to help computers better identify and isolate individual voices within a noisy environment.

As noted in a post on the company’s Google Research Blog last week, a team within the tech giant attempted to replicate the “cocktail party effect”, or the human brain’s ability to focus on one source of audio while filtering out others — just as you would while talking to a friend at a party.

Google’s method uses an audiovisual model, so it is primarily focused on isolating voices in videos.

The company posted a number of YouTube videos showing the tech in action.

The company says this tech works on videos with a single audio track and can isolate voices in a video algorithmically, depending on who’s talking, or by having a user manually select the face of the person whose voice they want to hear.

Google says the visual component here is key, as the tech watches for when a person’s mouth is moving to better identify which voices to focus on at a given point and to create more accurate individual speech tracks for the length of a video.

According to the blog post, the researchers developed this model by gathering 100,000 videos of “lectures and talks” on YouTube, extracting nearly 2,000 hours’ worth of segments featuring unobstructed speech from those videos, then mixing that audio to create a “synthetic cocktail party” with artificial background noise added.
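To make the “synthetic cocktail party” idea concrete, here is a minimal sketch of the mixing step in Python. It is not Google’s pipeline: the function name `mix_tracks`, the `noise_gain` value, the 8 kHz sample rate and the sine-wave stand-ins for clean speech are all assumptions made for illustration. The point is only that the mixture is built from known clean sources, so the original tracks remain available as training targets.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate, chosen only to keep the demo small


def mix_tracks(speech_a, speech_b, noise, noise_gain=0.3):
    """Sum two clean clips and scaled background noise into one track.

    Hypothetical helper: the clips are trimmed to a common length, and the
    result is rescaled only if it would otherwise exceed the [-1, 1] range.
    """
    n = min(len(speech_a), len(speech_b), len(noise))
    mixture = [speech_a[i] + speech_b[i] + noise_gain * noise[i] for i in range(n)]
    peak = max((abs(x) for x in mixture), default=0.0)
    if peak > 1.0:
        mixture = [x / peak for x in mixture]
    return mixture


def tone(freq_hz, seconds=1.0, amplitude=0.5):
    """Generate a sine tone as a stand-in for a clean speech clip."""
    count = int(SAMPLE_RATE * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(count)
    ]


rng = random.Random(0)  # fixed seed so the demo is repeatable
speech_a = tone(220.0)  # stand-in for speaker A
speech_b = tone(330.0)  # stand-in for speaker B
noise = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

mixture = mix_tracks(speech_a, speech_b, noise)
```

In a real setup the sine tones would be replaced by actual speech segments, and `speech_a` and `speech_b` would be kept alongside `mixture` as the answers the model is asked to recover.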

Google then trained the tech to split that mixed audio by reading the “face thumbnails” of people speaking in each video frame and a spectrogram of that video’s soundtrack.
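For readers unfamiliar with the term, a spectrogram is a grid showing how much energy a sound has at each frequency over time. The sketch below computes one with a plain discrete Fourier transform using only Python’s standard library. It covers the audio input only, not the face thumbnails, and the function name `magnitude_spectrogram`, the frame size, the hop length and the Hann window are illustrative assumptions rather than details from Google’s post.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of frequency-bin magnitudes.

    Hypothetical helper: slices the signal into overlapping frames, applies a
    Hann window, and takes a naive DFT of each frame (slow but dependency-free).
    """
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        # Only the first half of the bins is needed for a real-valued signal.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


SAMPLE_RATE = 8000  # assumed sample rate for the demo
signal = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(512)]

spec = magnitude_spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin spacing is SAMPLE_RATE / frame_size
```

A production system would use a fast Fourier transform from a numerical library instead of this naive loop; the structure of the output, time frames by frequency bins, is the same.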

The system is able to sort out which audio source belongs to which face at a given time and create separate speech tracks for each speaker.

Whew.

Google singled out closed-captioning systems as one area where this system could be a boon, but the company says it envisions “a wide range of applications for this technology” and that it is “currently exploring opportunities for incorporating it into various Google products.”

Hangouts and YouTube seem like two easy places to start.

It’s also not hard to see how the tech could work in a pair of smart glasses, à la Google Glass, or in voice-amplifying earbuds.

Aiding smart speakers like the Google Home in their ability to recognise individual voices seems like another use case, but because this model is focused on video, it would likely work better with a speaker with a display, like Amazon’s Echo Show.

Earlier this year, Google opened up the Google Assistant to “smart display” devices like the Echo Show, but the company hasn’t released one itself.

In any case, the privacy ramifications of this kind of tech seem just as obvious as the potential use cases.

Google’s voice isolation is far from bulletproof in the company’s demo videos, but with some more fine-tuning, it could make for a powerful eavesdropping and surveillance tool in the wrong hands.

That’s a lot of speculation for now, though.

Here’s hoping this research at least lessens the need to shout at Google Home in the future.

* Jeff Dunn is a tech reporter for Ars Technica in New York City. He tweets at @deffjunn.

This article first appeared at arstechnica.com.
