Ars Technica detailed a recent post on the Google Research Blog revealing a major advancement in voice recognition software.
This new technology is actually an AI algorithm that has been trained to home in on voices much as the human ear can. Unlike other voice recognition software, this AI is not built to transcribe speech or translate it into another language. It’s built so that you can pick out a single person’s speech clearly in a crowded space.
By studying thousands of hours of video, the AI has learned to use a combination of speech pattern recognition and visual speech recognition. That means the AI actually watches which person in the video is talking, then assigns the correct sound signatures to that person so you can switch between speakers, lowering one person’s audio and making the other’s speech significantly clearer.
Because it relies on seeing the speaker, this technology won’t easily allow you to pull audio out of a crowded room without line of sight, but one day we could see something like it used to accurately detect speech that would otherwise have been unintelligible. Regardless of silly, nefarious, or theoretical guesses as to its application, I have to say this: it’s pretty darn cool.