The Quest for the Perfect Lip-Synching Robot

Their mouths may move convincingly, but they’re far from realistic – yet
If we are ever going to communicate effectively with robots, they need to improve their lip syncing.
Complex mouth movements are essential for human connection, especially in noisy environments, where we look at the speaker’s mouth up to half the time.
That means they’re a key feature for robots we can chat with comfortably, but researchers have long struggled to create robots with lips that can skillfully sync to audio. Robots have mechanical constraints that limit the range of motion and speed of lip movements, for example, and they tend to lag behind commands.
To overcome this obstacle, researchers at Columbia University in New York leveraged artificial intelligence models inspired by the human brain, known as neural networks, allowing a humanoid robot to perform fluid mouth movements that synchronize with spoken words.
“The ability to form complex lip shapes… improves overall speech synchronization with more detail, providing more realistic interactions that mitigate some of the risks of the uncanny valley effect,” the researchers write in a new Science Robotics paper.
The team designed a human-like robot face with soft silicone “skin.” It has magnetic connectors that allow 10 degrees of freedom, making all kinds of lip movements possible.
To train the models powering this robot, the team provided them with recordings of their robot performing various lip movements, such as those associated with rounded vowels. Then, they incorporated AI-generated videos of “ideal” lip movements for certain sentences into their models.
The system allows a robot’s lips to form the shapes associated with 24 consonants and 16 vowels, the researchers reported in the paper.
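To make the idea concrete, here is a minimal sketch of how a phoneme-to-lip-shape ("viseme") mapping for a 10-degree-of-freedom face might look. The phoneme labels and pose values below are invented for illustration; the study's system learned its lip shapes with neural networks trained on video, not from a hand-built lookup table like this one.

```python
# Illustrative sketch only: mapping phonemes to target lip poses for a
# robot face with 10 degrees of freedom. All labels and numbers here are
# hypothetical; the actual system learns these mappings from video data.

NEUTRAL = [0.0] * 10  # all 10 actuators at rest

# Each viseme is a target pose: one value per degree of freedom.
VISEMES = {
    "AA": [0.8, 0.2, 0.0, 0.0, 0.5, 0.5, 0.1, 0.1, 0.0, 0.0],  # open vowel
    "UW": [0.3, 0.9, 0.9, 0.0, 0.2, 0.2, 0.0, 0.0, 0.4, 0.4],  # rounded vowel
    "M":  [0.0, 0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # closed lips
}

def lip_targets(phonemes):
    """Map a phoneme sequence to a sequence of 10-DOF lip poses,
    falling back to the neutral pose for unknown phonemes."""
    return [VISEMES.get(p, NEUTRAL) for p in phonemes]

poses = lip_targets(["M", "AA", "UW"])
print(len(poses), len(poses[0]))  # 3 poses, 10 values each
```

In the real system, a controller would also have to interpolate smoothly between successive poses and compensate for actuator lag, which is exactly where the lag problem described above arises.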
Using these “ideal” AI videos as a baseline, they compared their new system to existing techniques for shaping robots’ lip movements. Among all the methods, theirs lagged least behind the mouth movements in the AI videos. The robot could also convincingly mouth speech in 10 languages with varying phonetic structures, including Korean, French, and Arabic, and it even did a bit of karaoke.
There’s still plenty of room for improvement, the researchers acknowledged, including incorporating more training data and adding more physical degrees of freedom. In the future, they believe their tool could be used in education and in the care of older adults suffering from cognitive decline, as it could help us connect with robots “on a human level.”
But they also warn that the increased emotional connection with robots could “be exploited to gain the trust of unsuspecting users, particularly children and the elderly,” and that designers should implement protective measures against these risks.
“The ability to create physical machines capable of connecting with humans on an emotional level is maturing rapidly,” the authors write. “The robots presented here are still far from natural, but one step closer to crossing the uncanny valley.”
Main image: Yuhang Hu



