In the many conversations about “the metaverse” sparked by Facebook’s corporate name change to Meta, most of the attention has gone to the visual elements. What’s hardly mentioned is audio. Yet voice really matters when making a virtual environment come to life.
Sometimes, it’s everything.
Just ask Spike Jonze. The movie director discarded the original voice actor in the title role of his 2013 film “Her” and substituted Scarlett Johansson’s sultry timbres. Although Samantha, a computer operating system, never appeared in the flesh, Jonze felt the original actress hadn’t nailed the emotion required to create a three-dimensional persona.
Voice was critical to creating a fleshed-out character that could absorb the viewer into the story’s premise and make it fully believable.
As The Washington Post noted, many keystones of the Meta vision of the metaverse already exist in video gaming — only in disconnected gaming worlds. And in the gaming universe, voice is playing an increasingly important role. Meta promises a unified, interoperable experience, but without a rich panoply of highly textured, lifelike digital voices, the metaverse will be incomplete rather than inclusive and immersive.
The McGurk Effect research of the mid-1970s observed the cognitive dissonance resulting from mismatched audio and visual perceptions; voices that don’t fully mesh with an avatar can rip the participant from the virtual environment.
Expressing the real you
Humans are social beings, and the metaverse as currently promoted is a social environment where participants create unique personas in both home and workplace settings. Avatars will allow players to express themselves the way they want to be seen — as human, alien, animal, vegetable, cartoon or myriad other options. Players can try on new “looks” temporarily, the way they try new outfits. Gender and species are fluid.
Changing identity is hamstrung, however, if people are not able to change how they sound along with their visual presence. Having your voice match the persona presented to other people is a core element of a personalized player identity. It is a situation many people are already accustomed to from video games.
If you encounter a gritty, bearded, hulking knight in a game you are playing, you would expect that character to have a deep, gruff voice, accompanied by the clank of armor. Game companies make sure to deliver this, with non-player characters (NPCs) being carefully crafted by voice actors and audio specialists to provide an immersive experience.
Yet in online gaming environments or in the future metaverse, where the knight is the representation of an actual person, you will have a vastly different experience. You may be startled to hear a high-pitched teenager with bad microphone quality instead of the anticipated gravelly, mature voice. The drastic incongruence between sound and vision shreds the immersive quality of the experience. Metaverse avatars can only be fully immersive if they allow people to create full digital experiences.
In addition to enabling immersion, sonic identity technology can allow players to slip into “true” pseudonymity. They can fully become the person (or being) they want others to see, which for many people is powerful protection from a sometimes hostile online environment. It can disguise a geographical accent so the participant can integrate more smoothly into a player community (a capability an offshore customer support call center might also benefit from). For people with vocal tics, it can cloak a physical impairment they’d rather not reveal.
Voice changing technology can also help to mitigate online discrimination and harassment. A research study published in the International Journal of Mental Health and Addiction in 2019 notes that female gamers frequently avoid verbal communication with other players to reduce unpleasant interactions. Voice changing technology can allow them to participate in fully pseudonymous conversations, free of a specified gender, in which they might feel more comfortable expressing themselves.
Regardless of the “why,” researchers in the scholarly journal Human-Computer Interaction concluded in 2014 that “voice radically transforms the experience of online gaming, making virtual spaces more intensely social.”
My own company’s internal data makes it clear that players who communicate through voice alter egos feel more engrossed in the game, engage with it for longer periods of time and spend more money within the game as a result.
What’s missing in the metaverse
A truly complete immersive experience requires a combination of 3D visuals and real-time audio to enable people to express themselves in the way they want to be heard. Participants want a sonic representation of themselves that is just as original and unique as their visual avatar — and they want the tools to customize their voice as meticulously as their appearance. There must be a harmonious marriage of both augmented audio and 3D video to keep the player immersed and engaged.
Real-time audio defines how people can bring the ultimate individuality to their content, making audio the great equalizer of the metaverse. Unfortunately, the current voice experience struggles to offer the immersive qualities needed to live up to the promise of an all-encompassing metaverse.
Real-time audio personas are still restrictive at best, despite experimentation by dogged early adopters. The tools to shape a person’s voice to match their digital self are limited, and sound quality does not yet match visual quality.
Yet recent advancements in available audio technology are making it far easier for players to create a unique sonic identity. New solutions available to platform and game developers enable writers, producers and audio engineers to incorporate voice modification technology within their games to produce natural-sounding and fantasy voices on demand, in real time.
This technology offers the potential to generate new avenues for monetization by delivering an inclusive and immersive auditory experience that draws players in and keeps them focused and engaged rather than dropping away.
Companies are investing in powerful tools that enable people to shape the visual representation of themselves in a digital space. They must not overlook the customized sonic identity for a matching social audio experience that makes the digital representation seamless.
The metaverse won’t be complete without it.