Voice and Gender in Video Games: What can we do with spectrograms

I'd like to talk a bit about one of my latest research projects, concerning sound, voice and gender in video games. I've been involved in sound research for over 13 years, coming at it mostly from acoustic ecology and media studies, where we look at a soundscape as a system of elements. The political and socio-cultural critical aspect has traditionally been missing from that type of analysis, so looking at sound in video games in terms of gender united two of my passions nicely. As a system of elements, game soundscapes have fascinating roots in electronic music. Later, when the technological constraints of chip space were solved, sound effects developed along cinematic models of montage and synchresis, combined with the functional elements of early computerized sound notification systems: that is, they incorporate confirmatory, action-based, and affect-driven sound layers, combining foley sounds, a variety of signals, and music.

Avatar voices specifically have had an interesting progression, from text-to-speech adaptations and synthesized vocalizations (battle cries) to, more recently, professional voice acting and overdubbing, with cinematic-quality cutscenes. With the rising fidelity and “resolution” of game narrative spaces, voice has inherited many of the classic gendered tropes of broadcasting and cinema: this is where my work began. In an attempt to compare the historical development of several female character typologies in games, my team and I looked at classic fighting RPGs (role-playing games) and adventure RPGs, and we counted the ratio of female-to-male battle cries, specifically in combat sequences.

What that content analysis revealed was an increase in female vocalizations over time, but we noticed a marked shift in quality as well: a move from synthetic grunts and shouts to more hyper-real, dramatized and feminized vocalizations. This is where spectrographic analysis, as a subset of digital audio tools, offered a rich perspective from which to compare and qualify female voices in games. Using a combination of SpectraFoo (for real-time analysis), Sonic Visualiser, and Adobe Audition, I did a kind of ‘close reading’ of several game instances. As you can see in the annotated screenshots, the female vocalizations are longer and literally take up more sonic space, even if they are equal in frequency to the male battle cries. More importantly, they are much more intoned and inflected, with a dynamic envelope and pitch profile, and often, particularly at the end of fighting sequences, feature added reverberation with a very long tail.

In combination with reflective accounts of player experience, digital audio tools and visualizations allow us to access gendered sonic tropes in novel ways. Looking at the spectral gestures and textures of character voices in the game soundscape raises questions about how we think male and female (as well as agender, atypical or non-human) characters ought to sound, and why. In that sense, such tools allow us to pinpoint persistent stereotypes encoded in the very design of games and, by extension, other popular media texts. I’m excited to see where research and analysis will be able to go with the help of emergent multimodal toolsets and novel digital methodologies.
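
For readers who want to try this kind of comparison without the tools named above, here is a minimal Python sketch of the idea. It is not the workflow I used in SpectraFoo, Sonic Visualiser or Audition, and everything in it is an assumption made for illustration: the two "battle cries" are synthetic sine tones rather than game recordings, and the names (`synth_cry`, `spectrogram`, `analyse`), the 4 kHz sample rate, the frame sizes and the 10% energy threshold are all hypothetical choices. It builds a magnitude spectrogram with a plain short-time Fourier transform, then reads off two of the features discussed above: how long the vocalization lasts, and how much its pitch moves.

```python
import cmath
import math

SAMPLE_RATE = 4000  # Hz; assumed, kept low so the naive DFT stays fast
FRAME = 128         # samples per analysis frame (assumed)
HOP = 64            # samples between frame starts (assumed)


def synth_cry(duration_s, start_hz, end_hz, pad_s=0.1):
    """Synthetic stand-in for a vocalization: a sine tone gliding in pitch,
    padded with silence on both sides."""
    n = int(duration_s * SAMPLE_RATE)
    pad = [0.0] * int(pad_s * SAMPLE_RATE)
    tone, phase = [], 0.0
    for i in range(n):
        freq = start_hz + (end_hz - start_hz) * i / max(n - 1, 1)
        phase += 2 * math.pi * freq / SAMPLE_RATE
        tone.append(math.sin(phase))
    return pad + tone + pad


def spectrogram(samples, frame_size=FRAME, hop=HOP):
    """Magnitude spectrogram: Hann-windowed frames, naive DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (frame_size - 1))
              for k in range(frame_size)]
    bins = frame_size // 2 + 1
    # Precompute the DFT basis once so each frame is a set of dot products.
    twiddle = [[cmath.exp(-2j * math.pi * b * k / frame_size)
                for k in range(frame_size)] for b in range(bins)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + k] * window[k] for k in range(frame_size)]
        frames.append([abs(sum(c * t for c, t in zip(chunk, row)))
                       for row in twiddle])
    return frames


def analyse(samples):
    """Return the active duration and pitch range of one vocalization."""
    frames = spectrogram(samples)
    energy = [sum(m * m for m in mags) for mags in frames]
    floor = 0.1 * max(energy)  # frames below 10% of peak energy count as silence
    active = [mags for mags, e in zip(frames, energy) if e > floor]
    bin_hz = SAMPLE_RATE / FRAME
    # Crude pitch profile: the strongest frequency bin in each active frame.
    pitches = [mags.index(max(mags)) * bin_hz for mags in active]
    return {
        "duration_s": len(active) * HOP / SAMPLE_RATE,
        "pitch_range_hz": max(pitches) - min(pitches),
    }


# A short, flat cry versus a longer, inflected one centred on the same pitch.
flat = analyse(synth_cry(0.3, 500, 500))
inflected = analyse(synth_cry(0.6, 400, 600))
print("flat cry:     ", flat)
print("inflected cry:", inflected)
```

The two tones share the same average pitch, so a simple frequency count would treat them as equivalent; the duration and pitch-range figures are what separate them. With real recordings you would load the audio samples in place of `synth_cry`, and a peak-bin pitch estimate this crude would need replacing with a proper pitch tracker.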

Image credits from top to bottom: Street Fighter II (Honda vs Chun Li); Soul Calibur V (featuring Ivy); Tomb Raider 2013 (Lara Croft, opening scene).