AI giving voice to the voiceless, but only to those who can access it

Voice banking is a new technology that safeguards the distinctive vocal signatures of people who have lost the power of speech. Though this is, of course, a positive development, Soumi Banerjee and Dipjyoti Paul show how such advancements might end up widening existing cleavages between the affluent and economically disadvantaged, prioritising certain voices over others.

Speech and speechlessness

Silence, far from being neutral, carries profound meaning. If one is willing to listen, silence often reveals volumes. The ‘silenced’ are a socially constructed class, and scholars affirm this construction every time we claim to give voice to the ‘voiceless’. Postcolonial social science research, and subaltern studies in particular, aims to shift focus from traditional historical narratives dominated by the powerful elite to the experiences and narratives of those at the margins of society.

Subaltern studies give voice to oppressed and historically marginalised communities or groups. The people in these groups have been systematically silenced and excluded from mainstream discourse. Colonists portrayed subaltern people as primitive and underdeveloped, legitimising their processes of dispossession. Subaltern studies challenge this portrayal, emphasising lived experiences to reveal and amplify marginalised voices. Thus, the field expresses the unique histories, struggles, and resistance of colonised people.

Scholars in this field endow the subaltern with a new voice by pushing against historical silencing. Silence, in this context, becomes a subaltern strategy of resistance. It draws attention to subaltern people's silenced history, which is deeply embedded in power struggles.

AI voice banking invites scholars and technological innovators to capture, preserve, and uplift diverse voices

Voice banking is a groundbreaking technology that captures and preserves an individual's vocal nuances. It offers a secure, versatile solution for future applications and is especially beneficial for people facing voice loss. Amid ongoing discourse on speech versus speechlessness, the transformative potential of generative AI in voice banking aligns with the fundamental goals of subaltern studies. Both endeavours aim to be a transformative force that uplifts and restores agency to those who are voiceless. This intersection offers potential for collaboration between academic and technological spheres to capture, preserve, and encourage diverse voices.

Voice banking: transforming communication

Voice banking allows individuals to safeguard their distinctive vocal signatures for future use. This development holds significant potential, especially for those with conditions such as amyotrophic lateral sclerosis (ALS), Parkinson’s, and other degenerative diseases that impede verbal communication.

When people lose the power of speech, they are not just stripped of a functional means of communication. Simultaneously, they forfeit a conduit through which individual and collective identity finds expression – their distinctive voice. Subsequently, many resort to an 'augmentative and alternative communication' (AAC) device. While some AAC devices offer synthesised voices, these are frequently impersonal – and many people are reluctant to accept them.

When a person loses speech, they also surrender a key part of their personal identity: their distinctive voice

Voice banking strives to counteract this problem. By recording a person’s original voice before it is lost, voice banking generates an approximation of their speech timbre and patterns. This contributes significantly to preserving a person's identity in the face of speech loss. Effective communication fosters a sense of independence and autonomy.

Until recently, creating a personalised vocal signature required responding meticulously to randomised textual prompts, yielding 5–15 minutes of audio samples. Machine learning models harnessed these recordings to build a sophisticated speech synthesiser algorithm. Happily, recent advancements have created algorithms capable of achieving the same outcome with only a few seconds of audio recording.

The development offers hope and empowerment for people who would otherwise face substantial barriers to communication. This intersection of technology and empathy is transforming lives, and is a testament to technology's positive influence in fostering inclusivity and preserving individual identity.

A complex synergy?

However, in acknowledging the AI endeavour to ‘give voice’, we must also ask: ‘whose voice is being heard?’ This transition from a conventional, human-centric approach to an AI-led initiative means humankind must re-evaluate relationships between human and non-human entities.

The democratisation of ‘voice’ through AI, though a pioneering endeavour, raises significant concerns about accessibility and equitable participation. Subaltern studies and machine learning technology share a common objective to ‘give voice to the voiceless’. Their respective capacities to amplify unheard voices, however, differ.

Traditionally, we ‘give voice’ in a nuanced and empathetic way, strengthening the fabric of society by bridging the gap between the powerful and the powerless. AI-led voice banking, though intended to ‘give voice’, is available only to those who can afford it. It may therefore inadvertently exacerbate disparities, widening the existing cleavage between the affluent and the economically disadvantaged.

An ethical framework for AI-driven voice banking should guard against widening the existing cleavage between affluent and poor

Clearly, there is a pressing need to establish an ethical framework for AI-driven voice banking. This framework should guarantee responsible use of AI and should tackle potential security threats, such as the escalating presence of deepfakes.

The manifestation of disparities and the way forward

Subaltern studies argue that power structures systematically produce a ‘voiceless’ class, perpetuating the disenfranchisement of individuals consistently deemed ‘less’ than their socio-economically privileged counterparts. AI-led voice banking is a tangible manifestation of this dynamic, representing the material existence of these power-induced disparities.

Voice banking exemplifies the intersection between technology and societal power dynamics. While it provides a means for some to regain control over their voice, the technology is not accessible to all. This raises two key questions: ‘who is more worthy of a voice?’ and ‘whose voice should remain unheard?’

As we ponder this conundrum, it is critical that we examine the impact of AI-driven voices on inclusivity and authenticity.

This article presents the views of the author(s) and not necessarily those of the ECPR or the Editors of The Loop.

Contributing Authors

Soumi Banerjee, PhD Candidate, School of Social Work, Lund University
Dipjyoti Paul, PhD Candidate, Department of Computer Science, University of Crete

THE EUROPEAN CONSORTIUM FOR POLITICAL RESEARCH
Advancing Political Science
© 2024 European Consortium for Political Research. The ECPR is a charitable incorporated organisation (CIO) number 1167403 ECPR, Harbour House, 6-8 Hythe Quay, Colchester, CO2 8JF, United Kingdom.