The technologies of generative artificial intelligence and large language models, and their associated algorithms, are increasingly shaping our everyday lives. Avery Reyna argues that the constant need to develop groundbreaking technologies will turn these seemingly abstract algorithmic harms into real-life consequences felt by everyone across the globe
Last November, OpenAI released ChatGPT, a question-answering chatbot powered by a large language model. This has prompted intense speculation about the capabilities of these models, and how they might affect our lives.
We've seen many advances like these over the past few years: technologies that push the boundaries of artificial intelligence (AI) architecture and swell already petabyte-sized training datasets.
However, discussions of the risks these technologies pose tend to get drowned out by CEOs, entrepreneurs, and research scientists. These influential groups are concerned chiefly with reaching higher benchmarks and emulating something resembling human 'consciousness'.
With AI technology, entrepreneurs and research scientists are striving to emulate human consciousness
Thus, to better understand how this AI race will affect us, we must step back and ask some fundamental questions. Who is interacting with the data behind these models? What are the international and domestic implications of these models getting better with each iteration? How big is too big?
For large language model (LLM) tools to work, harmful content must be kept out of their machine-learning algorithms. They must therefore first be fed huge sets of pruned, fine-tuned training data. To produce it, millions of underpaid workers around the world perform punishingly repetitive tasks under harsh labour conditions, labelling dangerous and outright damaging content from the depths of the internet. This human labour, the driving force behind LLMs, is also known as ghost work. It is an exploitative, unethical practice, yet it barely figures in the discourse surrounding the deployment of AI systems.
Millions of underpaid workers are performing punishingly repetitive tasks under harsh conditions, labelling dangerous content from the internet's depths
And these harms are not abstract. OpenAI partnered with Sama, a San Francisco-based data-labelling company, to recruit data labellers in Kenya. Sama pays these workers between $1.32 and $2 an hour to sift through countless pieces of content depicting heinous acts of suicide, sexual violence, and executions. Sama has experienced a backlash against its practices — and not for the first time. Only last year, Time investigated Sama’s collaboration with Meta. It revealed content moderators were severely traumatised by the extreme content to which they had been exposed while labelling data. Despite this, Sama quashed moderators' efforts to secure better working conditions.
Tech evangelists and CEOs conspire to maintain an illusion: that these AI-powered systems are magically self-sustaining, producing seemingly perfect text without the need for human intervention.
The reality is quite the opposite. These same people will beg the US government to bail out the tech sector from an unforeseen economic downturn, yet the ghost work behind the tools they develop is kept under wraps to avoid scrutiny of labour exploitation. This exploitation should be out in the open, and at the centre of AI policy discussions.
The harms of data labelling, content moderation, and LLMs more broadly have been clearly demonstrated. Yet investment in these systems continues, driven by the persistent idea that AI and computational means of intervention are beneficial for humanity. Particularly appealing is the idea that AI will inform policy solutions in the real world.
In a domestic context, US law enforcement has taken advantage of the proliferation of facial recognition technology. In recent years, place-based predictive policing has enabled law enforcement to 'improve the efficiency of crime-reduction efforts'. The justification for using these technologies ignores the impact race already has on policing practices. But some American jurisdictions have taken notice: cities including San Francisco and Boston have now banned facial recognition software because its biased misidentification disproportionately affects communities of colour.
Some US cities have banned facial recognition software because its biased misidentification disproportionately affects communities of colour
Across the world, similar AI solutions have suffered a similar fate. Denmark and Australia came under fire for using biased machine-learning algorithms to 'score' vulnerable communities and assess their need for government benefits. Such technology turns governments into surveillance behemoths. As with facial recognition software in the US, the use of these machine-learning algorithms was paused in 2021 because the data risked producing discriminatory results.
Data is not objective. The information used to train machine-learning algorithms is drawn from the internet and inevitably encodes racist stereotypes, reinforces power imbalances, and deepens inequality. Use these datasets for your own purposes or, as in these cases, to inform policy targeting a specific community, and such biases are only magnified. These harms don't exist merely in the abstract; they are part of our everyday lives. And if we want to improve technology for everyone on earth, we must learn to connect these two realities.
The difficulty of understanding datasets, and the implications particular information has for vulnerable communities, scales with the size of the models engineers build for LLMs and other AI technologies. As we move towards a future further entrenched in the capabilities of new technologies, we must maintain our current scepticism. Without it, the age of AI may come to be characterised by colonialism, exploitation, and wilful ignorance of the social technologists trying to guide us in the right direction.
Humans are building and controlling new tools that could shape international relations and domestic budgets. Such tools have the potential to create a better future for us all. They can, and will, be used for good, but we must not let them rule and divide us.