Living In A Ghost World, These Data Workers Make the Internet Possible
The dispersed nature of their work, opaque AI supply chains and deficient policy have invisibilised millions of digital gig workers in India

In the early 2000s, Amazon executive Venky Harinarayan created the Amazon Mechanical Turk (AMT), the earliest model of global outsourcing of work. Its name was inspired by the 18th-century hoax of the Mechanical Turk, wherein a chess-playing machine was maneuvered by a human hidden inside. Amazon’s AMT too was marketed as automated, but it was actually powered by armies of human labour performing unseen tasks.
The illusion persists even today. ChatGPT, Instagram, and the internet are not entirely the work of automatons. It takes labour – often invisible, always low-paid – to make them work efficiently. A worker in India may be curating social media feeds for people living in Montreal, while another could be tagging images of disfigured people and diseased organs to train AI models to identify tumour locations. In Kenya, people work for less than $2 per hour (about Rs 175) to make ChatGPT less toxic. Elsewhere, people could be labelling data for military software used for targeted killings in Gaza.
The lives of these data workers who power global tech supply chains are as much a mystery as the work itself. Described as “ghost workers” or “invisible slaves”, workers operate inside secretive supply chains and remain absent from policy conversations on gig work. Google last month fired more than 200 AI workers who made chatbots like Gemini intelligent, after they complained of poor pay and precarious work conditions.
Remote digital work has never been a part of the Indian platform work story, even though ‘crowdwork’ has existed globally for at least two decades, says Aditi Surie, an urban and platform work researcher at the Indian Institute for Human Settlements. Complex AI systems – operating globally between suppliers, platforms, and human labour – are hard to trace, and harder to fit into national or international regulatory frameworks. Our understanding of gig work is also limited to work that is visible, physical, mobile, and mostly performed by men. “This world of click work is very different from that of physical gig work,” Aditi says.
In this, the first part of a three-part series, BehanBox documents the nature and history of the “invisible” data work that makes possible the internet and AI and powers transnational economies.
The World of Digital Labour
Part-time online work that guarantees ‘easy money’ with ‘zero investment’ can be commonly found on online platforms such as YouTube, LinkedIn and Reddit. But there is no clarity on the specifics – the nature of the job, designation, working hours, payment system, or the end customer, says Priyam Vadaliya, a researcher at Aapti Institute, a digital think tank that has researched data work.
“You can do it from your home, from your mobile phone,” says an influencer in an auto-dubbed voice, promising Rs 1 lakh a month for part-time work. And what’s better, anyone can do it – a student, a fresher, a housewife. She points to third party ‘AI solutions’ platforms like Toloka for labelling data – which has also helped Russian surveillance companies train facial recognition software – and Appen for transcription. Another influencer has done a series on the ins and outs of being an MTurker in India. Scroll down the rabbit hole, and one finds a YouTube canon dedicated to the world of crowdwork.
Once crowdwork meant work done collaboratively but independently by a large number of people, but today, it represents work performed using digital labour platforms. The International Labour Organisation divides this work into three categories: physical gig work such as delivery services, work done on online web platforms, and that done remotely.
Labels such as “crowdworkers”, “microworkers”, or “online freelancers” describe how often the work is done or what the employment relationship is. People doing repetitive data labelling are ‘microworkers’, while content moderators can be either online freelancers or microworkers depending on who employs them.
At any given moment, millions are estimated to be engaged in this liminal workspace. Here, work is broken into smaller, repetitive tasks – scrubbing social media, cleaning Excel sheets, or labelling audio and video so that users experience Instagram without glitches – and routed digitally to a dispersed labour force.
In India alone, roughly 1 million data annotators may be active by 2028, contributing to a data annotation sector projected to grow to more than $8 billion globally. Other data work includes curating, labelling, content moderation, AI rating, transcribing, and conducting surveys. If ‘AI is like a child’, as data annotators told Fifty Two, the humans in the loop ensure it is trained to be precise.
An Invisible Industry
Production chains in AI data work operate like a digital factory. Big Tech companies own the product, a series of third-party vendors act as supervisors, and data workers repeat monotonous tasks to keep the assembly line running.
Workers, experts say, have little idea of who the final product goes to, or who they interact with at times. Third-party companies present themselves vaguely – as platforms doing “data solutions” work, offering opportunities to “contributors” or “freelancers” for tasks on a “variety of AI projects” that “contribute to a safer internet,” or “improve identification technologies”. Workers don’t know if the work they are doing is for Meta or TikTok because they report only to the contracted employees of some intermediary.
Applicants are hired to do ‘customer service’, but it is only after contracts are signed and training is done that they realise that this is about cleaning up somebody’s internet feed – with graphic or violent material sometimes – with no upfront disclosure, says Priyam.
Workers are also made to sign Non-Disclosure Agreements (NDAs), barring them from speaking about their work. When former BPO worker in the Philippines Renso Bajala described his job to the digital publication Rest of World, his employer accused him of breaching his NDA, he said in a recent panel discussion organised by the Tech Global Institute. Indian women working with AMT, Telus and Microworker also said they had signed NDAs and could not disclose specifics about their contract agreements.
The Indian BPO sector has also caught up with the crowdwork model of work, says Aditi. Data labelling work, which previously happened on AMT, is now being done in traditional BPO companies like Accenture that could, in turn, be doing it for Meta. “It’s no longer remote, but it’s still data work,” Aditi notes. These new models make it hard to classify new arrangements and pin accountability on an ‘employer’. Work conditions of a remote mTurker, as they are called, are not the same as those of someone working for an Accenture, inside an office with clear managerial structures. Their contracts may call them ‘employees’, but who is the employer?
Third-party vendors have also started “labour hedging” – hiring thousands of workers for ‘ghost jobs’ to pad their numbers and look financially scalable in order to win contracts from Big Tech companies, while workers sit without pay or work for weeks, if not months. Even when work comes, wages are neither promised nor regular. A client may refuse to pay for cleaning, say, an Excel sheet with 500 rows of data if one entry is wrong, and without grievance redressal systems, the worker has no bargaining power.
“It all comes down to how you see the worker,” says Priyam of Aapti. Tedious work performed in a vacuum has the effect of alienating workers and distancing Big Tech companies from their obligations.
In the film Humans In the Loop, Geeta Guha’s character Nehma is told: “AI bacche ki tarah hain, galat sikhayenge toh galat seekh jaayega (AI is like a child, if you teach it wrong it will learn wrong).” Nehma later is reluctant to label a caterpillar a ‘pest’ because she had seen it protect the plant. “A pest for one person is not a pest for another person… Is AI really a clean slate as people think it is, or is it sort of like a descendant of our biases, weaknesses and knowledge systems?” director Aranya Sahay said in an interview.
Finding Humans In the Loop
Nighat, a tech policy analyst at Aapti, says it is a challenge to document the size or shape of this industry. An oft-quoted Niti Aayog study puts the size of the gig workforce at about 8 million. And of this, nearly half a million are online platform workers, including home-based data annotation workers. “No measurement to that scale has come in again. It still doesn’t go down to the level of gender, caste, what types of tasks they do, what are the popular sectors or departments,” says Nighat.
This opacity makes any measurement of this industry nearly impossible, she adds.
India’s Periodic Labour Force Survey uses the National Classification of Occupations which accounts for “data entry clerks” in traditional roles, overlooking modern data workers and rendering them “statistically invisible”, noted researcher Neha Arya.
Researchers instead rely on proxies like industry growth statistics. NASSCOM data from 2019 show home-based workers in India are amongst the largest in number working for these global platforms. Over 90% of data annotators are also from Tier 2 and 3 cities, such as Ranchi, Shillong, and Visakhapatnam. India’s share of the online labour supply also rose from 26% to 34% from 2018 to 2020.
The demand for crowdwork went down during Covid-19 as economies struggled but the supply – the number of people seeking to do crowd work – increased. This trend was pronounced particularly among women and young workers, said ILO economist Niall O’Higgins in an interview. The lockdown, and loss of traditional jobs, both increased an appetite for remote work. And livelihoods like nursing, childcare, elderly care, education, and sex work – which primarily engaged women and gender minorities – experienced disruptions, worker shortages or transitioned to online platforms. The platformisation of these feminised sectors normalised the idea of earning through digital interfaces, drawing many women into forms of precarious “clickwork” and data labour.
An ILO consultation from 2023 also showed the median age of these workers is about 26, that most are women working on smartphones and laptops, and that for about 64% of them, microtasking was the main source of income. Many are graduates, half with STEM degrees, but turn to data work amid high unemployment, informalisation of labour, and the search for flexible options to balance caregiving. “Some may be new mothers who want part-time work to make additional income or who would want some mobility,” Aditi explains. Other women do this work for the usual reasons: the lack of opportunities, commuting problems, safety or cultural barriers.
The ILO study added that workers initially believed the promise of co-creating with AI was attractive, but in reality, they found the jobs led nowhere, and demanded more of them physically and cognitively. There is a mismatch between education qualifications and the work women end up doing in particular, Aditi says. And as with traditional sources of employment like sewing, embroidery and so forth, women didn’t think of themselves as ‘workers’.
From Dotcom Boom To AI Labour
The story of data work begins before platforms, at a time of change. It was the dot-com boom of the 1990s. The world wide web came about, and became free and open in 1993; about 50,000 companies (funded publicly or by venture capitalists) rushed to commercialise the internet. Soon came giants like Yahoo, Amazon and eBay.
These newly established internet regimes were headquartered in the US and the Global North, and looked to outsource work to countries with low labour costs like India, Pakistan, and the Philippines. These destinations also had colonial histories, with huge populations of English speakers and skilled workers to shoulder ‘customer service’. At the same time, liberalisation opened India’s economy to foreign capital. States offered tax exemptions and infrastructure offerings such as tech industry-related economic zones. Here, in crisis-ridden economies, emerged the Business Process Outsourcing (BPO) sector, coordinating global diffusion of work at an equally grand scale.
By the early 2000s, India’s crises of mass unemployment and recession had made way for the information technology, telecommunications, and IT-enabled services (ITeS) industries to take off. This also coincided with a rise in casualisation of labour – contract work increased from 12% of total manufacturing work in 1985 to 24% in 2008.
Between 1998 and 2005, a telecom and tech industry boom gripped India. Internet population – predicted to increase fivefold from 5 million users by 2005 – had reached at least 27 million users by some estimates (and 38.5 million by others). More users meant more data, which was plentiful and cheap for industries to mine and monetise. Data equaled profit, and companies began to design platforms to extract data, analyse consumer preferences, and offer targeted products in the early days of machine learning and AI systems.
AMT entered this landscape in 2005. From their very start, crowdwork platforms were aligned to perform tasks like transcribing or tagging images that were too complex for computers, but too voluminous for a traditional workforce. It was the problem that Amazon’s Venky Harinarayan wanted to solve at the turn of the millennium, when the Amazon marketplace was expanding beyond books and basic products. Part of this process required painstaking cataloguing and removing thousands of duplicate products on its website. Harinarayan proposed solving this data processing problem through a “hybrid machine/human computing” arrangement that would farm out cataloguing tasks to a network of human workers. Jeff Bezos, when launching mTurk, called these workers “artificial artificial intelligence”.
“What AMT did was take global outsourcing work and put it on a platform, so you’re able to do it from the comfort of your own home instead of having to go to somebody else’s office to do it,” says Nighat.
AMT then diversified into market research, data entry, data cleaning, and other low-wage tasks to support artificial intelligence. A wave of web-based platforms (like Telus, Clickworker, Microworkers) followed, trying to make smarter machines over time. Platforms enabled a new model of traditional ‘outsourcing’, giving way to crowdsourcing, where work was fragmented into micro-tasks distributed worldwide.
The result, researchers argue, is the rise of “digital sweatshops” and “global digital factories” without oversight and beyond geographies.
Vague Policies
In recent years, unions from Latin America and East Africa have ushered in movements to bring digital labour industries to account. But Indian gig work policies are still occupied with visible, ‘on-demand’ forms of labour, where issues of poverty, vulnerability, physical harm, and discrimination are easier to track. The Code on Social Security, which defines work beyond traditional employer-employee relationships, differentiates between ‘gig work’ and ‘platform work’. But Aditi points out a problem.
“Within the current framework, there is potential to interpret a home-based worker to be an online worker… But in official communication from the Ministry of Labour and Employment, the current imagination of people who need social protection is physical gig workers,” she notes, partly because others lack collective organisation.
Several informal or feminised sectors have historically been hidden in plain sight. It took the Covid pandemic, for instance, for ASHA workers’ contribution to public health to become visible. Much like microtasks in data labelling, women engaged in garment industries also did work that was fragmented, invisible, remunerated on a per-piece basis, and operated in transnational economies. Their visibility grew when NGOs and unions documented exploitative supply chains.
More documentation could similarly help reveal the hidden reins of this industry in India, says Nighat. Seeking answers about who these workers are, where they live, and what they do would help tailor policies, from subsidised internet and maternity benefits to mental health support for content moderators exposed to violence.
In East Africa, worker collectivisation has created awareness at a global level, and people are slowly seeing the links of AI supply chains spread across countries like India too, says Priyam. “It will get to a point where you can’t ignore this work, and these workers, anymore.”
Further reading:
1. This IT For Change study documents stories of women working with AMT during the pandemic.
2. A study linking the erasure of human labour with techno-utopian narratives of AI hype.
3. A participatory project where data workers from five continents report on their workplaces.
4. How Pulitzer Centre investigated the scale of human labour.
5. A report from Kenya about the experiences and demands of digital data workers.
Want to share experiences or questions about the world of data work? Write to us at contact@behanbox.com.