There is a new kind of work out there, one that is flexible, remote, and in a fast-growing industry. Unfortunately, it also tends to be poorly paid, unsteady, and shrouded in so much secrecy you would think it is international spycraft. It is annotating, a task in which humans label or tag data contained in text, images, and audio files so that AI algorithms can train on it.
As to the number of people doing this work globally, the Verge article notes that “A recent Google Research paper gave an order-of-magnitude figure of ‘millions’ with the potential to become ‘billions.’” These workers categorize the emotions being expressed by people in videos, label offensive social media content, judge how sexy different advertisements are, and even identify images of corn so that automated tractors can learn to harvest it. In many cases, the wages are as little as $1.20 an hour, though workers in the U.S. and those with needed expertise can earn significantly more.
For all the context these workers create for AI, little is provided to the annotators themselves. In one example from the article, a man in Kenya spent his time tagging different body parts (elbows, knees, etc.) in photos of crowds, without knowing what the larger purpose was. Workers are also commonly restricted from talking about what little they do know about their jobs, presumably out of fear that trade secrets could become public knowledge.
As with data storage, AI algorithms require massive amounts of training data. That may not be a problem when a company is paying its annotators bottom-rung wages. But what will happen as AI tools become more sophisticated and need more expertly annotated training data? It is one thing to know the difference between an ear of corn and a stick, quite another to distinguish between, say, a law that conforms to the U.S. Constitution and one that does not. For this reason, my sense is that AI training may begin to run into cost-effectiveness obstacles very soon.
For now, annotating remains a new, far-flung kind of assembly-line work, one in which the thoughts of millions of taskers are aggregated to distill economic value. So far, the value has been considerable. Alexandr Wang, the founder and CEO of Scale AI, a major supplier of AI training and annotation data, has become the youngest self-made billionaire in history. Looking forward, it is hard to say where the industry goes from here. The next five years should tell us a lot.