Tech Titans Face Backlash Over Controversial Use of Books to Train AI Systems


Leading tech companies' use of nearly 200,000 books to train advanced artificial intelligence systems has become a contentious issue. Quietly at the center of this effort is Books3, a data set built from pirated e-books spanning genres from racy fiction to prose poetry. These books serve as training material for the emerging AI systems, helping them learn to communicate effectively.

Some AI training text can be gathered from articles on the internet, but more capable AI needs higher-quality text to develop its language skills, and this is where books prove invaluable. Books3, however, has been thrust into the spotlight, with multiple lawsuits filed against Meta and other firms that used the data set to train their AI.


A database recently published by The Atlantic, compiled from Books3, allows authors to check whether their own works are being used to train these AI systems. The revelation has stirred discontent among writers.

On learning that her book was being used, author Mary H. K. Choi voiced her dismay on social media. Her debut novel, “Emergency Contact,” was, in her words, “deeply personal,” and it became a New York Times bestseller despite early criticism that it was “too quiet and niche.”

Min Jin Lee, author of “Pachinko” and “Free Food for Millionaires,” echoed Choi’s sentiments, bluntly calling the use of her books “theft.” Nora Roberts, whose works appear 206 times in the Books3 database (the most of any living author), described this exploitation of creative intellectual property as “all kinds of wrong.”

Nik Sharma, whose cookbook “Season” was included in the database, was not surprised. Sharma said authors should be consulted in advance about how their work is used and how they will be compensated. He likened the situation to education in the US, which is fundamentally not free, and described the current state of affairs as the “Wild West.”

Many tech firms, including Meta and Bloomberg, declined to comment or made vague commitments not to use Books3 in the future. Not every author objected, however: James Chappel, an academic author, said he was untroubled by the use of his work, emphasizing that he had written his book to educate.

The misuse of creative intellectual property has become a broader concern among writers, with the Writers Guild of America demanding limits on the use of AI in film and television. The worry extends to visual artists, whose works have likewise been used without consent to train AI systems.

The controversy coincides with US President Joe Biden's announced plans to issue an executive order on AI. It highlights an ethical divide over AI innovation and the deeply personal lines it often crosses.

These repeated clashes between AI and literature can be disheartening for writers like Choi, who admitted to a sense of inevitability despite the personal importance of her work. Others, like Nora Roberts, are issuing a call to arms, urging authors to unite against the misuse of their creative work and calling for support from readers and viewers alike.