Apple is facing new legal challenges over how it trains its artificial intelligence models, with two lawsuits accusing the company of using pirated book collections to develop Apple Intelligence. The complaints, filed by neuroscientists Susana Martinez-Conde and Stephen Macknik and by authors Grady Hendrix and Jennifer Roberson, accuse Apple of ingesting copyrighted works without permission via so-called shadow libraries, the unauthorized collections of ebooks that have long haunted publishers and writers.

What the lawsuits say about Apple’s AI training

According to the latest suit, Apple used Books3, a much-discussed dataset drawn from pirated sources, to train at least one of its language models, OpenELM. The plaintiffs assert in their filing that their own books are among the titles in Books3, alongside hundreds if not thousands of other copyrighted works, and that training on the corpus therefore infringed their copyrights. The earlier case, brought by Hendrix and Roberson, also claims that Apple's web crawler, Applebot, scraped shadow libraries to gather training data for Apple Intelligence.


Both complaints frame Apple's program as part of a larger trend in AI: scraping enormous libraries of text to teach models about human language, its structure and its style. In their court filings, the plaintiffs argue that Apple's rush to ship Apple Intelligence and keep pace with competitors gave it an incentive to acquire prepackaged datasets at scale rather than negotiate licenses work by work.

Why the Books3 dataset matters in Apple lawsuits

Books3 has emerged as a flashpoint in AI litigation. The collection, which contains approximately 200,000 books, was mirrored on sites such as Library Genesis and Z-Library and was also included in research datasets such as The Pile. Multiple lawsuits against other AI developers have cited the dataset to argue that models cannot lawfully be trained on a corpus built from illicit copies. Tech companies counter that training is a non-expressive use, akin to reading, and that as long as outputs do not reproduce passages verbatim, the practice is shielded by fair use.

Courts have started to dissect those arguments. At early stages of litigation against several AI firms, judges have allowed some claims to proceed and dismissed others — particularly when plaintiffs could not prove verbatim regurgitation. None of these major cases has issued a definitive, blanket answer as to whether mass training on copyrighted text is fair use, leaving the legal landscape messy.

Apple’s position and the wider AI industry context

Apple has marketed Apple Intelligence as privacy-focused, emphasizing on-device processing and "Private Cloud Compute," a design meant to minimize the exposure of user data. But for publishers and authors, the fundamental question is not how user inputs are processed but where the training data came from and whether rights holders were adequately compensated. Apple has not publicly disclosed the full contents of its training corpora, in line with other companies in the field.

Apple does offer an opt-out, Applebot-Extended, that lets website owners exclude their content from being used to train Apple's models. The lawsuits contend that such controls are irrelevant if a company obtains datasets from shadow libraries that never had permission to distribute the books in the first place. That argument goes directly to whether an opt-out can cure alleged infringement that is already embedded in model weights.
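Apple's published crawler guidance describes this opt-out as a robots.txt rule addressed to the Applebot-Extended user agent. As a rough illustration only, the sketch below uses Python's standard `urllib.robotparser` module to show how a parser following robots.txt conventions reads such a rule: the `Applebot-Extended` group is disallowed, while the same file places no restriction on the plain `Applebot` agent. The example domain and the `is_allowed` helper are hypothetical, and this is not Apple's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an author's website that opts out of
# Apple model training by addressing the Applebot-Extended user agent.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may use `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so no network fetch via read() is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://author.example.com/books/sample-chapter.html"
    print("Applebot-Extended:", is_allowed(ROBOTS_TXT, "Applebot-Extended", page))
    print("Applebot:", is_allowed(ROBOTS_TXT, "Applebot", page))
```

One caveat about the tool rather than the policy: Python's parser matches user-agent names by substring, so a group for plain `Applebot` listed earlier in the file would also capture `Applebot-Extended`. Real crawlers may match names differently.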


This litigation comes amid a surge in legal action challenging AI training practices. Claims have also been filed on behalf of news organizations, image libraries and book authors, alleging harm through market substitution and through derivative summaries that mimic their style. The Authors Guild and prominent authors have sued model developers, and media organizations have filed suits over the use of news archives to generate chatbot responses. Outcomes depend on jurisdiction and on how the facts develop, which is why every new case involving a major platform is closely followed.

Fair use, markets, and what courts will consider

Legal experts say judges are likely to revisit the principles established in cases like Authors Guild v. Google, which found that scanning millions of books was a transformative use because the service provided search while displaying only limited snippets. Critics say large language models carry a different risk profile, since they can mimic an author's voice and sometimes reproduce passages. Defenders argue that models learn statistical regularities rather than storing readable copies of works, and note that the rare instances of verbatim overlap can be blocked by output filtering.
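Output filtering of the kind defenders describe is often explained in terms of verbatim overlap. The sketch below is a hypothetical illustration, not any company's actual filter: it flags a model output if it shares any run of n consecutive words (a word n-gram) with a protected text. The function names and the default of n = 8 are assumptions chosen for illustration, and the sample passage is from Dickens, which is in the public domain.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(output: str, protected: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word n-grams that appear verbatim in both a model output and a protected text."""
    return word_ngrams(output, n) & word_ngrams(protected, n)


def should_block(output: str, protected_texts: list[str], n: int = 8) -> bool:
    """Flag an output if it shares any n-word run with any protected text.

    n is a hypothetical tuning knob: smaller values catch more overlap
    but also flag common, non-infringing phrases.
    """
    return any(shared_ngrams(output, text, n) for text in protected_texts)


if __name__ == "__main__":
    protected = ("It was the best of times, it was the worst of times, "
                 "it was the age of wisdom")
    copied = "The model wrote: it was the best of times, it was the worst of times."
    summary = "A summary: the novel contrasts hope and despair in two cities."
    print(should_block(copied, [protected]))
    print(should_block(summary, [protected]))
```

A check like this only catches literal copying; it says nothing about paraphrase or stylistic imitation, which is part of why critics and defenders weigh filtering so differently.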

Market impact will also matter. To the extent plaintiffs can show that AI systems meaningfully displace book sales or licensing opportunities, their case grows stronger. Conversely, where developers can show non-substitutive uses, effective safeguards against memorization, and good-faith licensing where possible, their fair use arguments are strengthened. The U.S. Copyright Office has noted in its policy work that the question remains open and fact-dependent, and lawmakers continue to debate whether new rules for training data are needed.

What’s next for Apple in the AI training lawsuits

The plaintiffs are seeking class-action status, which would sharply raise the stakes by consolidating claims from many more authors. Discovery could force Apple to reveal more about its data pipelines, including whether, and to what extent, it used Books3 or other shadow-library sources. Any ruling could shape how Apple and its competitors source data, negotiate licenses and disclose their training practices.

Outside the courtroom, the pressure is already changing behavior. Some AI companies have negotiated direct licensing deals with publishers and stock media agencies to reduce legal risk and public backlash. If these suits advance, Apple may face the same calculus: pay for cleaner data through licensing, build more slowly with curated sources, or test its fair use arguments before a judge. The outcome will shape not only the future of Apple Intelligence but also how the next generation of AI learns to read.