FindArticles © 2025. All Rights Reserved.

Adobe Hit With Class Action Over AI Training Data

Last updated: December 18, 2025 2:03 am
By Gregory Zuckerman

Adobe is facing a proposed class-action suit that claims the company used pirated books to train a small language model, escalating tensions over who owns the data behind generative AI. The complaint alleges that Adobe's SlimLM model was trained on a dataset that included copyrighted books copied without authorization, according to a filing reported earlier by Reuters.

Books3 and SlimPajama the Target of Accusations

The lawsuit claims that SlimLM was pretrained on SlimPajama-627B, an open-source corpus released by Cerebras and described as a deduplicated, multi-corpus dataset of about 627 billion tokens. The plaintiffs contend that SlimPajama is a derivative of the RedPajama dataset, which included Books3, a collection of roughly 191,000 books that has drawn steady criticism after researchers and companies used it to train large models.


The complaint traces the data lineage from SlimLM to SlimPajama, and from there to RedPajama and Books3, and claims that this amounted to a chain of copying that ultimately swept in works by Lyon, the named plaintiff, along with those of other authors. The heart of the claim is simple: if SlimPajama includes Books3 and SlimLM was trained on SlimPajama, then Adobe benefited from the unauthorized use of copyrighted books in pretraining.
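The lineage argument is essentially a reachability question over a dependency graph. The short Python sketch below is illustrative only: the `LINEAGE` mapping and both helper functions are hypothetical, and the edges simply restate the complaint's allegations as described above, not established facts about how any of these datasets were built.

```python
# Hypothetical lineage graph: each key is an artifact, each value lists the
# sources it was allegedly built from. The edges mirror the complaint's
# claims as reported; they are allegations, not established facts.
LINEAGE = {
    "SlimLM": ["SlimPajama-627B"],
    "SlimPajama-627B": ["RedPajama"],
    "RedPajama": ["Books3"],
    "Books3": [],
}


def upstream_sources(artifact, lineage):
    """Return every source reachable upstream of `artifact`, in discovery order."""
    seen = []
    stack = list(lineage.get(artifact, []))
    while stack:
        source = stack.pop()
        if source in seen:
            continue  # already recorded; also guards against cycles
        seen.append(source)
        stack.extend(lineage.get(source, []))
    return seen


def lineage_path(artifact, target, lineage, _visited=None):
    """Return one path from `artifact` to `target`, or None if none exists."""
    visited = _visited if _visited is not None else set()
    if artifact == target:
        return [artifact]
    visited.add(artifact)
    for source in lineage.get(artifact, []):
        if source in visited:
            continue
        path = lineage_path(source, target, lineage, visited)
        if path is not None:
            return [artifact] + path
    return None


if __name__ == "__main__":
    # Show the alleged chain from the model back to the contested corpus.
    print(" -> ".join(lineage_path("SlimLM", "Books3", LINEAGE)))
```

A real provenance audit would need far more than names on a graph, such as dataset versions, filtering steps, and evidence of which files actually survived deduplication, which is exactly what discovery would be expected to probe.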

Books3, long criticized by groups representing writers, has fueled a wider controversy over how open datasets are assembled and republished. To supporters, such corpora are essential for research and competition; to authors, the wholesale ingestion of full books without consent violates their rights and threatens their livelihoods.

What Adobe Says and What Is in Dispute Over SlimLM

On its public-facing site, Adobe described SlimLM as a family of small models “optimized for document assistance tasks on mobile devices.” The company has also marketed its broader AI strategy, particularly Firefly, by emphasizing training on licensed, public-domain, and Adobe Stock content to ensure outputs are “commercially safe.” The suit, though, zeroes in on SlimLM’s pretraining pipeline rather than image generation, which makes the provenance of text datasets the central conflict.

The core legal questions will sound familiar: whether the mass ingestion of copyrighted works for model training was fair use; whether any copying was willful; and how liability should be allocated across a data supply chain that includes scrapers, curators, and model developers. Because many open datasets are composites of older corpora, provenance and auditability will be critical in discovery.

The Legal Context and Precedents in AI Training Data

Tech companies are running into an ever-louder drumbeat of challenges over training data. RedPajama has been cited in recent lawsuits against Apple and Salesforce. In a development closely followed by the industry, as much for what it doesn’t resolve as for what it does, Anthropic agreed to a $1.5 billion settlement with authors who accused it of using unauthorized copies of their works to train its Claude models, a sign that negotiated outcomes can be enormous when rights holders coordinate at scale.


Courts have some relevant guideposts, but no firm rulebook on AI. The Google Books ruling held that limited copying and display for search and research purposes was fair use, but training a generative model and producing output from it raise different questions. If a class is certified, potential exposure could be substantial: statutory damages in the United States can climb to $150,000 for each work that was willfully infringed, although courts frequently reduce those awards and many cases settle without going to trial.

Implications for AI Data Governance and Compliance

Outside the courtroom, the case underscores an awkward tension in AI development: open datasets, so vital to speeding up research and competition, can carry legal risk if they incorporate unauthorized copies. Strong model documentation, defensible licenses, and end-to-end data audits are rapidly becoming table stakes for enterprise adoption of AI.

Adobe, along with several other tech companies, has been a leading supporter of content provenance through the Coalition for Content Provenance and Authenticity, promoting “Content Credentials” that track how media is created and modified. If the claims about SlimLM’s pretraining are borne out, critics would see a disconnect between the company’s governance messaging and its data practices, and would argue that provenance standards should reach back to training corpora, not just outputs.

What to Watch Next in the Adobe SlimLM Lawsuit

Count on Adobe to fight the allegations, probably by claiming fair use and challenging the data lineage. Early motions will test whether the complaint adequately links SlimLM’s training to particular copyrighted works and whether class certification is warranted given divergent author claims.

Key turning points would include any court-ordered disclosures about the data mix used to train SlimLM, which could shape how much detail AI developers are obliged to provide in such cases. Also watch for regulatory signals: the U.S. Copyright Office is still studying AI and authorship, while competition and consumer protection agencies have said that data provenance is a trust and safety concern for AI products.

Whichever way the case goes, it will add to an emerging body of case law on how models can be trained, what kind of documentation buyers will require, and how rights holders are paid. The signal to developers is becoming clearer: know your data, license what you can, and be prepared to prove it.

Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.