Encyclopaedia Britannica and its subsidiary Merriam-Webster have sued OpenAI, alleging that the company’s artificial intelligence models were built and operate on wholesale copying of protected reference works. The complaint accuses OpenAI of large-scale scraping of Britannica’s corpus — including nearly 100,000 online encyclopedia entries — and dictionary content, and of reproducing that material in outputs without authorization.
The case strikes at a central fault line in the AI industry: whether training and retrieval systems can lawfully ingest and echo definitive reference content without a license, and what happens when those echoes are mistaken for the real thing. Britannica argues that both its intellectual property and its reputation have been put at risk.
Copyright Claims Target Training And RAG
The lawsuit says OpenAI scraped Britannica and Merriam-Webster to train large language models and to power retrieval-augmented generation (RAG), a workflow that lets ChatGPT pull from up-to-date sources when answering questions. According to the filing, this results in full or partial verbatim passages from paid and copyrighted entries appearing in model outputs, displacing visits to the original sources.
Britannica frames the practice as a twofold infringement: first at ingestion, where proprietary articles are copied into training datasets and retrieval indices without permission; and second at output, where those materials are reproduced in responses. The publisher argues that this is not transformative use but a commercial substitute that siphons audience attention from carefully edited reference pages.
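The two-stage copying the complaint describes follows from how RAG pipelines work. A minimal sketch makes the mechanics concrete; the corpus entries, the naive word-overlap ranking, and the prompt format below are all illustrative assumptions, not OpenAI's actual system:

```python
# Toy RAG pipeline: documents are copied into an index at ingestion,
# then retrieved passages are copied, verbatim, into the model's prompt.
# Corpus and scoring are hypothetical stand-ins for a real system.

def tokenize(text):
    return text.lower().split()

def retrieve(query, corpus, k=1):
    """Rank indexed entries by simple word overlap with the query."""
    q = set(tokenize(query))
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(tokenize(doc))),
                    reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    """The retrieved source text lands verbatim in the model's context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Ingestion step: proprietary entries copied into the retrieval index.
corpus = [
    "Photosynthesis is the process by which green plants convert light into chemical energy.",
    "The French Revolution began in 1789.",
]

prompt = build_prompt("How do plants use light?", corpus)
print(prompt)
```

The sketch shows why a publisher can point to copying at both ends: the index holds a copy of each entry, and whatever the retriever surfaces is reproduced word for word in the generation context, from which a model can readily echo it.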
Lanham Act Allegations And Reputational Harm
Beyond copyright, Britannica invokes the Lanham Act, claiming that ChatGPT sometimes fabricates facts and attributes them to Britannica or Merriam-Webster, creating consumer confusion about the origin and accuracy of information. In the company’s view, hallucinated citations tarnish the brands’ hard-won reputations for editorial rigor.
The complaint also contends that AI summaries act as a substitute for visiting publisher sites, weakening the economics that fund expert editors and lexicographers. Similar concerns have been raised by industry groups that track referral traffic: when authoritative material is surfaced inside chat interfaces, fewer readers click through to the source, a dynamic publishers say erodes subscriptions and advertising revenue.
Part Of A Wider Legal Front Against AI Firms
Britannica’s action adds to a growing slate of lawsuits from media and information companies challenging AI training and outputs. The New York Times has sued OpenAI over alleged misuse of news archives; Ziff Davis and a coalition of North American newspapers — including the Chicago Tribune, the Denver Post, the Sun Sentinel, and the Toronto Star — have filed similar claims, as has the Canadian Broadcasting Corporation. Britannica also has a pending case against Perplexity over related issues.
At the same time, OpenAI has pursued licensing deals with several publishers, signaling an appetite for negotiated access to content. Those agreements, however, do not cover Britannica or Merriam-Webster, and the new complaint argues that voluntary licensing is not a substitute for redressing past unauthorized use.
Sparse Precedent And The Anthropic Ruling
Case law on whether training on copyrighted text is fair use remains unsettled. In a separate case involving Anthropic, federal judge William Alsup suggested that using text as training data could be considered transformative. Yet he also found that the wholesale, unauthorized acquisition of millions of books crossed legal lines, a distinction that contributed to a $1.5 billion class settlement benefiting affected writers.
The lesson for AI companies is that how data is obtained and how closely outputs track their sources both matter. Courts are beginning to parse the differences between training on licensed corpora, scraping without consent, and retrieval systems that can surface near-verbatim passages from proprietary databases.
What Comes Next In Britannica’s Lawsuit Against OpenAI
Britannica is expected to seek damages and injunctions that could bar OpenAI from training on or retrieving from its content without a license, and to require technical safeguards that curb verbatim reproduction and misattributed citations. Remedies could include content filtering, source attribution controls, or even model fine-tuning and unlearning routines tailored to reference works.
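In its simplest form, the kind of verbatim-reproduction safeguard described above amounts to an n-gram overlap check before an output is released. The sketch below is a hypothetical illustration; the six-word threshold, the sample entry, and the sample response are assumptions, not anything specified in the case:

```python
# Hypothetical output filter: flag a model response that reproduces a
# long verbatim run from a protected reference entry. The n=6 threshold
# and the texts below are illustrative assumptions only.

def ngrams(text, n):
    """All n-word runs in the text, lowercased for comparison."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(response, source, n=6):
    """True if the response shares any n-word run with the source."""
    return bool(ngrams(response, n) & ngrams(source, n))

entry = ("An encyclopedia is a reference work providing summaries of "
         "knowledge from all branches of learning.")
response = ("According to one source, an encyclopedia is a reference work "
            "providing summaries of knowledge in brief form.")

flagged = verbatim_overlap(response, entry)
print(flagged)
```

A production safeguard would be far more elaborate (fuzzy matching, paraphrase detection, attribution checks), but even this toy version shows the trade-off courts would weigh: the longer the required match, the more near-verbatim copying slips through.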
For enterprises building on generative AI, the outcome will shape risk assessments: if reference content is off-limits without explicit permissions, procurement and compliance costs rise, but the reliability of citations could improve. For consumers, the case spotlights a central trade-off: AI convenience versus the sustainability of the institutions that produce vetted knowledge.