Encyclopaedia Britannica, the parent of Merriam-Webster, has filed a sweeping lawsuit against OpenAI, accusing the company of large-scale copyright infringement and false attribution tied to its AI systems. The complaint alleges OpenAI trained models on Britannica and Merriam-Webster content without permission and now generates outputs that reproduce or closely mirror that material, undercutting the publisher’s business.
The suit also targets how OpenAI’s retrieval-augmented generation (RAG) workflows surface and summarize Britannica’s articles, and it adds a trademark claim under the Lanham Act for hallucinated citations that the publisher says mislead users by falsely invoking Britannica’s authority. The filing argues that the chatbot’s answers substitute for visits to publisher sites, siphoning audience and revenue while spreading errors under a trusted name.
- Inside the complaint alleging pervasive AI infringement
- The trademark twist: alleged false attribution and citations
- How courts are approaching the legal landscape for AI training
- OpenAI’s likely defenses and recent licensing deals
- Why reference publishers matter in AI training disputes
- What to watch next as the Britannica lawsuit advances
Inside the complaint alleging pervasive AI infringement
Britannica says it owns the rights to nearly 100,000 online articles and asserts that OpenAI scraped them to train large language models without a license. The complaint claims the models can reproduce “full or partial verbatim” passages from encyclopedia entries and dictionary definitions, including structured examples and usage notes that Britannica and Merriam-Webster regard as protectable editorial expression.
Beyond training, the publisher points to OpenAI’s RAG features, which pull from the web or internal databases to answer queries about current events. Britannica argues this system routinely draws on its content without authorization and presents it in ways that deter clicks through to the source, directly competing with its reference products and advertising-supported pages.
The trademark twist: alleged false attribution and citations
Unusually for AI copyright disputes, Britannica adds a Lanham Act claim, asserting that ChatGPT sometimes hallucinates articles or quotes and attributes them to Britannica or Merriam-Webster. That, the suit says, confuses users about the origin and accuracy of information, risking reputational harm for brands built on reliability.
Hallucinated citations have emerged as a recurring problem across generative AI, with documented cases of fabricated case law and misquoted sources. If a court finds that falsely invoking a publisher’s name in AI output constitutes false designation of origin, it could broaden the legal exposure for model providers well beyond copyright.
How courts are approaching the legal landscape for AI training
U.S. law remains unsettled on whether training models on copyrighted text is fair use. In a notable ruling, a federal judge in litigation involving Anthropic indicated that using works as training data can be transformative, but also faulted the company for allegedly acquiring millions of books unlawfully, a dispute that culminated in a reported $1.5 billion class settlement for authors. That split—transformative use versus unlawful sourcing—now shapes how courts parse AI cases.
The Copyright Office has been studying generative AI and has reiterated that facts and short phrases aren’t protected, but curated selection, arrangement, and expressive phrasing can be. Dictionaries and encyclopedias sit at that boundary: definitions summarize factual meaning, yet their wording, examples, and editorial taxonomy reflect creative judgment that publishers claim is protectable.
OpenAI’s likely defenses and recent licensing deals
OpenAI is expected to argue fair use, contend that any excerpts are incidental or user-prompted, and point to safety and attribution features intended to reduce errors. The company has stressed industry partnerships, striking content agreements with organizations such as the Associated Press, Axel Springer, the Financial Times, and News Corp to license archives and enable news summaries.
Britannica’s case turns on the allegation that no comparable license exists for its corpus and that RAG-enabled answers function as substitutes rather than pointers. If a court agrees, it could pressure AI firms to expand licensing for high-value reference works or redesign how models surface and credit sources.
Why reference publishers matter in AI training disputes
Encyclopedias and dictionaries provide the carefully vetted ground truth that model developers prize for accuracy. But the same qualities that make these sources useful for AI also make them business-critical for their owners. Britannica warns that chatbots reduce the incentive to visit original sites, weakening the economic engine that funds editorial updates and expert review.
The suit joins a broader wave: The New York Times, Ziff Davis, and multiple North American newspapers have brought related claims, while a separate Britannica case against Perplexity is pending. Together, these actions test whether courts will require licenses for reference-grade data, impose technical guardrails for attribution, or endorse fair use defenses for large-scale training.
What to watch next as the Britannica lawsuit advances
Early motions will likely target the fair use and Lanham Act theories, as well as whether specific outputs cited in the complaint are representative of systematic conduct. Discovery could illuminate how OpenAI sourced datasets, how RAG selects and quotes material, and what controls exist to prevent false citations to named publishers.
Outcomes range from a negotiated license to injunctive limits on reproducing or attributing reference content. However it lands, the case will shape the economics of factual knowledge in the AI era and set expectations for provenance, attribution, and the line between learning from a text and replacing it.