The Chicago Tribune has sued Perplexity, accusing the artificial intelligence search company of scraping and republishing its journalism without paying for it, and even of using paywalled stories to train its machine-learning systems.
The complaint says that the publisher’s lawyers asked whether Squeeze and Perplexity used Tribune content; the company’s attorneys purportedly replied that it doesn’t train its models on the paper’s work but may surface non-verbatim factual summaries. The Tribune disagrees, saying Perplexity’s output is lifted almost word for word from scraped stories.

The filing also goes after Perplexity’s use of retrieval-augmented generation, or RAG, arguing that the system relies on Tribune articles as a live data source and that the company’s Comet browser does an “end run around paywalls” to generate detailed write-ups. If the claims are proved, the case would challenge not only model training practices but also how AI services consume and serve copyrighted news at query time.
What the Chicago Tribune lawsuit against Perplexity alleges
Perplexity tells users that it “synthesizes information responsibly” while actually lifting excerpts and structured information from Tribune reporting, the complaint says. The newspaper says its stories are being reproduced in ways that compete directly with the original articles, including paywalled content that is a cornerstone of subscription value.
The Tribune contends that Perplexity’s output, and the pipelines feeding it, are based on unlawful scraping and redistribution. By singling out RAG, the suit connects the training-data controversies to a newer question: whether feeding copyrighted text into an AI system at inference time adds new legal risk.
RAG puts new legal questions in play for AI services
RAG is meant to reduce hallucination by retrieving documents from reliable sources and grounding the model’s output in their contents. In practice, this frequently means copying the relevant text into the model’s context window, where it can guide generation. That mechanism raises new questions: Even if a developer steers clear of training on copyrighted works, feeding protected text into the model in real time could give rise to reproduction or derivative-work claims.
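To make the copying step concrete, here is a minimal Python sketch of a RAG-style pipeline. It is an illustration under stated assumptions, not a description of Perplexity's system: the `Document` type, the toy word-overlap scoring, and the sample corpus are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical source label
    text: str    # full text of the retrieved passage


def score(query: str, doc: Document) -> int:
    """Toy relevance score: number of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.text.lower().split())
    return len(query_words & doc_words)


def retrieve(query: str, corpus: list[Document], k: int = 1) -> list[Document]:
    """Return the k highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, retrieved: list[Document]) -> str:
    """Copy the retrieved text verbatim into the context a model would read."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in retrieved)
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical corpus; a real system would query a search index instead.
corpus = [
    Document("example-news-a", "The city council approved the transit budget on Tuesday."),
    Document("example-news-b", "The local team won its season opener."),
]
query = "What did the city council approve?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The point of the sketch is the `build_prompt` step: the source passage appears word for word in the text handed to the model, even though nothing was used for training.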
Legal scholars and industry groups have pointed out that courts have not yet fully addressed RAG’s liability profile. Previous high-profile lawsuits against AI companies have focused on model training and the “regurgitation” of publisher content. This case may serve as an early test of whether RAG’s architecture alters the fair use calculus compared to traditional web search and snippet display.
Paywall scraping and potential legal trouble
The complaint’s paywall allegations may raise the stakes. When a tool or crawler defeats technical measures and scrapes subscriber content, publishers typically invoke the DMCA’s anti-circumvention provisions or, in some cases, computer misuse laws. Separately, while robots.txt is not the law, ignoring access controls and terms of service can multiply legal and reputational risk.
This fight comes on the heels of reporting from multiple outlets that AI services have been scraping paywalled text, including investigations and sensitive scoops. Perplexity has previously said that it works to honor publisher controls and that it doesn’t intentionally train on or replicate protected content. The Tribune’s lawsuit will probably turn on detailed technical evidence, including:
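As a rough illustration of what honoring robots.txt looks like in code, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt rules, the `ExampleAIBot` and `OtherBot` user agents, and the `news.example` URLs are hypothetical; a real crawler would fetch the publisher's own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used here so the example runs offline.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is blocked everywhere; other bots are blocked only from /subscribers/.
print(is_allowed("ExampleAIBot", "https://news.example/local/story"))
print(is_allowed("OtherBot", "https://news.example/local/story"))
print(is_allowed("OtherBot", "https://news.example/subscribers/story"))
```

A check like this is voluntary on the crawler's side, which is why the file is a signal of the publisher's wishes rather than a technical barrier.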

- Crawler identities and headers
- Server logs and content delivery network (CDN) logs
- Cache records
- Side-by-side comparisons of outputs with source text
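A side-by-side comparison of the kind listed above can be approximated with Python's standard `difflib` module. This is only a sketch of one possible measure, not a legal test of similarity: the function names, the `min_run` threshold, and the sample sentences are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def longest_shared_run(source: str, output: str) -> list[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    a = source.lower().split()
    b = output.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a : match.a + match.size]


def copied_fraction(source: str, output: str, min_run: int = 5) -> float:
    """Fraction of output words that sit inside shared runs of at least min_run words."""
    a = source.lower().split()
    b = output.lower().split()
    if not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    copied = sum(m.size for m in matcher.get_matching_blocks() if m.size >= min_run)
    return copied / len(b)


# Hypothetical texts, invented for the example.
source_text = "the council voted 7 to 2 to approve the transit budget on tuesday night"
output_text = "officials said the council voted 7 to 2 to approve the plan"

print(" ".join(longest_shared_run(source_text, output_text)))
print(copied_fraction(source_text, output_text))
```

Word-level matching ignores paraphrase, so a low score does not rule out close borrowing; it only flags verbatim overlap.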
Why the outcome matters to news publishers and AI
Local and regional journalism increasingly depends on subscription and licensing revenue to fund reporting. Newspaper newsroom employment, the Pew Research Center found, has declined by more than half since the late 2000s, which helps explain why publishers are so sensitive to unlicensed reproduction that replaces a visit, a pageview, or a paid subscription.
Meanwhile, companies that offer generative AI technology are signing licensing agreements with large publishers and wire services to minimize legal risk while enhancing the quality of content. Deals struck by the largest AI developers with organizations like The Associated Press, Axel Springer, the Financial Times, and News Corp suggest that access has a market price. The Tribune’s case pressures companies that depend on live retrieval to pay licensing fees, redesign their products, or defend their practices in court.
Part of a broader pushback by publishers
The Tribune is part of a larger group of newspapers from MediaNews Group and Tribune Publishing that has already sued the top AI model makers and their cloud partner over training models on publisher archives. Those suits mostly center on training data. The Perplexity action, by contrast, homes in on RAG pipelines, scraping activity, and paywall access, broadening the legal front.
Other news organizations, including a leading national daily, have made similar claims that AI systems replicate articles too faithfully and cannibalize subscriptions. Regulators and policymakers are also scrutinizing how AI summaries affect traffic to original sources, a dynamic that matters for both consumer choice and the economics of news.
What to watch next in the Chicago Tribune lawsuit
Expect early skirmishes over how logs will be preserved, the scope of discovery into Perplexity’s RAG architecture, and whether the court orders any restrictions on accessing paywalled URLs.
Technical forensics will be key: which crawlers touched what, whether robots directives and paywall checks were respected, and whether outputs crossed the line from summarization to substitution.
The case could help clarify whether AI-driven retrieval engines will be treated as search, and thus as a potential fair use of copyrighted works, or as unlicensed republication of those works. A ruling on RAG could have industrywide reverberations, driving developers toward more explicit licensing, tighter retrieval controls, or redesigned products that make links and attribution prominent enough to sustain the original reporting.
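The question of which crawlers touched what is typically answered from access logs. The Python sketch below counts successful requests to paywalled paths per user agent. It assumes logs in the common "combined" format and a hypothetical `/subscribers/` path prefix; the sample lines, IP addresses, and bot names are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, and user-agent fields of a combined-format log line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def paywalled_hits(lines, prefix="/subscribers/"):
    """Count HTTP 200 responses for paths under the prefix, grouped by user agent."""
    hits = Counter()
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m and m["path"].startswith(prefix) and m["status"] == "200":
            hits[m["agent"]] += 1
    return hits


# Hypothetical log lines, not taken from any real server.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /subscribers/story-1 HTTP/1.1" 200 5120 "-" "ExampleAIBot/1.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /subscribers/story-2 HTTP/1.1" 200 4096 "-" "ExampleAIBot/1.0"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /local/story-3 HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Jan/2025:00:00:04 +0000] "GET /subscribers/story-1 HTTP/1.1" 403 512 "-" "OtherBot/2.0"',
]

for agent, count in paywalled_hits(SAMPLE).items():
    print(agent, count)
```

User-agent strings can be spoofed, so a tally like this would normally be cross-checked against IP ranges and CDN records.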