A judge has lifted a log preservation order issued against OpenAI in the copyright case over outputs from its GPT-based chatbot, ChatGPT.
A federal magistrate judge has vacated a broad preservation demand that required OpenAI to hold on to ChatGPT output logs indefinitely, alleviating a thorny discovery burden in The New York Times copyright suit. The ruling narrows a rare, high-volume data hold that OpenAI had argued threatened to expose non-party user data and compound its security liabilities.

What the court changed in OpenAI’s log preservation order
The court relieved OpenAI of its obligation to preserve all ChatGPT output logs, data that would ordinarily be discarded, except in narrowly defined cases. Material already preserved must remain accessible to the parties, and OpenAI continues to retain logs tied to individual accounts flagged by The New York Times. The order moves OpenAI's record-keeping back toward its typical retention practices while safeguarding a specifically identified set of evidence it already holds.
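For a sense of the mechanics, the narrowed hold roughly corresponds to retention logic like the sketch below. Everything in it (the account list, the retention window, the field names) is a hypothetical illustration, not a description of the order's text or OpenAI's systems.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default retention window; actual windows vary by product.
DEFAULT_RETENTION = timedelta(days=30)

# Placeholder for accounts specifically flagged in the litigation.
FLAGGED_ACCOUNTS = {"acct_123", "acct_456"}

def should_retain(log: dict, now: datetime) -> bool:
    """Decide whether a chat log survives the routine purge cycle.

    Mirrors the shape of the narrowed order: material already preserved
    stays accessible, flagged accounts stay preserved, and everything
    else ages out on the normal schedule.
    """
    if log.get("preserved_under_prior_hold"):
        return True  # existing material must remain available to the parties
    if log["account_id"] in FLAGGED_ACCOUNTS:
        return True  # accounts identified in reports flagged by the plaintiff
    return now - log["created_at"] < DEFAULT_RETENTION

# An ordinary, unflagged log older than the window would be purged.
old_log = {
    "account_id": "acct_789",
    "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "preserved_under_prior_hold": False,
}
print(should_retain(old_log, datetime.now(timezone.utc)))  # False
```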
The original mandate was unusually broad, designed to let The Times test its claim that OpenAI's systems reproduce copyrighted material at scale. By reversing the blanket hold while retaining the targeted logs, the court signaled a preference for proportional discovery: enough data to test the claims without turning a private company's ephemeral logs into an open-ended archive.
Why the preservation battle mattered in the NYT copyright case
AI discovery is complicated not only by volume but by sensitivity. ChatGPT handles interactions at enormous scale; industry traffic trackers have estimated that the service receives well over a billion monthly visits, implying a vast stream of prompts and outputs. Preserving that entire flow for litigation drives up storage costs, widens security exposure, and sweeps in data from bystanders with no connection to the dispute.
OpenAI said the order was overbroad and that it raised privacy concerns for non-parties; the court has already determined that ChatGPT users are non-parties, which weighs against retaining their data in the ordinary course. E-discovery standards, shaped by the Federal Rules of Civil Procedure and decisions such as Zubulake v. UBS Warburg, stress proportionality: courts weigh relevance against burden and expense. The narrowed order reflects that calculus, restricting the hold to the logs most likely to be material at trial.
Privacy and compliance implications of narrower log retention
Even though conversational logs will no longer be preserved en masse, granular privacy issues remain. Conversations can include personal data, sensitive questions, and identifiers. Data minimization principles, enshrined in regulations such as the GDPR and echoed in the California Privacy Rights Act, discourage companies from keeping more data than they need for longer than they need it. A court-ordered, potentially indefinite hold can work at cross-purposes to those norms if not carefully scoped.
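As a toy illustration of what data minimization looks like in practice, a logging pipeline might redact obvious identifiers before anything is retained. The patterns and function below are simplified assumptions; production systems rely on far more robust detection and auditing.

```python
import re

# Deliberately simple patterns for two common direct identifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def minimize(text: str) -> str:
    """Redact obvious identifiers so retained logs carry less personal data."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(minimize("Reach me at jane.doe@example.com or 555-867-5309."))
# Reach me at [EMAIL] or [PHONE].
```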

OpenAI maintains different retention practices across its products, including shorter retention windows for API abuse monitoring and stricter defaults for enterprise offerings, and offers some users an opt-out from training. Even with those controls, a blanket hold would still have ensnared broad swaths of non-essential data. The court's course correction cuts that exposure while preserving the evidence plaintiffs say is crucial to proving memorization or regurgitation.
What the order change means for the New York Times lawsuit
The Times maintains that OpenAI and its partners copied its journalism to train models that can recreate protected text. The preserved logs offer a window into how frequently the system produces near-verbatim passages and which prompts elicit them, a question that has been the focus of academic research on large language model memorization. Research at institutions such as Stanford has indicated that repetition in training data affects how often a model reproduces particular text, an angle expert witnesses are likely to probe.
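Memorization studies often start from simple overlap metrics. The sketch below flags near-verbatim reproduction by counting the word n-grams an output shares with a source text; it is a deliberately simplified stand-in for the more rigorous measures used in the research literature, not any party's actual methodology.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word n-grams; long n-grams rarely match a source by coincidence."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear in the source."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)

# A high ratio suggests near-verbatim reproduction worth closer review.
source = "the quick brown fox jumps over the lazy dog near the riverbank at dawn"
output = "the quick brown fox jumps over the lazy dog near the old mill"
print(round(overlap_ratio(output, source, n=5), 2))  # 0.78
```

In expert discovery, ratios like this would be computed across many prompt-output pairs to estimate how often, and under what prompting, protected text resurfaces.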
With the wider hold lifted, the case now rests more on evidence already gathered and the targeted account logs the plaintiffs identified. Expect the parties to step up expert discovery on model behavior, training safeguards such as deduplication and filtering, and the statistical likelihood of regurgitation. The decision will help determine how courts understand the relationship between massive training corpora, fair use defenses, and outputs that resemble copyrighted works.
The real signal this ruling sends for AI-era e-discovery
Courts are still finding their footing in AI-era discovery: models are probabilistic, logs are huge, and relevance is contextual. The decision reinforces a practical approach, requiring preservation narrowly tailored to the claims rather than open-ended logging as a matter of routine. For AI builders, it is a reminder to document retention policies, segregate sensitive data, and stand up precise legal holds when litigation strikes.
For publishers and rightsholders, it is also a reminder to press for specific discovery tied to the custodians, accounts, and prompts most likely to yield problematic outputs. The ruling doesn't shut the door on meaningful transparency; it channels it. The court's recalibration suggests that future AI cases will turn on focused, technically informed e-discovery, not blanket data freezes.