Meta is asking a federal court to throw out a lawsuit accusing it of torrenting thousands of adult videos for generative AI development, arguing that the flagged downloads were isolated acts by individuals, not its AI teams, and were not used for model training. In a motion to dismiss, the company calls the allegations “unsubstantiated” and argues that the plaintiffs have not plausibly linked the pirated files to models like Llama or to the company’s video-generation research.
The company notes that the alleged activity dates back to 2018, predating its push into multimodal and video generative systems, and that it was sporadic, characterizing the conduct as fragmented and inconsistent with the steady cadence of a functioning data-ingestion pipeline. Meta also points to its content policies and security measures, saying it does not want adult material in its training data and works to ensure that such material never gets through.
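Meta’s “sporadic” characterization is, at bottom, a claim about timing patterns. Below is a toy sketch of that intuition in Python; the timestamps are invented for illustration and do not come from the filing.

```python
from datetime import datetime

# Hypothetical download timestamps, not figures from the filing.
timestamps = [
    "2018-06-01T12:00:00",
    "2019-02-14T08:30:00",
    "2021-09-03T22:15:00",
    "2024-01-20T03:45:00",
]
times = [datetime.fromisoformat(t) for t in timestamps]

# Days between consecutive downloads: an automated ingestion
# pipeline tends to produce short, regular gaps, while one-off
# human activity leaves long, irregular ones.
gaps_days = [(b - a).days for a, b in zip(times, times[1:])]
print(gaps_days)  # long, irregular gaps in this invented example
```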

The lawsuit was filed by adult film companies Strike 3 Holdings and Counterlife Media, which allege that they traced nearly 3,000 downloads of pirated videos to IP addresses linked to Meta, including some allegedly concealed ones. The firms are seeking roughly $359 million in damages plus a permanent injunction barring the use of their content in AI models.
Plaintiffs cite nearly 3,000 downloads tied to Meta IPs
Strike 3 is a prolific copyright litigant that has filed thousands of BitTorrent cases in U.S. courts. Its investigative model rests on monitoring torrent swarms, matching file hashes, and attributing activity to network endpoints. Digital rights groups and academics, however, have long warned that IP attribution on shared or corporate networks, particularly where proxies, VPNs, or cloud infrastructure are involved, is prone to error.
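To make that attribution step concrete, here is a minimal sketch of the kind of matching a swarm-monitoring operation performs: comparing info-hashes observed in a swarm against a catalog of known works and grouping hits by source IP. Every name and data structure below is a hypothetical illustration, not Strike 3’s actual tooling.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SwarmSighting:
    info_hash: str   # BitTorrent info-hash observed in the swarm
    peer_ip: str     # IP address announcing that hash
    timestamp: str   # when the peer was observed

# Hypothetical catalog mapping info-hashes to copyrighted titles.
catalog = {
    "aa11...": "Title A",
    "bb22...": "Title B",
}

def attribute_sightings(sightings: list[SwarmSighting]) -> dict[str, list[str]]:
    """Group catalog matches by peer IP.

    Note the built-in weakness: behind a corporate NAT or VPN egress,
    one IP can represent thousands of distinct users, so a match
    identifies a network endpoint, not a person or a team.
    """
    hits: dict[str, list[str]] = defaultdict(list)
    for s in sightings:
        title = catalog.get(s.info_hash)
        if title:
            hits[s.peer_ip].append(title)
    return dict(hits)

sightings = [
    SwarmSighting("aa11...", "203.0.113.7", "2018-06-01T12:00:00Z"),
    SwarmSighting("bb22...", "203.0.113.7", "2019-02-14T08:30:00Z"),
]
print(attribute_sightings(sightings))
# {'203.0.113.7': ['Title A', 'Title B']}
```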
Questions around IP attribution on shared networks
In its motion, Meta leans on that skepticism, contending that the complaint never bridges the gap between the observed downloads and its AI research teams. The plaintiffs point to Meta’s work on a video-generation tool, referred to in the complaint as Movie Gen, and on Llama-based research.
Meta says activity doesn’t match modern training pipelines
Meta argues that the scale, timing, and pattern of IP activity described in the complaint do not fit the data-engineering reality of training modern foundation models, which depend on standardized datasets, documentation, and reproducible pipelines. Foundation models are trained on vast corpora; Llama 2, for example, was trained on roughly 2 trillion tokens, and current video models that capture motion, composition, and temporal context can require millions of clips.
Against that backdrop, a few thousand files accumulated over many years, even if the attribution holds, would be far too little for production-grade training and hard to operationalize without substantial preprocessing and metadata.
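A rough back-of-envelope calculation shows why the scale argument has force. The corpus size below is an assumed figure for illustration, not one from the filing or from Meta:

```python
# Back-of-envelope scale comparison (illustrative numbers only).
alleged_downloads = 3_000          # clip count cited in the complaint
assumed_video_corpus = 1_000_000   # hypothetical clip count for a modern video model

share = alleged_downloads / assumed_video_corpus
print(f"{share:.2%} of the assumed corpus")  # 0.30% of the assumed corpus
```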

Adult content policies and industry practices in focus
The adult-content angle also raises extra risk vectors: stricter content filters, age-appropriateness requirements, and reputational exposure with clients. Stock libraries that license content to AI developers, for example, often bar explicit material by contract, and platforms invest in NSFW classifiers to keep explicit examples out of training sets. Meta says it avoids training on such data, an approach in line with industry best practice.
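As a sketch of what such a filtering step can look like in an ingestion pipeline, the snippet below drops items a classifier flags as explicit before they reach a training set. The `nsfw_score` stub and the threshold are assumptions standing in for whatever model and cutoff a real pipeline uses:

```python
from typing import Iterable, Iterator

NSFW_THRESHOLD = 0.5  # assumed cutoff; real pipelines tune this on labeled data

def nsfw_score(item: dict) -> float:
    """Hypothetical classifier stub: returns P(explicit) for a candidate item.

    In a real ingestion pipeline this would call an image/video
    classification model; here it just reads a precomputed score.
    """
    return item.get("precomputed_nsfw_score", 0.0)

def filter_training_items(items: Iterable[dict]) -> Iterator[dict]:
    """Yield only items the classifier considers safe, logging the rest.

    Keeping a record of what was dropped, and why, is the kind of
    audit trail that matters in litigation like this one.
    """
    for item in items:
        score = nsfw_score(item)
        if score >= NSFW_THRESHOLD:
            print(f"dropped {item['id']}: nsfw_score={score:.2f}")
            continue
        yield item

candidates = [
    {"id": "clip-001", "precomputed_nsfw_score": 0.02},
    {"id": "clip-002", "precomputed_nsfw_score": 0.97},
]
for kept in filter_training_items(candidates):
    print(f"kept {kept['id']}")
```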
What happens next in the Meta torrenting lawsuit
If the court grants Meta’s motion, the case ends without discovery. If it does not, the case advances to a phase in which discovery would compel production of exactly the evidence that can decide whether adult videos ever entered a research pipeline:
- Detailed network records
- Data provenance records
- Internal Meta documentation
The plaintiffs’ task is to prove a tight chain from an outside IP address to a training pipeline; Meta’s is to show policy controls and audit trails sufficient to establish that no such chain ever existed.
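In data-provenance terms, that dispute reduces to a join: do any of the flagged content hashes appear in the manifests of datasets that fed a training run? A minimal sketch, assuming a hypothetical manifest format with per-file SHA-256 hashes:

```python
def hashes_in_manifest(flagged_hashes: set[str], manifest: list[dict]) -> set[str]:
    """Return the flagged content hashes that appear in a dataset manifest.

    An empty result supports Meta's position; any overlap is the
    "bridge" the plaintiffs would need. The manifest format is assumed.
    """
    manifest_hashes = {entry["sha256"] for entry in manifest}
    return flagged_hashes & manifest_hashes

manifest = [
    {"path": "clips/0001.mp4", "sha256": "deadbeef..."},
    {"path": "clips/0002.mp4", "sha256": "cafef00d..."},
]
flagged = {"aa11...", "bb22..."}
print(hashes_in_manifest(flagged, manifest))  # set() -> no overlap
```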
Broader implications for AI data provenance and safety
Whatever the outcome, the lawsuit highlights the growing pressure on AI companies to document data provenance carefully. Expect more of them to adopt third-party dataset attestations, restricted ingestion policies, and granular red-teaming for NSFW leakage, efforts that may prove as valuable in a courtroom as they are for product safety.
In the weeks and months ahead, the court will weigh the factual question of whether the complaint plausibly links the alleged torrenting to Meta’s AI training and the legal question of whether the requested injunction is warranted. Industry watchers will be following the standards set by this case and by parallel litigation over books and images, alongside ongoing investigative reporting on training-data practices by outlets like Ars Technica. The evidentiary bar in these cases appears to be rising, and with it the expectation that AI developers can demonstrate, not merely claim, that unsuitable content is filtered out of their models.