A group of YouTube creators has filed a proposed class action accusing Snap of using their videos without permission to train commercial AI features, including the app’s Imagine Lens. The lawsuit, lodged in the U.S. District Court for the Central District of California, alleges Snap tapped research-only video datasets sourced largely from YouTube and circumvented the platform’s technical and contractual safeguards to build revenue-generating AI tools.
The case is led by the team behind the h3h3 channel, alongside MrShortGame Golf and Golfholics, which together have roughly 6.2 million subscribers. The same creators previously brought similar suits against Nvidia, Meta, and ByteDance, reflecting a broader campaign by online publishers and artists to challenge unlicensed data use in AI training.
Allegations Center on Research-Only Video Datasets
At the core of the complaint is HD-VILA-100M, a large-scale video-language dataset designed for academic research. Plaintiffs claim Snap used HD-VILA-100M and comparable corpora to power commercial features like text-prompted edits, despite license language and common academic norms restricting such use to research.
The suit further asserts Snap sidestepped YouTube’s terms of service, which prohibit scraping and commercial reuse without authorization, as well as technological measures that control access to videos and metadata. If proven, those assertions could intersect with anti-circumvention provisions under the Digital Millennium Copyright Act—an increasingly common add-on in AI training disputes.
Why these datasets matter: multimodal systems learn by aligning video frames with text, enabling models to understand scenes, actions, and context. HD-VILA-100M and similar sets are attractive because they provide rich, paired examples at scale. The legal question is whether moving such data from a research context into a commercial pipeline crosses the line into infringement.
What Snap’s AI Features Do and How They Work
Snap has leaned into generative and augmented reality features to keep users engaged. Imagine Lens, the product highlighted in the complaint, lets users transform or stylize images using short text prompts—functionality typically backed by models trained on vast collections of image and video data paired with captions or transcripts.
The creators argue their videos helped teach these systems how to recognize content and produce edits, yet they were never asked for permission or paid. In their telling, Snap turned research datasets into commercial fuel without honoring licenses or the platform rules governing YouTube content.
The Wider Fight Over Training Data in AI Models
This case lands amid a wave of lawsuits brought by authors, artists, news outlets, and user-generated platforms over AI model training practices. The Copyright Alliance has tracked more than 70 copyright actions tied to AI training and outputs, underscoring how unsettled the legal landscape remains.
Outcomes have been mixed. In an author suit against Meta, a judge sided with the company on key claims, while authors suing Anthropic reported a settlement. Many cases continue to test whether intermediate copying for training is fair use, how derivative work theories apply to model outputs, and what duties companies have to honor research-only dataset restrictions.
To mitigate risk, some AI developers have struck licensing deals, with Shutterstock and the Associated Press among the most prominent partners, signaling a shift toward consent-based sourcing. Creators say platforms hosting their work should follow suit or provide opt-in mechanisms with auditability and robust provenance controls.
What the Creators Seek in Their Lawsuit Against Snap
The plaintiffs are asking for statutory damages and a permanent injunction halting the alleged infringement. For registered works, statutory damages can reach up to $150,000 per work for willful infringement, a figure that can scale quickly for channels with extensive archives: a catalog of just 500 registered videos would, at the statutory maximum, translate to $75 million in potential exposure.
The court could also face calls—common in similar suits—for orders requiring deletion of infringing data, retraining or disabling affected models, and transparency about training pipelines and vendors. Expect detailed discovery requests around dataset provenance, license terms, and any steps Snap took to filter or vet sources.
Why It Matters for Platforms and Creators
Beyond Snap, the case probes a key fault line for the creator economy: whether platforms and AI makers must license video content explicitly for training, or whether public availability and platform terms can stand in as permission. A ruling could ripple across social media companies that increasingly rely on generative features to compete.
Regardless of the outcome, the pressure is mounting for AI teams to document consent, honor research-only licenses, and provide mechanisms for exclusion and compensation. For creators, the case is both a bid for accountability and a push to shape the rules of how their work trains the next generation of AI.