India is considering requiring AI companies to pay royalties whenever they train models on copyrighted works, a move that would directly affect OpenAI and Google as well as other developers building large-scale systems on online content. If adopted, the framework would establish one of the world’s first statutory licensing regimes for AI training data, making India a policy outlier with global reverberations.
What India Is Proposing: A Mandatory Blanket License for AI Training
The proposal calls for a “mandatory blanket license” that would, in effect, guarantee Indian AI developers automatic access to copyrighted material for training, provided they pay royalties into a central collecting society managed by rights-holder organizations. The committee behind the plan argues that such a single-window system would reduce transaction costs, avoid years of litigation and ensure creators are paid when their works are ingested at scale.
Under the draft, the approach is a hybrid: developers could train on copyrighted works without seeking individual permission, but would have to make a mandatory payment to the collecting society, which would then distribute the money among registered and unregistered creators. The public can comment on the draft before it goes to the government for review.
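The draft does not spell out how the collecting society would divide the money. As a minimal sketch, assuming a simple pro-rata rule, the split might work as below; the `Work` class, the `distribute` function, the usage weights and the practice of holding unregistered creators’ shares in a reserve are all hypothetical illustrations, not details of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Work:
    creator: str
    registered: bool
    weight: float  # hypothetical usage weight, e.g. share of training tokens


def distribute(pool: float, works: list[Work]) -> tuple[dict[str, float], float]:
    """Split a royalty pool pro rata by weight (hypothetical rule).

    Registered creators are paid directly; shares owed to unregistered
    creators are held in a reserve until they come forward (an assumption).
    """
    total = sum(w.weight for w in works)
    if total <= 0:
        raise ValueError("total weight must be positive")
    payouts: dict[str, float] = {}
    reserve = 0.0
    for w in works:
        share = pool * w.weight / total
        if w.registered:
            payouts[w.creator] = payouts.get(w.creator, 0.0) + share
        else:
            reserve += share
    return payouts, reserve


works = [
    Work("news_publisher_a", True, 60.0),
    Work("author_b", True, 30.0),
    Work("unknown_blogger", False, 10.0),
]
payouts, reserve = distribute(1_000_000.0, works)
print(payouts)  # {'news_publisher_a': 600000.0, 'author_b': 300000.0}
print(reserve)  # 100000.0
```

A pro-rata rule is the simplest option; a real scheme would also need rules for when reserved shares are released or redistributed if creators never register.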
Why It’s Important for AI Companies and Model Training Costs
India is one of the world’s most important markets for generative AI. OpenAI’s leadership has described India as one of the company’s largest user bases, and global platforms frequently point to the country’s hundreds of millions of internet users and its large developer community. India had more than 800 million broadband subscribers in 2024, which underscores what is at stake.
Mandatory royalties would add a new line item to training budgets already dominated by compute and data acquisition. According to the Stanford AI Index, the costliest frontier training runs exceed $100 million, and any per-work or usage-based fee could significantly alter model economics, particularly for companies that update models frequently or localize them for Indian languages and domains.
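To see how the choice of fee base changes the bill, the back-of-the-envelope sketch below compares a flat per-work fee with a per-token fee against a $100 million training run. Every rate and dataset size here is a hypothetical placeholder; the draft does not set rates.

```python
TRAINING_BUDGET_USD = 100_000_000  # order of magnitude cited for a frontier run


def per_work_fee(num_works: int, fee_per_work: float) -> float:
    """Flat fee for each copyrighted work included in the training set."""
    return num_works * fee_per_work


def per_token_fee(tokens: int, fee_per_million_tokens: float) -> float:
    """Usage-based fee proportional to the copyrighted tokens ingested."""
    return tokens / 1_000_000 * fee_per_million_tokens


# Hypothetical inputs: 2 million works totalling 1 trillion tokens.
costs = {
    "per-work": per_work_fee(2_000_000, 5.0),
    "per-token": per_token_fee(1_000_000_000_000, 25.0),
}
for label, cost in costs.items():
    share = cost / TRAINING_BUDGET_USD
    print(f"{label}: ${cost:,.0f} ({share:.0%} of one training run)")
# per-work: $10,000,000 (10% of one training run)
# per-token: $25,000,000 (25% of one training run)
```

If fees attached to every retraining run (another assumption), a developer that updates its models quarterly would pay these amounts four times a year, which is why update frequency matters as much as the rate itself.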
Operationally, companies may need stronger data-lineage systems (reliable logs, audit trails and provenance checks) to validate royalty calculations and demonstrate compliance. That would likely accelerate the industry’s shift away from indiscriminate web scraping and toward curated datasets.
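One way to make such logs trustworthy for an auditor is hash chaining, where each record commits to the hash of the one before it. The sketch below assumes a hypothetical `ProvenanceLog` schema with a source URL, a content hash and a license basis; it illustrates the technique and is not a format any regulator has specified.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class ProvenanceLog:
    """Hypothetical append-only log: each entry commits to the previous
    entry's hash, so editing any earlier record breaks the chain."""
    entries: list[dict] = field(default_factory=list)

    def append(self, source_url: str, content: bytes, license_basis: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS_HASH
        record = {
            "source_url": source_url,
            "content_sha256": hashlib.sha256(content).hexdigest(),
            "license_basis": license_basis,
            "prev_hash": prev_hash,
        }
        # Hash a canonical JSON encoding (sorted keys) of the record body.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        record["entry_hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous entry."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


log = ProvenanceLog()
log.append("https://example.com/article-1", b"article text", "statutory-license")
log.append("https://example.com/photo-7", b"image bytes", "public-domain")
print(log.verify())  # True
log.entries[0]["license_basis"] = "public-domain"  # simulate tampering
print(log.verify())  # False
```

Hash chaining makes silent edits detectable but does not prevent them; in practice the latest hash would also need to be shared with a third party, such as the collecting society, for the check to carry weight.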
How the Plan Fits Globally Against EU, UK, US, and Japan Rules
India’s approach contrasts with those of other large jurisdictions. The EU’s AI Act requires providers of general-purpose AI models to respect copyright and publish a summary of their training data, while EU copyright law provides text-and-data-mining exceptions that rights holders can opt out of. The U.K. has explored widening its text-and-data-mining exception, but creative industries have opposed the move. Japan permits broad text-and-data-mining, including for commercial purposes, while in the U.S. the limits are being tested in court, with news publishers, authors and image libraries suing over unlicensed training.

In proposing a statutory license, India is borrowing from models more commonly associated with music and broadcasting. Collectives such as the Indian Performing Right Society and the Indian Singers’ Rights Association offer a precedent for collecting and distributing royalties at scale, although valuing and attributing works across AI training data is a different technical challenge.
Industry Pushback and Support From Tech and Rights Holders
Reaction has split largely along industry lines. Nasscom, India’s main tech industry association, has urged the government to introduce a broad copyright exception for text and data mining of lawfully accessed content, warning that “mandatory licensing could stifle innovation.” The Business Software Alliance has likewise argued that relying on licensing alone could be infeasible, cautioning that “smaller or licensed-only datasets may undermine model quality and entrench bias.”
Rights holders, including news publishers, authors and artists, have been pressing for compensation when their works are used in AI training. The Indian news sector is also involved in a case before the Delhi High Court over whether training amounts to reproduction or qualifies as “fair dealing.” The proposed framework is designed to sidestep protracted litigation by guaranteeing payment while preserving access.
Key Questions on Implementation, Valuation, and Data Attribution
Three questions will decide whether the scheme works in practice:
- Valuation mechanics: What should royalties be calculated on? Options include per-work inclusion, token-based proxies, sampling audits, or usage-weighted models that consider how often a work appears in training data or influences outputs.
- Data attribution: Web data is messy. Duplicates, re-edits and platform reposts make attribution difficult. Developers will need provenance tools and deduplication methods; the collecting body will need reliable rules for crediting original creators.
- Scope and carve-outs: Public-domain and openly licensed material is straightforward, but gray areas include user-generated content on platforms, synthetic data generated from copyrighted works, and fine-tuning as opposed to pretraining. Obligations will also need to be balanced between startups and hyperscalers so the scheme does not entrench incumbents.
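On the attribution question, a minimal sketch of exact-duplicate detection is shown below: it normalizes case, punctuation and whitespace, hashes the result, and credits each group of copies to the earliest-published source. The `fingerprint` and `attribute` functions and the earliest-publication rule are hypothetical; this approach only catches copies that become identical after normalization, and re-edited versions would need near-duplicate techniques such as MinHash.

```python
import hashlib
import re
from datetime import date


def fingerprint(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, then hash, so
    trivially reformatted reposts map to the same key."""
    normalized = re.sub(r"[^\w\s]", "", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def attribute(copies: list[tuple[str, date, str]]) -> dict[str, str]:
    """Map each fingerprint to its earliest-published source
    (a hypothetical crediting rule, not an established standard)."""
    earliest: dict[str, tuple[date, str]] = {}
    for source, published, text in copies:
        key = fingerprint(text)
        if key not in earliest or published < earliest[key][0]:
            earliest[key] = (published, source)
    return {key: src for key, (_, src) in earliest.items()}


copies = [
    ("aggregator.example/repost", date(2024, 3, 2), "Monsoon arrives early in Kerala!"),
    ("newspaper.example/original", date(2024, 3, 1), "Monsoon arrives early in Kerala."),
    ("forum.example/thread", date(2024, 3, 5), "monsoon   arrives early in KERALA"),
]
credits = attribute(copies)
print(len(credits))            # 1
print(list(credits.values()))  # ['newspaper.example/original']
```

Earliest publication is only a heuristic: a repost can carry an earlier timestamp than the original, so a collecting body would likely also need metadata registered by creators.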
What Comes Next: Timeline, Lobbying, and Possible Pilot Phases
The government has opened a 30-day public comment period, after which the committee is expected to finalize and submit its recommendations. Expect intense lobbying from international AI firms, Indian creator collectives and digital rights advocates. A limited pilot testing rates, data reporting and distribution mechanisms in a few sectors, such as news or music, could emerge as a pragmatic way forward.
If the measure becomes law, India’s plan would put a price on access to copyrighted material and shape policy debates from Washington to Brussels. For creators, it offers a potential new revenue stream. For AI companies, it turns what many had treated as an optional cost into a mandatory one, a sign that the era of “train first, pay later” may be coming to an end.