"Stolen's a strong word. It's copyrighted content that the owner wasn't paid for. So yes." - Bill Gates, Co-founder of Microsoft
How to train LLMs after all
In December 2025, India's Department for Promotion of Industry and Internal Trade (DPIIT) introduced a working paper recommending a significant legislative framework for the use of copyrighted content in artificial intelligence (AI) and large language model (LLM) training. The proposal describes a "hybrid model" that would grant developers full access to content needed for Text Data Mining (TDM) while ensuring that copyright holders receive statutory remuneration. The idea is to balance the rapid evolution of the generative AI ecosystem with the fundamental rights of creators.
Fixed royalty rates
Under the proposal, AI developers would be required to pay royalties to content creators for using their copyrighted data. The rates would be set either by the government or by a court, with the aim of ensuring fair compensation. The framework also seeks to establish a quick, transparent process through which affected parties can challenge the predetermined rates. The objective is to foster innovation among startups and established players alike while rewarding human creativity. All this sounds good in theory, but it could fail completely when the rubber meets the road!
Royalty collection body
To streamline the complex process of royalty collection and distribution, the DPIIT committee proposed setting up a Copyright Royalties Collective for AI Training (CRCAT). This body is envisioned as a non-profit organization, likely established by an industry association. The CRCAT would act as a centralized facilitator, collecting fees from AI developers and distributing proportional remuneration to the relevant rights holders, which should simplify operations for both sides.
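The working paper, as summarized here, does not spell out how "proportional remuneration" would be calculated. As a purely hypothetical sketch, assume the collecting body holds a pool of fees (in paise) and splits it according to a per-rights-holder usage count. The function name `distribute_pool` and the usage figures are illustrative assumptions, not part of the proposal. The largest-remainder step makes sure every paisa in the pool is paid out, so no money is lost to rounding.

```python
def distribute_pool(pool_paise: int, usage: dict[str, int]) -> dict[str, int]:
    """Hypothetical proportional split of a royalty pool (in paise).

    Assumes each rights holder's share is proportional to a usage count.
    Uses integer arithmetic plus the largest-remainder method so payouts
    always sum exactly to the pool.
    """
    total = sum(usage.values())
    if total == 0:
        # No recorded usage: nobody receives a share.
        return {holder: 0 for holder in usage}

    payouts: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for holder, count in usage.items():
        # Floor of the exact share, plus the leftover numerator for ranking.
        payouts[holder], remainders[holder] = divmod(pool_paise * count, total)

    # Paise lost to flooring (always fewer than the number of holders).
    leftover = pool_paise - sum(payouts.values())
    # Give one extra paisa to each holder with the largest remainders.
    for holder in sorted(usage, key=lambda h: remainders[h], reverse=True)[:leftover]:
        payouts[holder] += 1
    return payouts


# Illustrative usage: a pool of Rs 1,000 (100,000 paise) and made-up usage counts.
print(distribute_pool(100_000, {"author_a": 3, "publisher_b": 5, "news_c": 2}))
```

Any real scheme would first have to agree on what counts as "usage" (tokens, documents, or something else), which is exactly the tracking problem the experts below raise.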
Expert skepticism
While the framework intends to simplify licensing, some experts have voiced concern that a mandatory collective mechanism for compensation is flawed. They argue that compensation must be prompt, practical, and accurately administered, and that the proposed mechanism, in its current form, could create compliance problems and make it difficult to track which data sources were actually used. They also suggest that the hybrid model's reliance on a regulatory mechanism for financial matters might prove overly burdensome.
Giving access to everything!
The framework is being championed by the DPIIT and the Ministry of Electronics and Information Technology (MeitY) as a necessary step to clear the legal path for accessing protected works for AI training. It moves beyond relying solely on negotiations or uncertain court rulings toward a standardized licensing regime, with the aim of making the regulatory burden predictable and uniform for all developers operating in India's growing AI industry.
Summary
India's DPIIT has proposed a "hybrid model" legislative framework requiring AI developers to pay mandatory, government-fixed royalties to content creators for using copyrighted material in model training. This plan aims to balance AI innovation with copyright protection and suggests creating a non-profit body (CRCAT) to manage the collection and distribution of this remuneration.
Food for thought
If mandatory royalty collection is adopted, will the centralized CRCAT be able to fairly and accurately track the use of content across massive, continually changing AI training datasets?
AI concept to learn: Text Data Mining
Text data mining (TDM) is the automated process of analyzing vast amounts of text to extract high-quality information, identify patterns, and generate new knowledge. TDM is essential for AI models like LLMs because training involves copying and processing text to learn language structure and facts, without any intent to reproduce the expressive work itself. The legal debate centers on whether this technical use of copyrighted text falls under existing "fair use" (in India, "fair dealing") exceptions or requires a new statutory licensing regime with remuneration.
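To make "identify patterns" concrete, here is a toy sketch of one classic TDM step: counting the most frequent word pairs (bigrams) across a small corpus. The function name `mine_bigrams` and the sample sentences are illustrative assumptions; real LLM training pipelines are far more elaborate. Note that even this toy has to read and tokenize a full copy of every document, which is why the copying question sits at the heart of the legal debate.

```python
import re
from collections import Counter


def mine_bigrams(documents: list[str], top_n: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Return the top_n most frequent adjacent word pairs across all documents."""
    counts: Counter[tuple[str, str]] = Counter()
    for doc in documents:
        # Lowercase and split into simple word tokens.
        tokens = re.findall(r"[a-z']+", doc.lower())
        # Pair each token with its successor and tally the pairs.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


corpus = [
    "AI training needs large text corpora.",
    "Text corpora power AI training.",
    "Royalties for AI training are proposed.",
]
print(mine_bigrams(corpus, top_n=2))
# → [(('ai', 'training'), 3), (('text', 'corpora'), 2)]
```

The output is new, aggregate knowledge (which phrases co-occur most often) rather than a reproduction of any single sentence, which is the distinction TDM exceptions typically turn on.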
[The Billion Hopes Research Team shares the latest AI updates for learning and awareness. Various sources are used. All copyrights acknowledged. This is not professional, financial, personal or medical advice. Please consult domain experts before making decisions. Feedback welcome!]