
India may allow full access to content for AI training

"Stolen's a strong word. It's copyrighted content that the owner wasn't paid for. So yes." - Bill Gates, Co-founder of Microsoft

How to train LLMs after all

In December 2025, India's Department for Promotion of Industry and Internal Trade (DPIIT) released a working paper recommending a significant legislative framework for the use of copyrighted content in artificial intelligence (AI) and large language model (LLM) training. The proposal describes a "hybrid model" that grants developers full access to the content needed for Text Data Mining (TDM) while ensuring copyright holders receive statutory remuneration. The idea is to balance the rapid evolution of the generative AI ecosystem with the fundamental rights of creators.

Fixed royalty rates

Under the proposal, AI developers would be mandated to pay royalties to content creators for using their copyrighted data. These rates would be determined either by the government or by a court, with the aim of ensuring fair compensation. The framework also seeks to establish a quick, transparent process for affected parties to challenge the predetermined royalty rates. The objective is to foster innovation for both startups and established players while rewarding human creativity. All this sounds good in theory, but it can completely fail when the rubber meets the road!

Royalty collection body

To streamline the complex process of royalty collection and distribution, the DPIIT committee proposed setting up a Copyright Royalties Collective for AI Training (CRCAT). This body is envisioned as a non-profit organization, likely established by an industry association. The CRCAT would function as a centralized facilitator, collecting fees from AI developers and distributing proportional remuneration to the relevant rights holders.
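
To make "proportional remuneration" concrete, here is a minimal sketch of a pro-rata split. It is only an illustration: the working paper as summarized here does not specify a distribution formula, and the function name `distribute_pro_rata`, the integer usage weights, and the paise-denominated pool are all assumptions made for the example.

```python
def distribute_pro_rata(pool_paise, usage_by_holder):
    """Split an integer royalty pool in proportion to usage weights.

    Hypothetical helper: `usage_by_holder` maps a rights holder to a
    non-negative integer weight (for example, a count of works used).
    """
    total = sum(usage_by_holder.values())
    if total <= 0:
        raise ValueError("total usage must be positive")

    # Integer division rounds every share down, so nothing is over-paid.
    shares = {h: pool_paise * w // total for h, w in usage_by_holder.items()}

    # Rounding down leaves a few paise undistributed; hand them out one
    # at a time to the holders with the largest remainders.
    leftover = pool_paise - sum(shares.values())
    by_remainder = sorted(
        usage_by_holder,
        key=lambda h: (pool_paise * usage_by_holder[h]) % total,
        reverse=True,
    )
    for holder in by_remainder[:leftover]:
        shares[holder] += 1
    return shares


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(distribute_pro_rata(100, {"publisher_x": 3, "author_y": 1}))
```

Even in this toy version, the hard part is the input: the split is only as fair as the usage weights, which is exactly the tracking problem raised by critics of the proposal.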

Expert skepticism 

While the framework intends to simplify licensing, some experts have voiced concern that a mandatory collective mechanism for fair compensation is flawed. They want compensation to be prompt, practical, and accurately administered, and argue that the suggested mechanism, if implemented in its current form, could create compliance problems and make it difficult to track data sources. They suggest that the hybrid model’s reliance on a regulatory mechanism for financial matters might be overly burdensome.

Giving access to everything!

The framework is being championed by the DPIIT and the Ministry of Electronics and Information Technology (MeitY) as a necessary step to clear the legal roadmap for accessing protected works for AI training. This step moves beyond relying solely on negotiations or uncertain court rulings toward a standardized licensing regime. It ensures that the regulatory load is predictable and uniform for all developers operating in India’s growing AI industry.

Summary

India’s DPIIT has proposed a "hybrid model" legislative framework requiring AI developers to pay mandatory, government-fixed royalties to content creators for using copyrighted material in model training. This plan aims to balance AI innovation with copyright protection and suggests creating a non-profit body (CRCAT) to manage the collection and distribution of this remuneration.

Food for thought

If mandatory royalty collection is adopted, will the centralized CRCAT be able to fairly and accurately track the use of content across massive, continually changing AI training datasets?

AI concept to learn: Text Data Mining

Text data mining (TDM) is the automated process of analyzing vast amounts of text to extract high-quality information, identify patterns, and generate new knowledge. For AI models like LLMs, TDM is essential as it involves copying and processing text data to understand language structure and facts without intending to reproduce the expressive work itself. The legal debate centers on whether this technical use of copyrighted text falls under existing "fair use" exceptions or necessitates a new statutory licensing regime for remuneration.
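
As a toy illustration of the "copy and process" step described above, the sketch below reads a few documents and keeps only aggregate word counts, not the original passages. This is a deliberately simplified assumption of what TDM looks like; real LLM training pipelines are far more involved, and the function name `term_frequencies` is hypothetical.

```python
import re
from collections import Counter


def term_frequencies(documents):
    """Count how often each lowercase word appears across all documents."""
    counts = Counter()
    for text in documents:
        # The text is read (copied into memory) and tokenized, but only
        # the statistics are kept, not the expressive passages themselves.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


if __name__ == "__main__":
    corpus = ["The cat sat.", "The dog sat!"]
    print(term_frequencies(corpus).most_common(2))
```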

Copyright content available for all LLMs

[The Billion Hopes Research Team shares the latest AI updates for learning and awareness. Various sources are used. All copyrights acknowledged. This is not professional, financial, personal or medical advice. Please consult domain experts before making decisions. Feedback welcome!]
