“When a language dies, a way of understanding the world dies with it.” - Noam Chomsky, Linguist and Cognitive Scientist
AI’s limits in saving languages
Artificial intelligence has transformed communication, but it struggles with preserving endangered languages. According to the United Nations, about 40% of the world’s 7,000 languages are at risk of extinction. Generative AI tools, while powerful in translation and multilingual tasks, still fail to effectively support “low-resource languages” that lack digital data and representation.
The data problem at the heart of AI
AI models depend heavily on vast amounts of training data, and most of this data is in English. This imbalance causes major language models to perform poorly in less common languages, limiting their usefulness. The lack of accurate, context-rich data not only excludes speakers of these languages but also homogenizes global communication, erasing local culture and diversity.
The risks of linguistic inequality
Low-resource languages also expose AI’s security gaps. Studies have shown that safety filters in tools like ChatGPT can be bypassed more easily in underrepresented languages, leading to misuse. As AI becomes embedded in more sectors, such as education, health, and governance, these blind spots deepen digital inequality and cultural loss.
Community-driven preservation
Technology alone cannot fix this crisis. Community-led projects in Asia and New Zealand are showing promise. Efforts like Singapore’s SEA-LION and New Zealand’s Māori-language databases collect localized, culturally sensitive data and keep it under community control. This helps ensure accuracy and guards against exploitation by big tech firms.
A path forward
Sustainable preservation of dying languages requires collaboration between AI developers and local communities. Without shared ownership and authentic data, AI risks accelerating, rather than preventing, the disappearance of human linguistic heritage.
Summary
AI alone cannot save endangered languages due to data scarcity, cultural bias, and linguistic inequality. Real progress depends on community-driven data collection, collaboration with native speakers, and equitable access to technology that reflects true cultural contexts.
Food for thought
Can AI truly represent humanity if it speaks only the languages of power?
AI concept to learn: Multilingual Models
Multilingual models are AI systems trained to understand and generate text across multiple languages. They rely on diverse data sources to capture linguistic nuances but often struggle with underrepresented languages due to limited training material. Improving these models requires both technical innovation and inclusive data practices.
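To make the data imbalance concrete, here is a minimal Python sketch of one rebalancing idea often used when training multilingual models: sampling each language in proportion to its data share raised to a power below 1, so that small languages are seen more often than their raw share would allow. The corpus sizes, the function name `sampling_probs`, and the exponent value are illustrative assumptions, not figures or methods taken from any project mentioned in this article.

```python
def sampling_probs(token_counts, alpha=0.3):
    """Return per-language sampling probabilities.

    Each language's share of the data is raised to the power `alpha`
    and the results are renormalized. alpha=1.0 reproduces the raw
    shares; smaller values shift weight toward low-resource languages.
    """
    total = sum(token_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in token_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical corpus sizes in tokens, for illustration only.
corpus = {"english": 1_000_000_000, "hindi": 50_000_000, "maori": 1_000_000}

raw = sampling_probs(corpus, alpha=1.0)       # proportional to data size
smoothed = sampling_probs(corpus, alpha=0.3)  # boosted small languages

for lang in corpus:
    print(f"{lang:8s} raw={raw[lang]:.4f} smoothed={smoothed[lang]:.4f}")
```

Rebalancing like this only changes how often existing text is seen. It cannot create the authentic, context-rich data that a low-resource language lacks, which is why inclusive data practices still matter.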
[The Billion Hopes Research Team shares the latest AI updates for learning and awareness. This is not professional, financial, personal or medical advice. Please consult domain experts before making decisions. Feedback welcome!]
