A new paper from Stanford University researchers finds that, in order to compete better with each other, LLMs lose their value alignment. The paper calls this dynamic "Moloch's Bargain." Its key findings are listed below, but first, two definitions:
AI Alignment
AI alignment is the process of designing artificial intelligence systems whose goals, behaviors, and decision-making reliably reflect human values, ethics, and intentions. It ensures that advanced AI acts in ways beneficial to humanity, avoiding unintended harm, misaligned optimization, or outcomes contrary to collective well-being, even as systems gain autonomy and intelligence.
Moloch’s Bargain
Moloch’s Bargain describes a destructive dynamic in which rational agents, driven by competition, make choices that harm collective interests. Each participant optimizes for individual gain, yet all lose in the aggregate. Applied to AI, it warns that uncoordinated races for progress may sacrifice safety, ethics, and long-term human flourishing.
Here are 10 key takeaways:
1. The “Moloch” metaphor represents competitive pressure
The paper uses Moloch (a mythic figure representing destructive competition) to describe how individually rational actions can lead to collectively harmful outcomes, a modern tragedy of the commons. Such systems reward competition even when cooperation would yield better overall results.
2. "Alignment Failure" emerges from structural incentives
AI misalignment is not only a technical problem; it’s deeply tied to socio-economic structures. When institutions or corporations face competitive pressure, they may deploy unsafe or unaligned AI to stay ahead, creating systemic risk.
3. Collective action problems are central
The study emphasizes that coordination failures among AI actors - governments, companies, or researchers - are the biggest barrier to safety. Even if everyone knows the risks, no one wants to be the first to slow down.
4. Rationality does not guarantee good outcomes
The author argues that rational optimization within misaligned systems leads to rational destruction: each agent is rational, but the system-level outcome is irrational and harmful (see the first sketch after this list).
5. AI amplifies preexisting incentive problems
AI systems make the “Moloch problem” worse because they accelerate optimization and reduce friction, magnifying any misalignment between private and collective goals.
6. Technological control is not enough
Purely technical solutions - like alignment algorithms or corrigibility - cannot resolve Moloch’s dynamic unless paired with institutional and governance changes.
7. Cooperation requires institutional design
To escape the “bargain,” humanity needs mechanisms for credible commitment and enforcement, such as binding treaties, regulation, and transparency norms that make safe behavior collectively rational (see the second sketch after this list).
8. Coordination can be fragile and costly
The paper points out that coordination mechanisms are costly to maintain, and small breakdowns (political, economic, or informational) can lead to rapid reversion into competitive dynamics.
9. Alignment research must include political economy
AI alignment must integrate economics, sociology, and governance theory - not just machine learning - since misalignment emerges from flawed incentive structures at the human and institutional level.
10. Escaping Moloch requires value reorientation
Ultimately, the author calls for cultural and ethical change: moving from “win-at-all-costs” competition to valuing cooperative, sustainable progress, a “moral realignment” parallel to technical alignment.
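To make takeaway 4 concrete, here is a minimal game-theoretic sketch. It is an illustration, not a model from the paper, and every payoff number is an assumption: a one-shot "deployment race" between two labs in which rushing an unsafe system dominates deploying safely for each player individually, yet the only equilibrium is the worst collective outcome.

```python
# A minimal sketch of takeaway 4 (illustrative assumptions, not the paper's
# model): a one-shot "deployment race" framed as a prisoner's dilemma.

from itertools import product

ACTIONS = ("safe", "rush")

# PAYOFF[(my_action, their_action)] = my payoff. Rushing beats being careful
# for the individual, but mutual rushing is worse than mutual safety.
PAYOFF = {
    ("safe", "safe"): 3,  # cooperative, sustainable progress
    ("safe", "rush"): 0,  # the cautious actor loses the market
    ("rush", "safe"): 4,  # the first mover captures share, externalizes risk
    ("rush", "rush"): 1,  # race to the bottom: everyone ships unsafe systems
}

def best_response(their_action: str) -> str:
    """The individually rational reply to a fixed opponent action."""
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, their_action)])

def nash_equilibria() -> list[tuple[str, str]]:
    """Profiles where neither player can gain by deviating unilaterally."""
    return [(a, b) for a, b in product(ACTIONS, repeat=2)
            if best_response(b) == a and best_response(a) == b]

for a, b in nash_equilibria():
    total = PAYOFF[(a, b)] + PAYOFF[(b, a)]
    print(f"equilibrium: ({a}, {b}), total welfare = {total}")
print("total welfare if both play safe =", 2 * PAYOFF[("safe", "safe")])
```

Running this prints a single equilibrium, (rush, rush), with total welfare 2, against a welfare of 6 had both players deployed safely: each agent is rational, yet the aggregate result is the worst available.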
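Takeaway 7 can be sketched the same way, again with assumed numbers rather than anything from the paper: an enforceable penalty for rushing, standing in for binding treaties or regulation, reshapes the payoffs so that safe behavior becomes each player's best response.

```python
# A minimal sketch of takeaway 7 (again with assumed numbers): an enforced
# penalty for rushing, standing in for treaties/regulation, flips the game.

from itertools import product

ACTIONS = ("safe", "rush")
PAYOFF = {("safe", "safe"): 3, ("safe", "rush"): 0,
          ("rush", "safe"): 4, ("rush", "rush"): 1}  # same table as above

def payoff(mine: str, theirs: str, penalty: float) -> float:
    """Base payoff minus an enforced fine whenever this player rushes."""
    return PAYOFF[(mine, theirs)] - (penalty if mine == "rush" else 0)

def equilibria(penalty: float) -> list[tuple[str, str]]:
    """Profiles where neither player gains by deviating unilaterally."""
    return [
        (a, b) for a, b in product(ACTIONS, repeat=2)
        if payoff(a, b, penalty) >= max(payoff(x, b, penalty) for x in ACTIONS)
        and payoff(b, a, penalty) >= max(payoff(x, a, penalty) for x in ACTIONS)
    ]

for penalty in (0, 2):
    print(f"penalty = {penalty}: equilibria = {equilibria(penalty)}")
```

With no penalty the unique equilibrium is mutual rushing; with a penalty of 2 it flips to mutual safety. That is the sense in which credible commitment and enforcement make safe behavior collectively rational.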
Read and download the full report/paper -
