Introduction
The launch of David Silver’s heavily funded London AI lab, Ineffable Intelligence, is one of the most interesting signals in the current AI race. Silver is not just another founder entering the frontier AI market. He is one of the central figures behind AlphaGo and AlphaZero, systems that showed the world how machines could reach superhuman performance through reinforcement learning, self-play, search, and simulation. His new company has reportedly raised $1.1 billion at a valuation of about $5.1 billion, with the ambition of building AI systems that learn from their own experience rather than relying mainly on human-generated internet data.
This is a serious alternative bet to the dominant large language model paradigm. Today’s frontier AI companies mostly scale models by feeding them enormous quantities of text, code, images, videos, and human feedback. Silver’s thesis is different: truly advanced AI may need to become a “superlearner” - a system that can explore environments, try actions, receive feedback, improve strategies, and discover knowledge by doing. WIRED reports that Silver sees reinforcement learning as a path toward systems that generate their own intelligence rather than imitate human intelligence.
But the correct conclusion is not that LLMs are dead. It is that the AI race is widening. LLMs have proved astonishingly useful for language, coding, reasoning assistance, content generation, tool use, and knowledge work. However, their dependence on human-produced data creates a ceiling: they are powerful at absorbing and remixing what humans have already written, but less naturally suited to discovering entirely new strategies through action. Silver’s new lab matters because it challenges the assumption that scaling internet-trained models is the only path to frontier intelligence.
Let's dive deeper into each of these points.
1. David Silver represents the AlphaGo tradition, not the ChatGPT tradition
David Silver’s reputation comes from a different branch of AI history. AlphaGo and AlphaZero were not language models. They were systems that learned to make decisions through reinforcement learning, self-play, and search. AlphaZero, for example, was designed to learn chess, shogi, and Go entirely from self-play, starting with nothing but the rules rather than relying on human expert games.
This distinction matters. The LLM tradition is based on predicting and generating language from massive datasets. The AlphaGo tradition is based on agents improving through experience. In games, this means playing millions of games against oneself. In broader AI, the challenge is to create environments where an AI can test ideas, take actions, learn consequences, and improve. Silver’s new lab appears to be an attempt to take the AlphaZero idea beyond board games and into general intelligence.
2. The bet is that intelligence comes from interaction, not only imitation
Large language models are trained largely by learning patterns from human-created data. That is extremely powerful because human language contains compressed knowledge about science, culture, logic, law, medicine, business, and everyday life. But it is still, in a deep sense, imitation learning at scale. The model learns what humans have said, written, coded, and corrected.
Silver’s alternative bet is that stronger intelligence may come from interaction with environments. A system that experiments, fails, receives feedback, adjusts, and tries again may eventually learn things that are not already present in human text. This is why reinforcement learning, simulation, and self-play are so important: they allow an AI system to generate fresh learning signals rather than only consuming existing human data.
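This interaction loop can be sketched in miniature. The toy below is obviously nothing like Silver's actual systems; it is the simplest possible illustration of learning from feedback rather than from data: an agent that starts knowing nothing about which of three actions pays off best, and discovers the answer purely by acting, observing rewards, and adjusting (an epsilon-greedy bandit).

```python
import random

# Toy illustration only: an agent that learns which action is best
# purely from reward feedback, with no human-labelled data at all.
random.seed(0)

TRUE_PAYOFFS = [0.2, 0.5, 0.8]   # hidden success rates, unknown to the agent
estimates = [0.0, 0.0, 0.0]      # agent's learned value of each action
counts = [0, 0, 0]
EPSILON = 0.1                    # fraction of steps spent exploring

for step in range(5000):
    if random.random() < EPSILON:
        action = random.randrange(3)               # explore a random action
    else:
        action = estimates.index(max(estimates))   # exploit the best guess
    reward = 1.0 if random.random() < TRUE_PAYOFFS[action] else 0.0
    counts[action] += 1
    # incremental average: nudge the estimate toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

best = estimates.index(max(estimates))
print(best)  # the agent converges on action 2, the highest-payoff arm
```

Nothing here was imitated: the agent's knowledge of which action is best exists only because it acted and observed consequences, which is the core of the experience-driven thesis.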
3. Human internet data may become a bottleneck
The current LLM paradigm depends heavily on vast quantities of human-generated data. But there are growing concerns about data limits, data quality, copyright issues, synthetic data contamination, and diminishing returns from simply scaling model size. Once models have absorbed most high-quality public text and code, the next improvement may not come easily from “more internet.”
Silver’s approach directly targets that bottleneck. If an AI can create its own curriculum of experiences, then learning is no longer limited to what humans have already written online. In theory, it could explore mathematics, science, engineering, robotics, economics, or strategy through controlled environments and simulations. That is the promise: not just a model that reads the world, but a system that practices acting in the world.
4. Self-play is powerful because it creates endless competition
Self-play is one of the most important ideas behind AlphaGo and AlphaZero. A system improves by competing against versions of itself. Each improvement creates a stronger opponent, which forces the system to improve again. This creates a loop of escalating capability.
In games, this is straightforward because the rules are clear and the outcome is measurable: win, lose, or draw. The hard question is whether similar loops can be built for open-ended domains. Can an AI learn science by generating hypotheses, testing them in simulation, and refining them? Can it learn economics by modeling markets? Can it learn robotics by acting in virtual worlds before moving to real robots? These are difficult problems, but they are exactly the kinds of problems Silver’s lab seems designed to attack.
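The self-play loop itself fits in a few lines for a tiny game. The sketch below (illustrative only, nowhere near AlphaZero's scale or architecture) learns the game of Nim: a pile of stones, each player takes 1 to 3 stones, and whoever takes the last stone wins. A single value table plays against itself, and win/loss signals propagate back through the positions it visited, until it rediscovers the known perfect strategy.

```python
import random

# Toy self-play: one shared value table plays both sides of Nim and
# improves from its own games. value[n] estimates the chance that the
# player TO MOVE wins with n stones remaining.
random.seed(1)
PILE = 10
value = [0.5] * (PILE + 1)
value[0] = 0.0  # no stones left: the player to move has already lost

def choose(n, eps=0.2):
    """Pick the move that leaves the opponent the worst position."""
    moves = [m for m in (1, 2, 3) if m <= n]
    if random.random() < eps:
        return random.choice(moves)                  # explore
    return min(moves, key=lambda m: value[n - m])    # exploit

for game in range(20000):
    n = PILE
    while n > 0:
        m = choose(n)
        # the mover wins exactly when the opponent's position is losing
        target = 1.0 - value[n - m]
        value[n] += 0.1 * (target - value[n])        # TD-style update
        n -= m

# Known perfect play: positions that are multiples of 4 lose for the mover.
print([round(v) for v in value[1:]])
```

Note the escalation the article describes: because both sides share the same table, every improvement instantly produces a stronger opponent, and the learning signal comes entirely from games the system generated itself.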
5. Simulation could become the new training data
If internet text is the fuel of LLMs, simulation may become the fuel of experience-driven AI. A simulation gives an AI a space where it can act, fail safely, receive feedback, and repeat millions or billions of times. This is crucial because real-world experience is slow, expensive, and risky. A robot cannot crash a car millions of times in the real world, but it can do so in simulation. A scientific agent cannot run infinite physical experiments, but it may be able to test ideas in computational models.
The quality of the simulation will be decisive. If the simulation is too simple, the AI may learn tricks that do not transfer to reality. If the simulation is rich and accurate, it could produce capabilities that generalize. This is one of the biggest open questions for Silver’s approach: can we build simulated worlds complex enough to train genuinely general intelligence?
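In practice, "simulation as training data" usually means an environment object an agent can reset and step through millions of times. The sketch below is modelled loosely on the widely used Gymnasium-style reset/step interface; the environment itself (a one-dimensional grid with a goal) and all names are illustrative, not taken from any real simulator.

```python
# Hypothetical sketch of the interface a simulated training environment
# typically exposes: reset() to start an episode, step() to act in it.

class GridWorld:
    """Agent starts at cell 0 on a line and must reach the last cell."""
    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: -1 (move left) or +1 (move right), clipped to the grid
        self.pos = max(0, min(self.size - 1, self.pos + action))
        done = self.pos == self.size - 1
        reward = 1.0 if done else -0.01   # small cost per step, prize at goal
        return self.pos, reward, done

env = GridWorld()
state = env.reset()
total, done = 0.0, False
while not done:
    state, r, done = env.step(+1)   # trivial policy: always move right
    total += r
print(round(total, 2))  # 3 step costs plus the goal reward: 0.97
```

Everything the article says about safe failure lives in that `step` call: crashing, in simulation, is just another return value, which is why an agent can afford billions of attempts there but not in the physical world.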
6. Reinforcement learning is not new, but the timing is different
Reinforcement learning has been studied for decades. What is new today is the combination of compute, neural networks, simulation infrastructure, and frontier AI talent. Earlier reinforcement learning systems often struggled with sample efficiency, sparse rewards, unstable training, and transfer to real-world settings. These challenges have not disappeared.
However, the environment has changed. AI labs now have far more compute, better architectures, stronger world-modeling techniques, more advanced simulators, and growing experience in training agents. The question is not whether reinforcement learning works - it clearly does in some domains. The question is whether it can scale from games and constrained environments to open-ended, general-purpose intelligence.
7. This is not proof that LLMs are obsolete
The most important point is balance. Silver’s new lab is a serious alternative bet, but it is not evidence that LLMs are finished. LLMs remain the most commercially successful and widely deployed AI systems today. They are useful because language is the operating system of human knowledge. They can write, explain, summarize, translate, code, reason, plan, and operate tools.
It is more likely that future frontier systems will combine paradigms. LLMs may provide language, interface, reasoning, and knowledge compression. Reinforcement learning may provide action, adaptation, and goal-directed improvement. Simulation may provide training environments. Search and planning may provide strategic depth. The future may not be LLM versus RL, but LLM plus RL plus simulation plus tools plus memory plus planning.
8. The real race is between passive learning and active learning
The deeper contrast is not between two technical labels. It is between passive learning and active learning. LLMs mostly learn passively from data that already exists. Experience-driven agents learn actively by doing. Humans use both. We learn from books, teachers, and language, but we also learn by acting, making mistakes, practicing skills, and testing reality.
For AI to become more autonomous, it may need the second form of learning. A model that only reads about chemistry is different from a system that can design experiments, predict outcomes, run simulations, revise its theory, and propose better experiments. A model that reads code is different from an agent that writes code, tests it, debugs it, benchmarks it, and improves its own methods. This active loop is where Silver’s approach becomes important.
9. The biggest risks are alignment, evaluation, and control
A system that learns from experience can become more capable, but also harder to predict. If an AI is trained to maximize rewards in complex environments, it may discover unexpected strategies. In games, this can be brilliant. In the real world, it can be dangerous. Reward design, evaluation, interpretability, containment, and alignment become central issues.
This is why experience-driven AI must be developed carefully. The more an AI system learns autonomously, the more important it becomes to understand what it is optimizing, what environments it is trained in, what behaviors are rewarded, and whether its learned strategies transfer safely. Silver’s vision is ambitious, but ambition in AI must be matched by governance and safety discipline.
10. The strategic importance is that AI now has multiple credible futures
The launch of Ineffable Intelligence shows that the frontier AI race is no longer one-dimensional. For the past few years, the public conversation has often treated bigger LLMs as the main road to advanced AI. Silver’s lab suggests that some of the world’s leading AI researchers believe the next breakthrough may come from a different direction.
This matters for companies, policymakers, educators, and investors. If AI progress comes only from LLM scaling, then the winners are likely to be those with the most data, compute, and distribution. But if experience-based learning becomes central, then simulation platforms, robotics environments, scientific modeling tools, reinforcement learning infrastructure, and agent evaluation systems may become equally important. The AI ecosystem could become broader, more competitive, and more technically diverse.
Conclusion
David Silver’s new London AI lab is important because it challenges the dominant assumption behind today’s AI boom: that the road to frontier intelligence is mainly paved with bigger language models trained on more human data. His bet is that future AI systems must learn more like agents—through experience, self-play, simulation, reinforcement, and discovery.
But this should not be confused with a declaration that LLMs are obsolete. LLMs are still powerful, practical, and central to the current AI revolution. What Silver’s move really tells us is that the next phase of AI may not be about choosing between language models and reinforcement learning. It may be about combining them.
The strongest future systems may use language models to understand and communicate, reinforcement learning to improve through action, simulation to generate experience, planning to reason over long horizons, and tools to interact with the world. Silver’s new lab is not the end of the LLM era. It is a sign that the AI race is entering a more interesting phase - one where machines may not only learn from what humans have written, but also from what they can discover for themselves.
[The Billion Hopes Research Team shares the latest AI updates for learning and awareness. Various sources are used. All copyrights acknowledged. This is not professional, financial, personal, or medical advice. Please consult domain experts before making decisions. Feedback welcome!]
