In a recent discussion on Google DeepMind’s YouTube channel, Oriol Vinyals—Vice President of Research and co-lead for Gemini—sat down with Professor Hannah Fry to explore how AI agents have progressed from narrow single-task systems into broad, multimodal tools capable of remarkable autonomy. Their wide-ranging conversation touched on everything from early breakthroughs in game-playing systems to the emerging world of “digital bodies” and advanced reasoning in next-generation AI. Below is an overview of the key themes and insights from their talk.
When DeepMind first drew headlines, it was mainly for agents excelling at singular tasks: Atari video games, the strategy game StarCraft, and of course, the iconic AlphaGo. Each of these agents used the same high-level formula—pretraining on historical human data (or game positions), followed by post-training with reinforcement learning (playing against itself) to maximize its skill. But these early agents were narrow in scope. AlphaStar, for instance, was dedicated solely to StarCraft.
According to Vinyals, that same two-step pipeline—pretraining (imitation) and post-training (reinforcement)—remains at the core of modern systems like Gemini 2.0. What has changed is the breadth of capability. While AlphaStar knew a single domain extremely well, Gemini is being designed to address a variety of tasks spanning language, images, code-writing, and even autonomous browsing.
Vinyals broke down the creation of these sophisticated “digital brains” into two main phases:

1. **Pretraining (imitation):** the model absorbs vast amounts of existing data—human text, images, or historical game positions—and learns to imitate the patterns it finds.
2. **Post-training (reinforcement):** the model is then refined against a reward signal, for example by playing against itself or by optimizing for responses judged to be better.
In game-playing scenarios, the reward signal is crystal clear—winning versus losing. In more subjective tasks (e.g., writing “better” poems), the reward signal is fuzzier, limited by how well we can define “better” in the first place.
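The two-phase recipe can be illustrated with a toy policy over three actions. This is a deliberately simplified sketch, not DeepMind’s training code: phase one fits the policy to invented “human” demonstrations with a cross-entropy update, and phase two applies a REINFORCE-style policy-gradient step against an assumed reward that favors a different action.

```python
import math
import random

random.seed(0)

ACTIONS = ["a", "b", "c"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Phase 1: pretraining by imitation -- fit the policy to "human"
# demonstrations (here, humans pick action "a" 70% of the time).
demos = ["a"] * 7 + ["b"] * 2 + ["c"] * 1
logits = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    for i, act in enumerate(ACTIONS):
        target = demos.count(act) / len(demos)
        logits[i] += lr * (target - probs[i])  # cross-entropy gradient step

pretrained = softmax(logits)  # dominated by the most-imitated action, "a"

# Phase 2: post-training with a REINFORCE-style update -- the environment
# rewards only action "c", so the policy drifts away from pure imitation.
def reward(action):
    return 1.0 if action == "c" else 0.0

for _ in range(500):
    probs = softmax(logits)
    idx = random.choices(range(3), weights=probs)[0]
    r = reward(ACTIONS[idx])
    for i in range(3):
        # policy gradient: raise the log-prob of rewarded sampled actions
        grad = (1.0 if i == idx else 0.0) - probs[i]
        logits[i] += lr * r * grad

final = softmax(logits)  # now dominated by the rewarded action, "c"
```

After phase one the policy mirrors the demonstration frequencies; after phase two it has shifted to the rewarded action—the same imitation-then-reinforcement shape Vinyals describes, shrunk to a few lines.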
One of the bigger questions facing AI development today is whether simply making models bigger—adding billions more parameters, or training on exponentially more data—will continue to yield significant improvements. Both Hannah Fry and Oriol Vinyals agreed that gains from pure scaling eventually plateau and that other forms of innovation are crucial.
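The diminishing-returns intuition behind that plateau can be sketched with a power-law loss curve of the kind reported in the scaling-law literature; the constants below are invented for illustration, not measured values.

```python
# Illustrative scaling-law-style curve: loss(N) = irreducible + A / N**alpha.
# All three constants are made-up numbers chosen only to show the shape.
IRREDUCIBLE = 1.7   # assumed loss floor no amount of scale can beat
A = 400.0           # assumed scale constant
ALPHA = 0.34        # assumed exponent

def loss(n_params: float) -> float:
    return IRREDUCIBLE + A / n_params**ALPHA

# Improvement gained from each successive doubling of parameter count.
sizes = [1e9, 2e9, 4e9, 8e9, 16e9, 32e9]
gains = [loss(n) - loss(2 * n) for n in sizes]

# Each doubling buys strictly less than the previous one.
assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))
```

Under any curve of this shape, every doubling of size costs the same factor in compute but returns a shrinking slice of loss reduction—which is why architects look beyond raw scale.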
Vinyals discussed how model architects now carefully weigh design choices beyond sheer parameter count, rather than relying on scale alone.
A major shift from older AI to modern systems like Gemini is the push toward agency. That means giving a model not just the capacity to respond to questions but the ability to act in a digital environment on behalf of a user. This concept is sometimes described as giving a language model a “digital body”—the ability to navigate browser tabs, sift through complex data, and perform actions autonomously.
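At its core, a “digital body” reduces to an observe-decide-act loop. The sketch below stubs everything out: the `FakeBrowser` class and the rule-based `decide` function are hypothetical stand-ins for a real browser interface and a language model, shown only to make the loop’s shape concrete.

```python
class FakeBrowser:
    """Stub environment: a few 'pages' the agent can open and read."""
    def __init__(self):
        self.pages = {
            "home": "Links: flights, news",
            "flights": "Cheapest fare: $120",
            "news": "Headline: model scaling debated",
        }
        self.current = "home"

    def observe(self) -> str:
        return self.pages[self.current]

    def act(self, action: str) -> None:
        if action.startswith("open:"):
            self.current = action.split(":", 1)[1]

def decide(goal: str, observation: str) -> str:
    # A real agent would call a language model here; simple rules stand in.
    if goal in observation.lower() and "Links" in observation:
        return f"open:{goal}"
    return "done"

def run_agent(goal: str, env: FakeBrowser, max_steps: int = 5) -> str:
    # The observe -> decide -> act loop that gives the model its "body".
    for _ in range(max_steps):
        obs = env.observe()
        action = decide(goal, obs)
        if action == "done":
            break
        env.act(action)
    return env.observe()

result = run_agent("flights", FakeBrowser())
```

Swapping the stubbed `decide` for a model call and `FakeBrowser` for real browser tooling is, in outline, what turns a chat model into an agent that acts on a user’s behalf.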
Vinyals highlighted the release of Gemini 2.0, which introduces agentic capabilities: the model can perceive and act within digital environments—navigating a browser, sifting through complex data, and carrying out multi-step tasks on a user’s behalf.
Whether it’s booking travel, writing code, or summarizing the day’s headlines, Gemini 2.0 aims to integrate cutting-edge large language modeling with robust, real-time interactions.
Asked about the quest for artificial general intelligence (AGI), Vinyals was cautiously optimistic. He suggested that, if someone in 2019 had been handed today’s AI, they might have believed AGI was already here. From an outside perspective, these models look astonishingly versatile. But if you dig deeper, limitations—like hallucinations or subtleties around “reward signals”—still exist.
Nonetheless, Vinyals pointed out that superhuman performance is already a reality in well-defined domains like chess, Go, and protein folding (AlphaFold). For more open-ended domains, steps like agentic exploration, deeper reasoning, and access to dynamic external information might soon yield equally astonishing breakthroughs.
The conversation with Oriol Vinyals and Hannah Fry underscored that while scaling up models remains critical, size alone won’t carry AI across the finish line of true autonomy. Instead, breakthroughs are emerging from carefully engineered architectures, strategic use of diverse data, and the push to give AI a “digital body” that can explore its environment much like a human would.
Above all, Gemini 2.0 marks a significant milestone: we no longer live in a world where AI models are confined to single tasks or simply “chat.” With agentic behavior, long-term memory capacities, and the ability to plan complex tasks, the path to more general-purpose, self-directed intelligence is becoming clearer. And as Vinyals reminds us, in just another five years, the frontier might advance in ways we can hardly imagine.