Google jumps on the agentic AI bandwagon

Google CEO Sundar Pichai delivers the keynote address at the Google I/O developers conference at Shoreline Amphitheatre on May 10, 2023 in Mountain View, California.
(Image credit: Getty Images)

Google has announced the launch of Gemini 2.0, marking its first major foray into agentic AI with its new Mariner and Jules agents.

The move by Google follows similar AI agent roll-outs by a host of industry stakeholders, including Salesforce, Microsoft, and OpenAI, with 2025 looking set to be the year AI models move from answering basic questions and generating material to clicking around our desktops for us.

In a blog post, CEO Sundar Pichai said the update to Gemini was focused specifically on agentic models.

"With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant," he wrote.

"If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful."

Gemini 2.0 for the agentic era

Google shared details about Gemini 2.0 as well as an experimental model called Gemini 2.0 Flash. Gemini 2.0 is being added to AI Overviews in Google Search and is already in the hands of testers, while Gemini 2.0 Flash is now available to developers as an experimental model via the Gemini API.
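For developers curious what that access looks like in practice, here is a minimal sketch, assuming the google-genai Python SDK, a valid API key, and that the experimental model is exposed under an identifier along the lines of "gemini-2.0-flash-exp":

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model via the
# Gemini API, assuming the google-genai Python SDK (pip install google-genai).
# The placeholder API key and the model identifier are assumptions.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # assumed placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # assumed experimental model identifier
    contents="Summarize what agentic AI means in one sentence.",
)
print(response.text)
```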

Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind, explained in the blog post that Gemini 2.0 Flash is a "workhorse model" with low latency and boosted performance.

Gemini 2.0 Flash outperforms Gemini 1.5 Pro on key benchmarks while running at twice the speed, they note.

"In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio," they note. "It can also natively call tools like Google Search, code execution as well as third-party user-defined functions."

Hassabis and Kavukcuoglu also note in the blog post that Gemini 2.0 Flash will play a key role in powering the tech giant’s new agentic features.

"Gemini 2.0 Flash’s native user interface action-capabilities, along with other improvements like multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences."

Agents are coming

Indeed, Google unveiled two new agents, Project Mariner and Jules, as well as an update on Project Astra, the universal assistant it first showed off earlier this year, describing it as an agent that can use multimodal understanding in the real world.

Thanks to Gemini 2.0, Astra can now make use of Google Search, Lens, and Maps, has 10 minutes of in-session memory for better personalization and context, and works across multiple languages.

So far, the system is limited to "trusted testers" on Android phones, but Google aims to expand the pool of testers and to bring the assistant to new form factors such as glasses.

Project Mariner, meanwhile, is designed to understand information in your browser — such as text, code, images and pixels — in order to navigate pages to complete tasks.

The blog post stressed that Mariner is an "early research prototype", but claimed the agent achieved a score of 83.5% on the WebVoyager benchmark that tests agent performance on real-world web tasks.

"It’s still early, but Project Mariner shows that it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and slow to complete tasks today, which will improve rapidly over time," the post noted.

The tool keeps a human in the loop for safety, Google said.

"For example, Project Mariner can only type, scroll or click in the active tab on your browser and it asks users for final confirmation before taking certain sensitive actions, like purchasing something."

While Mariner remains in its early stages, it is being tested via an experimental Chrome extension, and Google said it was beginning to discuss the tool with the wider web ecosystem. No release date was announced.

Developers to get a boost

Google also showed off another AI agent, dubbed Jules, designed specifically for coding and built to fit into a GitHub workflow.

"It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision," the blog post noted. "This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding."

Google also described Jules, like Mariner, as an "ongoing experiment", noting that it "may make mistakes."

In a post on the developer blog, the firm said it would be possible to offload Python and JavaScript coding tasks to the agent, letting human developers focus on bigger problems.

"Jules creates comprehensive, multi-step plans to address issues, efficiently modifies multiple files, and even prepares pull requests to land fixes directly back into GitHub," the post said.

So far, Jules is only available to Google's "trusted testers", but the company plans to open access to other interested developers in early 2025.

Other AI agents

Google's announcements follow similar news about agents from its rival AI developers.

In September, Salesforce unveiled Agentforce, its customizable AI agent platform, while Anthropic in October showed off an early version of an agent with its Claude 3.5 Sonnet model, though the company admitted its "computer use" functions were experimental and "at times cumbersome and error-prone".

Microsoft also unveiled new agents for Copilot in October, aiming to automate tasks like triaging email and managing expenses.

"Agents are very focused on augmenting business processes and workflows, teams and departments," Charles Lamanna, CVP for Business & Industry Copilot at Microsoft, said at the time.

"They’re able to work on behalf of a system as opposed to a person, work asynchronously, or even autonomously to complete tasks."

In November, reports suggested OpenAI would reveal its own AI agent early next year, with CEO Sam Altman saying at a developer conference that 2025 is the year "agents will work".