Google shows off new smaller generative AI tools and an AI agent on your phone
Google wants to seize momentum from rivals like OpenAI in the generative AI race - and its latest LLM updates could turn the tide


Google has showcased a number of updates to its generative AI tools and a few peeks at future products, as it continues its efforts to seize momentum from OpenAI.
Earlier this week OpenAI unveiled its latest flagship LLM, GPT-4o, which is able to joke and flirt with users. Now it has been Google’s turn to detail, at its Google I/O event, where it has got to with its generative AI products.
Late last year Google unveiled its first multimodal LLM, Gemini 1.0, in three sizes: Ultra, Pro and Nano (for on-device processing). It followed this with Gemini 1.5, which brought improved performance and a context window of one million tokens (one token is roughly four characters, or about three-quarters of a word, so 100 tokens is around 75 words).
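The token rule of thumb above lends itself to a quick back-of-the-envelope calculation. The Python sketch below is illustrative only: the function names are hypothetical, the ratios are the approximations quoted in this article, and real tokenizers vary by model and language.

```python
# Rough conversions based on the rule of thumb quoted above.
# These ratios are approximations, not the output of a real tokenizer.
WORDS_PER_TOKEN = 0.75   # about three-quarters of a word per token
CHARS_PER_TOKEN = 4      # about four characters per token


def tokens_to_words(tokens: int) -> int:
    """Estimate how many words a given number of tokens represents."""
    return round(tokens * WORDS_PER_TOKEN)


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    # Ceiling division, so a partial token still counts as one.
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int = 1_000_000) -> bool:
    """Check whether the estimated token count fits in the context window."""
    return estimate_tokens(text) <= context_window


if __name__ == "__main__":
    print(tokens_to_words(100))
    print(tokens_to_words(1_000_000))
```

By this rule, a one-million-token window works out to roughly 750,000 words.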
The company said developers want an LLM with lower latency and lower cost, so it has now added Gemini 1.5 Flash to the portfolio.
Gemini 1.5 Flash is the fastest Gemini model served in the API, and it is optimized for high-volume, high-frequency tasks, which Google said makes it more cost-efficient to serve.
Although it’s a lighter weight model than 1.5 Pro, it’s still capable of multimodal reasoning across vast amounts of information, according to Google DeepMind CEO Demis Hassabis.
He said 1.5 Flash is suited to summarization, chat applications, image and video captioning, or data extraction from long documents and tables. It has been trained by 1.5 Pro through a process called “distillation,” whereby the most essential knowledge and skills from a larger model are transferred to a smaller model.
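Google has not detailed how 1.5 Flash was distilled, so the following is only a minimal sketch of the general textbook idea: a smaller "student" model is trained to match the softened output probabilities of a larger "teacher". The function names, the temperature value and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A loss of zero means the student reproduces the teacher's
    distribution exactly; training would push this value down.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q) for p, q in zip(teacher, student) if p > 0
    )


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # hypothetical teacher scores for three tokens
    close_student = [3.8, 1.1, 0.6]
    poor_student = [0.5, 1.0, 4.0]
    print(distillation_loss(teacher, close_student))
    print(distillation_loss(teacher, poor_student))
```

In this formulation the student that ranks tokens the way the teacher does receives the lower loss, which is the sense in which knowledge is "transferred".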
The model has a one-million-token context window by default, which means you can process one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.
Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI.
Google Gemini 1.5 Pro updates
Google also introduced updates to Gemini 1.5 Pro, which it styles as its best model for general performance across generative AI tasks. These include extending the model’s context window to two million tokens.
The company said this would give the model “near-perfect recall on long-context retrieval tasks” making it possible to accurately process large-scale documents, thousands of lines of code or hours of audio and video.
To illustrate this, Google had the model analyze a 402-page transcript of the Apollo 11 Moon landing, accounting for 320,000 tokens, and asked it to hunt through for ‘comedic’ moments, which it did.
Hassabis said Google had also enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding. “We see strong improvements on public and internal benchmarks for each of these tasks,” he said.
This means that Gemini 1.5 Pro can now follow increasingly complex and nuanced instructions, he said. “We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls.”
He said 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio, and that 1.5 Pro is being integrated into Google products, including Gemini Advanced and in Workspace apps.
Gemini on Android
Google said Gemini on Android will use generative AI to get better at understanding the context of what’s on your screen and what app you’re using.
Android users will soon be able to bring up Gemini's overlay on top of the app they are using. Google gave the example of dragging and dropping generated images into Gmail or Google Messages, or tapping “Ask this video” to find specific information in a YouTube video.
With Gemini Advanced, users will have the option to “Ask this PDF” to quickly get answers from documents. Google said this update will roll out to “hundreds of millions of devices” over the next few months.
It said that, starting with Pixel later this year, it will be introducing Gemini Nano with Multimodality to allow phones to not just process text input but also understand more information in context like sights, sounds and spoken language.
Google said it is also testing a new feature that uses Gemini Nano to provide real-time alerts during a call if it detects conversation patterns associated with scams. Users would receive an alert if someone posing as a “bank representative” asks them to urgently transfer funds, make a payment with a gift card, or hand over personal information like card PINs or passwords, none of which a bank would usually ask a customer to do.
“This protection all happens on-device, so your conversation stays private to you,” Google said, adding that this would be offered as an opt-in feature.
Project Astra
Google also showed off Project Astra, which Hassabis described as an ‘advanced seeing and talking responsive agent’.
Google illustrated this with a pair of videos featuring someone walking around Google’s London office and using the agent on a smartphone to identify objects and read software code. They then switched to smart glasses, and the same agent was able to help fix a coding problem and come up with a name for a band (the band apparently featured a soft toy and a dog, which didn’t seem to bother the AI).
Hassabis said that in order to be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do, and to be able to take in and remember what it sees and hears in context so it can take action.
But he said that getting response time down to something conversational is a difficult engineering challenge.
“Over the past few years, we've been working to improve how our models perceive, reason and converse to make the pace and quality of interaction feel more natural.”
He said that, by building on Gemini, Google has developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall. These agents can better understand the context and respond quickly in conversation.
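Google has not published how these prototype agents are built. Purely as an illustration of the idea of combining video and speech input into a cached timeline of events, here is a small Python sketch. Every name in it (`Event`, `build_timeline`, `recall`) is hypothetical, the text descriptions stand in for whatever encoding a real system would use, and keyword matching stands in for real retrieval.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float   # seconds since the session started
    modality: str      # "video" or "speech"
    description: str   # stand-in for an encoded frame or transcript


def build_timeline(video_events, speech_events, max_events=1000):
    """Merge two time-sorted streams into one bounded timeline.

    The deque acts as a simple cache: once max_events is reached,
    the oldest events are dropped automatically.
    """
    merged = heapq.merge(
        video_events, speech_events, key=lambda e: e.timestamp
    )
    return deque(merged, maxlen=max_events)


def recall(timeline, keyword):
    """Return the most recent event mentioning the keyword, if any."""
    for event in reversed(timeline):
        if keyword.lower() in event.description.lower():
            return event
    return None


if __name__ == "__main__":
    video = [
        Event(1.0, "video", "glasses on the desk next to an apple"),
        Event(3.0, "video", "speaker on the shelf"),
    ]
    speech = [Event(2.0, "speech", "what does this code do?")]
    timeline = build_timeline(video, speech)
    print(recall(timeline, "glasses"))
```

Searching from the newest event backwards means a question such as "where did I leave my glasses?" is answered from the latest relevant sighting.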
“With technology like this, it’s easy to envision a future where people could have an expert AI assistant by their side, through a phone or glasses,” he said.
None of these updates grabs the attention quite like OpenAI’s latest chatty release, apart from Project Astra, which is still in development. However, showing off a multimodal assistant running on a smartphone will certainly put pressure on Apple to come up with something similar, and soon.
Steve Ranger is an award-winning reporter and editor who writes about technology and business. Previously he was the editorial director at ZDNET and the editor of silicon.com.