Google Gemini shows tech giant is still in the generative AI race as model outperforms GPT-4
Google Gemini Ultra outperforms humans on benchmarks like MMLU


Rory Bathgate
The launch of Google Gemini, the tech giant’s latest AI model, represents the first true competitor to OpenAI’s GPT-4 model and could herald a future battle between the two firms, analysts have said.
Google unveiled Gemini on 6 December, hailing the powerful new AI model’s “sophisticated multimodal reasoning capabilities” as a potential game-changing moment in the generative AI race.
Daryl Plummer, distinguished VP analyst & Gartner Fellow, told ITPro the introduction of Google Gemini could shift attention away from OpenAI and has set a “high bar” for competitors in the space.
“2023 has seen Google go from being ‘counted out’ after the introduction of ChatGPT to leapfrogging innovations on models with the introduction of Gemini,” he said.
“Large language models and foundation models are at the center of GenAI excitement, and customers keep asking which models will be the most beneficial to them.
“While Google has many models, much of the industry attention has been on GPT variants. Google needed to set the bar high for how these models will evolve.”
Google Gemini: Everything you need to know
Google confirmed the Gemini model will be integrated with Bard, the firm’s flagship chatbot.
This will provide users with advanced capabilities, including heightened reasoning and natural language abilities. Gemini will be available in three sizes – Ultra, Pro, and Nano.
Users will also be able to run the model across a range of areas: Bard will be powered by Gemini Pro, while mobile device users will gain new features through Gemini’s Nano range.
Gemini Ultra, the most powerful of the three available classes, will be rolled out next year, Google confirmed.
Critically, Gemini Ultra outperformed OpenAI’s GPT-4 model across the majority of benchmarks.
In an announcement, Demis Hassabis, CEO and co-founder of Google DeepMind, described Gemini as the result of “rigorous testing”, adding that the model will supercharge performance on a “wide variety of tasks” for users.
“From natural image, audio, and video understanding, to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of 32 widely-used academic benchmarks used in large language model research and development,” he wrote.
Hassabis added that, with a score of 90%, Gemini Ultra is the “first language model to outperform human experts” on massive multitask language understanding (MMLU).
All told, this means Gemini is capable of considering difficult questions more carefully before providing an answer and delivers significant improvements for users compared to other industry models.
Plummer said Gemini could herald a step change in the use of large language models due to its performance, enabling more intuitive, nuanced capabilities to assist developers in coding activities, for example.
“Sophisticated reasoning in Gemini allows the model to help pull relevant and correlated information from multiple complex documents and data,” he said.
“And Gemini is trained to support more varied and nuanced coding to continue the assistive support to developers. It can solve nearly twice as many types of coding problems as previous versions.”
Chirag Dekate, VP analyst at Gartner, echoed Plummer’s comments, adding that the model “sets the new benchmark in a fast-evolving, game-changing generative AI landscape”.
Google Gemini may force OpenAI’s hand
Plummer said the launch of Gemini places OpenAI and Microsoft in a precarious position and may force the duo to respond rapidly. Furthermore, Gemini showcases the ability of Google to respond to ongoing developments in the generative AI space.
While much focus has been placed on OpenAI over the last year, Google has been quietly innovating out of the limelight. Questions remain, however, over whether users will see tangible business value in the model.
“Gemini now must be responded to by OpenAI and Microsoft,” he said. “Google is seriously in this game even though they don’t have as many people claiming they lead it yet. Gemini is leapfrogging where others have been and the question will be – do customers see the value?”
“The question of whether there are diminishing returns on large model sizes has not yet been answered. Google has not released data on the quantity of parameters on which Gemini is trained. However, the use of general language models will continue to rise as their facility and usefulness grows.
“Gemini represents a new bar in that effort. Google must make a stronger connection to enterprise problems but, for now, no one should be taking Google’s AI efforts for granted anymore.”
How does Gemini compare to other models?
Google has claimed that Gemini Ultra is among the most sophisticated AI models ever built. To back this up, it has released benchmark results that show it edging out GPT-4 across a range of different academic benchmarks for AI.
Gemini Ultra scored 90% in the Massive Multitask Language Understanding (MMLU) benchmark, a text reasoning test in which AI models must complete 14,000 multiple-choice questions that cover information outside the scope of training data. This puts it ahead of GPT-4’s score of 86.4% and the 78.3% achieved by Google’s other high-profile LLM, PaLM 2.

The creators of MMLU estimated that human experts would score 89.8%, using a combination of guesswork and statistical results from high percentile exam results. Google has therefore claimed that it has achieved a world-first: an AI model that can outperform human experts in certain conditions.
Google has claimed that Gemini Ultra outperforms GPT-4 in 30 of 32 widely-used benchmarks for AI models. The results show that in most cases, the competition is close – for example with MATH, a benchmark that tests models on difficult math problems such as geometry, Gemini Ultra achieved 53.2% to GPT-4’s 52.9%.
But Google has also stressed how far its model can exceed competitors when it comes to multimodal benchmarking such as Infographic VQA, which pits models against infographics and data visualizations to test their abilities to unpick information from images and derive reason from graphical layouts.
Gemini Ultra’s unique strength lies in its ability to process images at the same time as text and other inputs such as audio, without the need to run them through optical character recognition (OCR) models or run natural language processing (NLP) on transcripts produced through a separate speech-to-text model. This allows it to work more accurately and efficiently.
When Gemini Ultra is released in 2024, its results will be put to the test by independent testers, and enterprises will be able to make a more informed decision on the application of the model within their environment.
Until then, Gemini Pro is already powering Google Bard. This lighter iteration of Gemini is less powerful but in testing still largely outperformed GPT-3.5, which powers the free tier of ChatGPT, as well as Meta’s Llama 2. Gemini Pro scored 79.1% on MMLU, ahead of GPT-3.5’s 70% as well as the 78.5% achieved by Anthropic’s Claude 2.
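For readers who want to see how narrow or wide these gaps are, the MMLU figures quoted in this article can be lined up in a few lines of Python. This is only an illustrative sketch: the scores are the ones reported above (vendor-published and not independently verified), and the names `MMLU_SCORES`, `margin`, and `ranked` are made up for this example.

```python
# MMLU scores (percent) as quoted in this article. They are vendor-reported
# figures, not independently verified results.
MMLU_SCORES = {
    "Gemini Ultra": 90.0,
    "Human experts (estimate)": 89.8,
    "GPT-4": 86.4,
    "Gemini Pro": 79.1,
    "Claude 2": 78.5,
    "PaLM 2": 78.3,
    "GPT-3.5": 70.0,
}


def margin(a: str, b: str) -> float:
    """Return a's lead over b in percentage points, rounded to one decimal."""
    return round(MMLU_SCORES[a] - MMLU_SCORES[b], 1)


def ranked() -> list[str]:
    """Return the names ordered from highest to lowest reported score."""
    return sorted(MMLU_SCORES, key=MMLU_SCORES.get, reverse=True)


if __name__ == "__main__":
    for name in ranked():
        print(f"{name:<26}{MMLU_SCORES[name]:>5.1f}%")
    print("Gemini Ultra over GPT-4:", margin("Gemini Ultra", "GPT-4"), "points")
    print("Gemini Ultra over human estimate:",
          margin("Gemini Ultra", "Human experts (estimate)"), "points")
    print("Gemini Pro over GPT-3.5:", margin("Gemini Pro", "GPT-3.5"), "points")
```

On these figures, Gemini Ultra's lead over the estimated human expert score is a fraction of a percentage point, while Gemini Pro's lead over GPT-3.5 is several points, which is why independent testing will matter for the headline claim.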
As it stands, Google has fired a powerful shot at OpenAI with this new flagship model, setting a new bar for AI performance. With meaningful competition to GPT-4 having been realized, competition in the space will only intensify.

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.