Anthropic’s Claude 3.5 Sonnet AI model puts the firm on a collision course with OpenAI and Google

Anthropic has released its latest large language model (LLM), Claude 3.5 Sonnet, which the company says can outperform rivals including GPT-4o and Gemini 1.5 Pro.

Claude 3.5 Sonnet is the new medium-sized LLM from Anthropic, which has historically released its models in small, medium, and large versions differentiated by subtitle. Its smallest LLM is called ‘Haiku’, while the biggest is dubbed ‘Opus’.

This is the first of the Claude 3.5 models to be released, and Anthropic has published benchmarks which show it outperforming competitor models such as GPT-4o and Gemini 1.5 Pro, as well as its own recently released Claude 3 Opus model.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app, while Claude Pro and Team plan subscribers can access it with significantly higher rate limits. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
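For developers, access via the Anthropic API uses the same Messages endpoint as earlier Claude models. The snippet below is a minimal sketch in Python, assuming the official anthropic SDK is installed and an API key is set in the ANTHROPIC_API_KEY environment variable; the dated model identifier is taken from the launch announcement and should be treated as an assumption here.

```python
# Minimal sketch: calling Claude 3.5 Sonnet via Anthropic's Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()  # picks up the API key from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed launch identifier
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarise the key points of this release note."},
    ],
)

# The response body is a list of content blocks; print the first text block.
print(message.content[0].text)
```

The same model can be reached through Amazon Bedrock and Google Cloud's Vertex AI, though each platform uses its own client libraries and model naming.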

Anthropic said the model sets a new bar for industry benchmarks such as GPQA, which evaluates a model’s ability to answer multiple-choice questions at a graduate level, having scored 59.4% compared to GPT-4o’s 53.6%. 

It also scored highly on MMLU, which covers undergraduate-level knowledge (88.7%), and HumanEval, which measures coding proficiency (92%), all of which beat out competitors such as the Llama 3 400B snapshot and Gemini 1.5 Pro.

“It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone,” the company said.

Claude 3.5 Sonnet follows hot on the heels of recent model launches

Reflecting the frantic pace in the much-hyped AI space, it’s actually only three months since Anthropic unveiled its then state-of-the-art models Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

New LLMs are currently being debuted every few months and regularly leapfrog the top-performing models in terms of benchmarks. That said, the current leading models all sit within a few points of each other on many benchmarks.

The rapidly shifting economics of AI models is striking, though: in addition to being faster than Claude 3 Opus, the new model is also much cheaper.

Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens, and has a 200,000-token context window, Anthropic said. One token is roughly four characters, or around three-quarters of a word, so 100 tokens works out to about 75 words.
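As a rough illustration of what those rates mean in practice (the token counts below are hypothetical examples, not figures from Anthropic), a per-request cost estimate can be sketched as follows:

```python
# Illustrative cost estimate using the published Claude 3.5 Sonnet rates.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token ($3 per million)
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token ($15 per million)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of one request at the listed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10,000-token prompt (~7,500 words) with a 1,000-token reply
print(f"${estimate_cost(10_000, 1_000):.3f}")  # prints $0.045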

In contrast, Claude 3 Opus cost $15 per million input tokens and $75 per million output tokens at launch (Claude 3 Sonnet cost $3 and $15, while Haiku was priced at $0.25 and $1.25).

Anthropic said Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus and is better at solving code problems.

“This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows,” the company said.

“When instructed and provided with the relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities. It handles code translations with ease, making it particularly effective for updating legacy applications and migrating codebases,” it said.

Anthropic also said Claude 3.5 Sonnet beats Claude 3 Opus on standard vision benchmarks, with the improvements most noticeable on tasks that require visual reasoning, such as interpreting charts and graphs.

It can also transcribe text from imperfect images, “a core capability for retail, logistics, and financial services, where AI may glean more insights from an image, graphic, or illustration than from text alone,” the company said.

When a user asks Claude to generate content like code snippets, text documents, or website designs, these elements, which Anthropic calls ‘Artifacts’, appear in a window alongside the conversation. The idea is to create a workspace where users can see, edit, and build upon Claude’s creations, folding them into their own projects and workflows.

“This preview feature marks Claude’s evolution from a conversational AI to a collaborative work environment. It’s just the beginning of a broader vision for Claude.ai, which will soon expand to support team collaboration,” Anthropic said.

The company added that teams will soon be able to store their “knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate” – potentially a move into the collaboration tools market dominated by the likes of Microsoft Teams and Slack.

Anthropic said it has provided Claude 3.5 Sonnet to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. The UK AISI completed tests of 3.5 Sonnet and shared their results with the US AI Safety Institute, it said.

“We have integrated policy feedback from outside subject matter experts to ensure that our evaluations are robust and take into account new trends in abuse. This engagement has helped our teams scale up our ability to evaluate 3.5 Sonnet against various types of misuse,” the company said.

Anthropic eyes strong enterprise use cases with Claude 3.5 Sonnet

While there has been a huge amount of excitement about the potential of generative AI to remake work and business, many companies are still very much at the test-and-trial stage, and some are questioning the ROI of AI.

A specific focus on enterprise use cases will be necessary if Anthropic is to take on its biggest rivals and reap profits from its AI workloads.

Public AI workloads are expensive to run, and others in the space, such as OpenAI and Google Cloud, have already sought to monetize enterprise AI as much as possible to cover running costs.

Anthropic has leaned on heavy investment by AWS and Google Cloud to improve its product offering and is one of a few companies primed to take on OpenAI in the immediate future.

The company said it will release Claude 3.5 Haiku and Claude 3.5 Opus later this year. It noted it's also developing new features to support more use cases for businesses, including integrations with enterprise applications.

“Our team is also exploring features like Memory, which will enable Claude to remember a user’s preferences and interaction history as specified, making their experience even more personalized and efficient,” it said.

Steve Ranger

Steve Ranger is an award-winning reporter and editor who writes about technology and business. Previously he was the editorial director at ZDNET and the editor of silicon.com.