The world's 'first AI software engineer' isn't living up to expectations: Cognition AI's 'Devin' assistant was touted as a game changer for developers, but so far it's fumbling tasks and struggling to compete with human workers
Devin failed to complete most tasks given to it by researchers


Devin, a coding assistant hailed as the world's 'first AI software engineer', was given 20 coding tasks and managed to complete just three, taking longer than expected and going down strange routes to achieve its goals.
The AI coding tool, developed by Cognition AI, was touted as a transformative solution to help streamline software development when it was unveiled last year.
Costing around $500 per month, the AI assistant works via Slack so it feels like chatting to a colleague. At the time, Cognition showed a demo of Devin picking up jobs on Upwork, a freelancing platform that is used by software engineers to find work.
However, according to reports, third-party researchers haven't been able to replicate those results: one software developer picked apart the Upwork claims, and AI researchers who assessed Devin found it lacking.
Devin was framed as a game-changing AI tool
At Devin's launch last year, Cognition claimed that the tool could "make money taking on messy Upwork tasks," sharing a video purporting to show just that.
But software developer Carl Brown posted his own video in response, arguing that the company was not telling the truth about the tool's abilities, revealing what "Devin was supposed to do, what it actually managed to do instead, and how bad of a job that it did."
Brown noted that the task took him 36 minutes to complete himself, while Devin spent six hours failing to do it.
Cognition's claims about Devin were also tested by a team of researchers at Answer.AI, and their results were closer to Brown's than to what the original blog post claimed: Devin completed only three of 20 tasks.
There were some "early wins", however. Devin could pull a Notion database into Google Sheets with "surprising competence", they noted, completing the task in an hour with only a few minutes of human interaction.
The code worked, but was "a bit verbose." Another task, building a planet tracker, was similarly successful.
"This felt like a glimpse into the future — an AI that could handle the 'glue code' tasks that consume so much developer time."
More complicated tasks started to raise challenges, or as the researchers said: "as we scaled up our testing, cracks appeared."
"Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions," they noted. "Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible."
Over a month, they tasked Devin with creating new projects from scratch, performing research and analyzing or modifying existing projects, but out of 20 such tasks, just three were successful.
"The most frustrating aspect wasn’t the failures themselves - all tools have limitations - but rather how much time we spent trying to salvage these attempts," they said.
How to use Devin
That's a far cry from what was advertised when the AI assistant was first unveiled in March of last year. A blog post on Cognition's website claimed Devin could take on basic tasks for software engineers, allowing them to focus on bigger problems.
The website says Devin can find and fix bugs, build and deploy an entire app end-to-end, and even train and fine-tune an AI model.
"With our advances in long-term reasoning and planning, Devin can plan and execute complex engineering tasks requiring thousands of decisions," the company said. "Devin can recall relevant context at every step, learn over time, and fix mistakes."
Cognition hasn't yet replied to a request for comment from ITPro, but its own blog post does offer some guidance on how the system could be used more successfully than these tests suggest.
The company says Devin "can be an all-purpose tool", but recommends starting with smaller tasks such as simple bugs. Notably, the company said that it works best when you "give Devin tasks that you know how to do yourself" and tell the tool how to test or check its own work.
Beyond that, the company says Devin can help break large tasks down into smaller ones that each take less than three hours.
Given Answer.AI's success using Devin for smaller "glue code" tasks, perhaps such advice about starting small should be heeded.
Indeed, this research challenging the usefulness of the current crop of AI software assistants comes as Meta founder Mark Zuckerberg has predicted that AI will be doing the work of mid-level engineers this year — but with some serious caveats.
"In the beginning it’ll be really expensive to run, then you can get it to be more efficient and then over time we’ll get to the point where a lot of the code in our apps and including the AI that we generate is actually going to be built by AI engineers instead of people engineers," he said.
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.