Red teaming comes to the fore as devs tackle AI application flaws
Red teaming can play a crucial role in identifying flaws and cutting risky behaviors


Only a third of organizations employ adequate testing practices in AI application development, according to new research, prompting calls for increased red teaming to reduce risks.
Analysis from Applause found 70% of developers are currently developing AI applications and features, with over half (55%) highlighting chatbots and customer support tools as their primary focus at present.
Yet despite an acceleration in AI application development, a concerning number of organizations are overlooking quality assurance (QA) efforts during the software development lifecycle.
The study warned this trend is having an adverse impact on both quality and long-term return on investment (ROI).
“The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications,” said Chris Sheehan, EVP of high tech & AI at Applause.
AI application development needs a human touch
A key talking point of the Applause study centered around human involvement in the development lifecycle. With developers ramping up the use of generative AI tools in their daily workflows, the need for a ‘human touch’ has become critical to identify and remediate a range of issues.
These include issues such as inaccuracy, bias, and toxicity, the study noted.
Get the ITPro daily newsletter
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
Researchers found the top QA-related activities that involve human testing include prompt and response grading (61%), accessibility testing (54%), and UX testing (57%).
Applause added that humans are also crucial in training industry-specific or ‘niche’ models, particularly with the rise of agentic AI applications that interact directly with end-users.
Notably, the study found that only one-third (33%) of organizations currently employ red team testing in application development processes. Red teaming refers to adversarial testing practices - commonly used in cybersecurity - to identify potential weak points in platforms or applications.
Researchers called for a heightened focus on red teaming in AI application development, noting that this could play a key role in highlighting the aforementioned issues such as model bias or inaccuracy.
Application flaws persist
The study from Applause found that customer-related issues are becoming a frequent problem for enterprises. Nearly two-thirds of customers using generative Ai in 2025 reported encountering some sort of issue.
Over a third (35%) encountered biased responses, hallucinations (32%), and offensive responses (17%).
Hallucinations have been a persistent problem in AI development for some time now.
While the situation has improved markedly since the early days of the generative AI boom, the issue is still causing a degree of uncertainty among enterprise IT leaders.
In a study by KPMG in August 2024, six-in-ten tech leaders specifically highlighted hallucinations as a key concern with adopting or building generative AI tools and applications.
Sheehan noted that positive changes are being made by development teams, however. Many enterprises surveyed by the firm are “already ahead of the curve” and are integrating AI testing measures into the development lifecycle at an earlier stage.
This includes more robust model training methods which employ “diverse, high quality” datasets. Some enterprises are also warming to red teaming practices, he added.
“While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process including model data, model evaluation and comprehensive testing in the real world.
“As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology.”
MORE FROM ITPRO
- Developers spend 17 hours a week on security — but don't consider it a top priority
- Java developers are facing serious productivity issues
- Want developers to build secure software? You need to ditch these two programming languages

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.
-
Why are many men in tech blind to the gender divide?
In-depth From bias to better recognition, male allies in tech must challenge the status quo to advance gender equality
By Keri Allan
-
BenQ PD3226G monitor review
Reviews This 32-inch monitor aims to provide the best of all possible worlds – 4K resolution, 144Hz refresh rate and pro-class color accuracy – and it mostly succeeds
By Sasha Muller
-
‘Frontier models are still unable to solve the majority of tasks’: AI might not replace software engineers just yet – OpenAI researchers found leading models and coding tools still lag behind humans on basic tasks
News AI might not replace software engineers just yet as new research from OpenAI reveals ongoing weaknesses in the technology.
By George Fitzmaurice
-
Java developers are facing serious productivity issues: Staff turnover, lengthy redeploy times, and a lack of resources are hampering efficiency – but firms are banking on AI tools to plug the gaps
News Java developers are encountering significant productivity barriers, according to new research, prompting businesses to take drastic measures to boost efficiency.
By Solomon Klappholz
-
Software security debt is spiraling out of control – remediation times have surged 47% in the last five years, and it’s pushing teams to breaking point
News Software security flaws are taking longer to fix than ever, with remediation times having grown by 47% in the last five years.
By Nicole Kobie
-
Why the CrowdStrike outage was a wakeup call for developer teams
News The CrowdStrike outage in 2024 has prompted wholesale changes to software testing and development lifecycle practices, according to new research.
By Solomon Klappholz
-
The ultimate guide to getting your killer app off the ground
Industry Insight When building software, the process of designing, testing, prototyping, and perfecting your project is never ending
By Jon Spinage
-
The best Python test frameworks
Best Make your Python code shine with these testing tools
By Danny Bradbury
-
IT Pro Panel: The road to Windows 11
IT Pro Panel As the new OS gears up for rollout, we talk to our panellists about their upgrade plans
By Adam Shepherd
-
Huawei to launch HarmonyOS for smartphones next week
News The Chinese tech giant will switch to its homegrown OS as it looks to fully abandon Android by October
By Sabina Weston