OpenAI's new AI model promises to be “more truthful and less toxic”
The organisation has started using human helpers to teach the new model, but warns this could introduce added bias


OpenAI has released a new version of its GPT-3 AI language model that promises to be better at following user intentions while also producing results that are more truthful and less toxic.
The OpenAI API is powered by GPT-3 language models that can be used to perform natural language tasks using carefully engineered text prompts. However, the models can also produce outputs that are untruthful, toxic, or reflect harmful sentiments.
The organisation's AI models have been criticised in the past for a range of shortcomings, including bias against specific genders and religions. The organisation once called GPT-3's predecessor, GPT-2, too dangerous to make public, due to the model being able to create fake news stories by taking cues from the eight million web pages it had scanned to learn about language.
The organisation said untruthful or toxic outputs occur partly because GPT-3 is trained to predict the next word on a large dataset of Internet text instead of safely performing the language tasks the user wants.
To make its models safer and more aligned with users, OpenAI used a technique known as reinforcement learning from human feedback (RLHF), in which human helpers called labelers assist the AI in its learning.
“On prompts submitted by our customers to the API, our labelers provide demonstrations of the desired model behavior, and rank several outputs from our models. We then use this data to fine-tune GPT-3,” said the company.
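As a rough illustration of how ranked outputs can become a training signal, the sketch below turns one labeler's best-to-worst ranking into preference pairs and scores each pair with a log-sigmoid loss on the reward gap. This is a common choice in preference learning, not a description of OpenAI's implementation; the function names `ranking_to_pairs` and `pairwise_loss` and the example reward values are assumptions made for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first item of each pair
    is always the one the labeler ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Negative log-sigmoid of the reward gap (hypothetical reward scores).

    The loss is small when the preferred output already scores higher
    than the rejected one, and large when the order is reversed.
    """
    gap = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))


# One labeler ranks three model outputs for the same prompt.
pairs = ranking_to_pairs(["output_a", "output_b", "output_c"])

# Hypothetical reward scores that agree, then disagree, with the labeler.
loss_when_agreeing = pairwise_loss(2.0, 0.0)
loss_when_disagreeing = pairwise_loss(0.0, 2.0)
```

A reward model trained to minimise this kind of loss learns to score outputs the way labelers ranked them, and that score can then guide fine-tuning.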
It found the resulting models are much better at following instructions than GPT-3. They also make up facts less often and show small decreases in toxicity. The organisation’s labelers prefer outputs from its new 1.3B InstructGPT model over outputs from its 175B GPT-3 model, even though the InstructGPT model has over 100x fewer parameters.
These InstructGPT models have been in beta on the API for over a year and are now the default language models accessible on OpenAI’s API.
“We believe that fine-tuning language models with humans in the loop is a powerful tool for improving their safety and reliability, and we will continue to push in this direction,” the organisation explained.
However, OpenAI outlined that there are some limitations to this model too. The InstructGPT models, for example, are far from fully aligned or fully safe, meaning they still generate toxic outputs, make up facts, or generate sexual and violent content without explicit prompting.
It said that to support the safety of its API, it will continue to review potential applications before they go live, provide content filters for detecting unsafe completions, and monitor for misuse.
OpenAI also highlighted that in many cases, aligning to the average labeler preference may not be desirable. The example it gave is that when generating text that disproportionately affects a minority group, the preferences of that group should be weighted more heavily.
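To make the difference between average and weighted labeler preferences concrete, here is a minimal sketch comparing a plain mean with a weighted mean of labeler scores. The function name `weighted_preference`, the scores, and the weights are all hypothetical; the article does not say how OpenAI would implement such weighting.

```python
def weighted_preference(scores, weights):
    """Weighted mean of labeler scores for one output (illustrative only)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical scores from three labelers; the third belongs to the affected group.
scores = [0.9, 0.8, 0.2]

average = weighted_preference(scores, [1, 1, 1])   # every labeler counts equally
weighted = weighted_preference(scores, [1, 1, 4])  # affected group weighted more heavily
```

With equal weights the output looks broadly acceptable; once the affected group's score counts for more, the aggregate drops, which is the effect the weighting is meant to capture.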
“Right now, InstructGPT is trained to follow instructions in English; thus, it is biased towards the cultural values of English-speaking people,” it said. “We are conducting research into understanding the differences and disagreements between labelers’ preferences so we can condition our models on the values of more specific populations.”
Zach Marzouk is a former ITPro, CloudPro, and ChannelPro staff writer, covering topics like security, privacy, worker rights, and startups, primarily in the Asia Pacific and the US regions. Zach joined ITPro in 2017 where he was introduced to the world of B2B technology as a junior staff writer, before he returned to Argentina in 2018, working in communications and as a copywriter. In 2021, he made his way back to ITPro as a staff writer during the pandemic, before joining the world of freelance in 2022.