Generative AI training in the crosshairs as ICO set to examine legality of personal data use
Generative AI training methods have become a contentious issue in recent months amid data privacy concerns and a slew of lawsuits against major industry players


The legality of generative AI training methods is set to be examined by the Information Commissioner’s Office (ICO) amid concerns over the use of personal data.
AI training methods have been a key talking point in recent months due to the manner in which large language models (LLMs) are built. LLMs, such as those underpinning ChatGPT, are typically trained on vast amounts of data collected through web scraping.
However, these practices have raised concerns both about data privacy and the legal repercussions for developers that fall foul of copyright laws.
The ICO said conversations with developers in the AI space have highlighted several areas where organizations seek greater clarity around how data protection laws apply to the development and use of generative AI.
This includes questions over the appropriate lawful basis for training generative AI models, and how the purpose limitation principle plays out in the context of generative AI development and deployment.
There are also lingering questions about complying with the accuracy principle, as well as the expectations in terms of complying with data subject rights.
Over the coming months, the ICO said it plans to release guidance setting out its position on the matter, outlining how specific requirements of the UK GDPR and the Data Protection Act 2018 could impact generative AI training methods.
"The impact of generative AI can be transformative for society if it’s developed and deployed responsibly," said Stephen Almond, the ICO's executive director for regulatory risk.
"This call for views will help the ICO provide industry with certainty regarding its obligations and safeguard people’s information rights and freedoms."
Generative AI training and ‘legitimate interest’
Under the UK GDPR, relying on legitimate interests involves a three-part test: the purpose of the data processing must be legitimate, the processing must be necessary for that purpose, and the individual’s interests must not override the interest being pursued.
The ICO said its current thinking is that legitimate interests can be a valid lawful basis for training generative AI models on web scraped data, as long as the model developer can ensure they pass this three-part test.
The developer’s interest could simply be the business interest in developing a model and deploying it for commercial gain, or it could be a wider societal interest, as long as the developer can evidence the model’s specific purpose and use.
As for necessity, the ICO recognizes that, currently, most generative AI training is only possible using data obtained through large-scale scraping.
On the 'balancing' test, the data watchdog noted that the assessment can become more complicated depending on whether generative AI models are deployed by the initial developer, made available to third parties through an API, or provided directly to third parties.
The ICO said it will engage with stakeholders from across the technology industry as part of the consultation, including developers and users of generative AI, legal advisors and consultants working in the space, civil society groups, and public bodies with an interest in generative AI.
The first consultation is open until 1 March, with future consultations planned during the first half of this year to examine issues such as the accuracy of generative AI outputs.
Emma Woollacott is a freelance journalist writing for publications including the BBC, Private Eye, Forbes, Raconteur and specialist technology titles.