AI and data protection: What businesses need to know


Businesses are increasingly taking advantage of generative AI to boost efficiency, but the technology also poses significant data protection risks. AI relies on masses of data to operate, and using it at scale can lead to the unintentional sharing of private business information.

One in five UK companies has had potentially sensitive corporate data exposed via employee use of generative AI, a new report [PDF] by RiverSafe has revealed. Take the example of tech giant Samsung, which was forced to ban the use of generative AI after staff shared sensitive data, including source code and meeting notes, with ChatGPT, as reported by TechRadar Pro.

The risks span multiple vectors, and exposed information can put a business in breach of data protection legislation such as the General Data Protection Regulation (GDPR) or the Data Protection Act (DPA) 2018.

Last year, Forrester senior analyst Alla Valente predicted that generative AI use would lead to major data breaches and fines for application developers using the technology.

So, what exactly are the data protection risks posed by generative tools such as ChatGPT and how can companies put a strategy in place to mitigate these?

AI and data protection risks

One of the key concerns centers around the handling of sensitive information by AI systems, says Chris Harris, EMEA technical associate vice president for data security at Thales. “Given the vast amounts of data these systems are ingesting and processing, there’s a real risk that sensitive information is also being captured and could be inadvertently revealed.”

Adding to the complexity, the integrity of AI systems heavily depends on the quality and reliability of the data they’re trained on. If malicious information is introduced into training datasets, a technique known as “data poisoning”, the learning process can be corrupted, compromising the model’s performance, Harris says. “Outputs could also be biased or factually incorrect, with the potential to manipulate outcomes and undermine decision-making processes if sufficient scrutiny isn’t in place.”
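As a rough illustration of one basic defence against tampered training data, the sketch below verifies dataset files against a trusted manifest of hashes recorded when the data was approved; the file paths and manifest format are hypothetical, not taken from the article.

```python
# A minimal sketch: refuse to train on files whose hashes no longer match a
# previously approved manifest. Paths and manifest layout are illustrative.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_training_data(data_dir: str, manifest_file: str) -> list[str]:
    """Return names of files that are missing or no longer match the manifest."""
    manifest = json.loads(Path(manifest_file).read_text())  # {"file.csv": "<sha256>", ...}
    tampered = []
    for name, expected in manifest.items():
        path = Path(data_dir) / name
        if not path.exists() or sha256_of(path) != expected:
            tampered.append(name)
    return tampered

# Example usage (hypothetical paths): abort training if anything has changed.
suspect = verify_training_data("training_data", "approved_manifest.json")
if suspect:
    raise RuntimeError(f"Training aborted, unverified files: {suspect}")
```

A hash check like this only detects tampering with approved files; vetting new data sources before they enter the training set is a separate, largely human, task.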

Meanwhile, cybercriminals could trick generative AI systems into extracting and sharing confidential data, such as customer records, financial transactions, or trade secrets. “These breaches put privacy at risk and expose enterprises to penalties, legal trouble, and damage to their reputation,” says Harris.

AI and data protection legislation

Using generative AI tools in business poses distinct regulatory challenges. While no regulation is dedicated solely to generative AI, existing legislation and guidance set standards for responsible use of the technology, says Dr Leanne Allen, partner and data, data science and AI lead at KPMG UK.

For example, the EU AI Act, the OECD's AI principles, and the Digital Services Act stipulate that any type of AI must be “robust, safe, trustworthy, and secure”, she says.

Regulation around personal data is especially important for firms using AI, says Ben Travers, AI expert and partner at law firm Knights. “Many businesses will not have obtained appropriate consent, or don’t have another valid legal basis for processing personal data through AI. Any organization that defaults to legitimate interest as the basis for uploading personal data to an AI has probably misunderstood how the law works in this area.”

Ensuring generative AI adheres to data protection regulations such as the GDPR is “complex”, says Philip Brining, co-founder and managing director of Data Protection People. The technology's need for vast amounts of data often conflicts with GDPR principles such as data minimization and purpose limitation, he says.

Data subject rights under regulations – including the right to access, rectify, and erase data – add another layer of complexity, says Brining. “Ensuring generative AI systems comply with these rights is challenging, especially when data is deeply integrated into AI models.”

At the same time, GDPR restricts transfers of personal data outside the EU to jurisdictions that offer a comparable level of protection, unless appropriate safeguards are in place. Sending personal data to a non-compliant data center outside the EU can therefore put a business in breach of the regulation. Amid all this, UK businesses are still struggling to ensure their staff comply with GDPR rules when using AI.

Transparency and accountability are other factors to be taken into account. For example, it’s important to consider that firms may be required to explain a generative AI model's decision-making process, says Kevin Curran, IEEE senior member and professor of cyber security at Ulster University. This is especially true in “high-stakes” situations such as loan approvals or medical diagnoses, he says.
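To make that point concrete, here is a minimal sketch of decision-level explainability for a hypothetical loan-approval model; the features, training data, and decision rule are illustrative assumptions, not drawn from the article.

```python
# A minimal sketch of explaining an automated decision: a simple linear model
# whose per-feature contributions can be reported alongside each outcome.
# All figures below are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: [income in £k, existing debt in £k, years at address]
X = np.array([[55, 5, 4], [22, 18, 1], [70, 2, 10], [30, 25, 2],
              [48, 9, 6], [18, 15, 1], [62, 4, 8], [27, 20, 3]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = approved, 0 = declined

model = LogisticRegression(max_iter=1000).fit(X, y)

def explain(applicant: np.ndarray) -> tuple[int, dict[str, float]]:
    """Return the decision and each feature's contribution to the linear score
    (intercept excluded), so the outcome can be justified to the applicant."""
    names = ["income", "existing_debt", "years_at_address"]
    contributions = model.coef_[0] * applicant
    decision = int(model.predict([applicant])[0])
    return decision, dict(zip(names, contributions.round(3)))

decision, reasons = explain(np.array([40, 12, 3]))
print("approved" if decision else "declined", reasons)
```

A linear model is the easy case; for more opaque generative systems, explainability tends to rely on logging prompts, retrieved sources, and post-hoc attribution rather than direct inspection of weights.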

AI and data protection strategies

Data protection will continue to be a key concern for any firm incorporating generative AI into its operations. But as the technology matures, these risks can be managed with the right strategies and tools.

One of the key principles to adhere to is “privacy-by-design”, says Harris. “This means privacy protections should be a default consideration, ensuring that data collection aligns with reasonable expectations.”

Data anonymization and pseudonymization are essential techniques to protect identifiable information before feeding it into generative AI models, says Brining. “Implementing robust access controls is vital. Limiting who can interact with generative AI systems and what data can be processed helps prevent unauthorized access and data breaches.”
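As a rough illustration of what pseudonymization before submission can look like, the sketch below replaces email addresses and phone numbers in a prompt with placeholder tokens and keeps the mapping locally; the regex patterns and placeholder scheme are illustrative assumptions, not a prescribed approach.

```python
# A minimal sketch, assuming a simple regex-based approach: swap obvious
# identifiers (emails, phone-like numbers) for placeholder tokens before a
# prompt is sent to any external generative AI service. The mapping stays
# local so responses can be re-identified if needed.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders and return the lookup table."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            mapping[token] = match
            text = text.replace(match, token)
    return text, mapping

prompt = "Summarise the complaint from jane.doe@example.com, phone +44 7700 900123."
safe_prompt, lookup = pseudonymize(prompt)
print(safe_prompt)  # identifiers replaced with <EMAIL_0> and <PHONE_0>
# lookup can be used to restore the original values in the model's response.
```

In practice a dedicated data loss prevention or named-entity recognition tool will catch far more than two regex patterns, but the principle is the same: identifiers are stripped before the data leaves the organization's control.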

Developing AI-specific policies and procedures and regularly updating them ensures that data input standards, model training processes, and output handling are “well-defined and compliant with regulations”, Brining adds.


Oversight of these policies may fall to the CIO, but some firms have also recognized the unique complexity of AI onboarding by hiring chief AI officers (CAIOs).

Meanwhile, businesses should consider whether it would be more appropriate to direct employees into using enterprise versions of generative AI tools to control use and access, says Jon Bartley, partner and head of the data advisory group at law firm RPC. “This may be a more practical solution compared to simply restricting employees from using free online tools.”

For example, when it comes to using AI to generate code there are a number of enterprise options, including GitHub Copilot, Gemini Code Assist, and Code Llama, which can deliver returns for the business without putting proprietary code at risk.

It’s also important to bear in mind that any implementation of AI models within an organization is never truly finished. Ongoing work is needed to review the decisions being made and minimize harmful outputs and toxicity, says Harris. “Wherever possible, organizations must strive for explainability in the decisions a generative AI model is making, allowing genuine human control and mitigating risks as much as possible.”
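As a rough sketch of what that ongoing review might involve, the snippet below flags model outputs containing potentially sensitive markers and withholds them for human review rather than returning them directly; the patterns, the in-memory review queue, and the release function are hypothetical stand-ins for a real moderation or DLP pipeline.

```python
# A minimal sketch, not a production moderation system: scan each model output
# against a list of sensitive markers and route flagged responses to a human
# reviewer instead of releasing them. Patterns are illustrative only.
import re
from datetime import datetime, timezone

SENSITIVE_MARKERS = [
    re.compile(r"(?i)\bconfidential\b"),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),                  # email addresses
    re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),   # card-like numbers
]

review_queue: list[dict] = []  # stand-in for a real ticketing or review system

def release_output(model_output: str) -> str | None:
    """Return the output if it looks clean, otherwise queue it for human review."""
    hits = [p.pattern for p in SENSITIVE_MARKERS if p.search(model_output)]
    if hits:
        review_queue.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "output": model_output,
            "matched": hits,
        })
        return None  # withheld pending review
    return model_output

print(release_output("Your meeting is at 3pm."))                 # released
print(release_output("The confidential figures are attached."))  # withheld (None)
```

Logging what was flagged, and why, also gives organizations the audit trail they need to demonstrate the kind of human control Harris describes.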

Kate O'Flaherty

Kate O'Flaherty is a freelance journalist with well over a decade's experience covering cyber security and privacy for publications including Wired, Forbes, the Guardian, the Observer, Infosecurity Magazine and the Times. Within cyber security and privacy, her specialist areas include critical national infrastructure security, cyber warfare, application security and regulation in the UK and the US amid increasing data collection by big tech firms such as Facebook and Google.