How biased is your app?
Why businesses must spot and fix algorithmic bias in their products before users – and lawyers – do
This article originally appeared in issue 23 of IT Pro 20/20.
Is ‘bias bounty’ the new bug bounty? The high-speed evolution of artificial intelligence (AI) has left companies facing such hefty consequences from unwitting bias in their apps that they're now offering people cash prizes to solve these problems.
Twitter, for example, announced its bias bounty in July when it invited developers to fix its infamous saliency algorithm, better known as the image-cropping algorithm, which had demonstrated racial and gender bias. Two weeks on, the firm awarded $7,000 (roughly £5,000) to participants who identified flaws in its code. Companies have also faced legal action as a result of AI bias, with both Google and Uber taken to court in April.
People have protested against Uber's use of automated systems
The sum awarded as part of Twitter’s bias bounty is a drop in the ocean, and pales in comparison with the €150,000 (£127,000) Uber was fined in April. A Dutch court found its ‘robo-firings’, whereby Uber fired drivers “by algorithmic means”, violated Article 22 of the General Data Protection Regulation (GDPR). Also in April, Associated Newspapers sued Google for "hiding" links to MailOnline articles through ‘bid-rigging and search algorithmic bias’.
Algorithmic bias never comes from nowhere, of course; it begins with biased data. Human data is naturally skewed, with conscious and unconscious biases leaving their fingerprints all over datasets. The trick comes in spotting – and removing – any biases from your data and apps before it's too late.
Risky business
Despite the fact that innovation often outpaces legislation, are organisations getting twitchy about the legal risks of AI? “The legal risk is simple but serious,” Simon Carroll, dispute resolution partner at law firm BP Collins, tells IT Pro. “Biased algorithmic decisions could breach the Equality Act 2010, which aims to protect from unlawful discrimination by automated systems as well as by people.”
Facebook recently ditched automatic facial recognition, and Twitter's mobile app has now removed the biased image-cropping algorithm altogether, although these are small details in the bigger picture of increasingly dominant decision-making algorithms. AI was advancing rapidly in the UK, even before COVID-19 accelerated the pace of innovation to bridge the gap left by social distancing. Machines now manage various aspects of your life, from what you watch on Netflix to who your business should hire and fire.
AI bias is such a serious risk, however, that the UK government recently published substantive findings on its implications for criminal justice and financial services, among other sectors. This led to a set of legal guidelines for the use of automated tech, with the Information Commissioner's Office (ICO) also pledging to update its guidance on AI.
Companies the size of Twitter and IBM, which has built bias-detection into its Watson OpenScale software, have the resources to create anti-bias tools and offer bounties to catch errors that have fallen through the cracks. For smaller organisations, however, it’s far more of a challenge to comply with these guidelines and stay within the boundaries of the law, not to mention ethical boundaries.
Back to data basics
Biased apps start with biased data. Bias can sneak into datasets in all kinds of ways, including sampling, misinterpretation and selection biases, and it's not always easy to recognise.
IEEE AI ethics engineer Eleanor Watson advises businesses to make datasets as inclusive as possible. "Ensure that data is gathered from as broad a sampling as possible,” she says, “and solicit less common examples to ensure that the data is more representative of a global population and global environments."
To ensure the data's integrity, Watson also suggests performing 'sanity checks'. This means searching for errors and missing data, then either fixing them or setting that section of the data aside. "This kind of work is a core duty of data science," she adds. "Many of these rather dull efforts are performed by legions of workers in less-developed nations for very small sums of money, with uncertain credentials."
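As a rough illustration of those sanity checks, here's a minimal Python sketch using pandas, assuming a hypothetical applicants.csv file with an 'ethnicity' column and a 'hired' outcome label; every column and file name here is made up for the example.

```python
import pandas as pd

# Hypothetical applicant dataset -- the file and column names are illustrative only
df = pd.read_csv("applicants.csv")

# Sanity check 1: errors and missing values that should be fixed or set aside
missing = df.isna().sum()
print("Missing values per column:\n", missing[missing > 0])

# Sanity check 2: how well is each demographic group represented?
group_share = df["ethnicity"].value_counts(normalize=True)
print("Share of each group in the data:\n", group_share)

# Sanity check 3: does the outcome rate differ sharply between groups?
outcome_by_group = df.groupby("ethnicity")["hired"].mean()
print("Positive-outcome rate per group:\n", outcome_by_group)

# Rows with missing demographic data are set aside for review rather than dropped silently
needs_review = df[df["ethnicity"].isna()]
```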
Developers should also rigorously test models against real-world examples in as broad a range of environments and demographics as possible. Ideally, companies should ring-fence resources to provide bounties that help uncover issues long before they occur 'in the wild', where real people are affected.
Testing mustn't end once an app is being used, of course, which is especially true when machine learning comes into play. With machine learning, data biases can be reinforced over and over again, creating a vicious circle of ever-worsening prejudice and biased decisions of the kind that land organisations in court.
Testing, testing, testing
Datasets and testing of this scale are expensive and time-consuming, and that can be a problem for smaller companies. “In a commercial environment, it's often overlooked due to quick development times and the need for high accuracy,” says Taavi Tammiste, CTO of AI photo app Fyma. “Companies with more resources, therefore, will have the edge in unbiased algorithms.” A methodical approach, however, pays dividends, he says. “You essentially have to go through all the possible ways the algorithm could be biased, and then make sure that the dataset addresses all of those.” He also advises constant testing to spot biases that develop in algorithms later on.
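One way to follow that advice is to break a model's test results down by a protected attribute and compare the numbers group by group. The sketch below is illustrative only: the tiny hard-coded table and its 'gender' column stand in for a real held-out test set.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Illustrative only: predictions alongside true labels and a protected attribute
results = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "y_true": [1, 1, 0, 1, 1, 0],
    "y_pred": [0, 1, 0, 1, 1, 1],
})

# Accuracy and favourable-outcome rate broken down by group --
# a large gap between groups is a red flag worth investigating
for group, subset in results.groupby("gender"):
    acc = accuracy_score(subset["y_true"], subset["y_pred"])
    rate = subset["y_pred"].mean()
    print(f"{group}: accuracy={acc:.2f}, positive-prediction rate={rate:.2f}")
```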
Several open-source tools aim to help with bias detection. Google's What-If throws questions and scenarios at your ML model to test for bias. IBM's AI Fairness 360 moves up a gear with a comprehensive toolkit for finding and removing bias.
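To give a sense of what that looks like in practice, here's a hedged sketch of an audit using AI Fairness 360's Python package, with a toy dataset invented for illustration; it follows the toolkit's documented metric and reweighing steps, but it's an outline rather than a complete workflow.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Toy data invented for illustration: 'sex' is the protected attribute
# (1 = privileged group), 'hired' is the favourable outcome being audited
df = pd.DataFrame({
    "sex":   [1, 1, 1, 0, 0, 0],
    "score": [0.9, 0.7, 0.6, 0.8, 0.7, 0.5],
    "hired": [1, 1, 0, 1, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["hired"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Disparate impact near 1.0 (and parity difference near 0) suggests the
# favourable outcome is distributed evenly across groups
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("Disparate impact:", metric.disparate_impact())
print("Statistical parity difference:", metric.statistical_parity_difference())

# One of the toolkit's mitigation steps: reweight examples to balance the groups
reweighed = Reweighing(unprivileged_groups=unprivileged,
                       privileged_groups=privileged).fit_transform(dataset)
```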
Synthesized's FairLens, meanwhile, works in a similar way, integrating into your workflow and highlighting, for example, whether decisions for ethnic minority groups are different from the decisions for any other groups. The platform then allows data scientists and developers to seek and share tips on how to mitigate the bias.
IBM has been at the forefront of developing AI tools for businesses
The development community can be incredibly helpful with testing, says Denise Gosnell, chief data officer at DataStax. She cites Google's word2vec word-embedding tool, in which the community spotted gender and racial biases in 2017: "Devs noticed that King - Man + Woman gave the result 'Queen', but Doctor - Man + Woman gave 'Nurse'. We need to start openly sharing systematic observations like this."
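That analogy test is straightforward to reproduce with the open-source gensim library, assuming you have a local copy of the pretrained Google News word2vec vectors; the file name below is simply the commonly distributed one, so adjust the path as needed.

```python
from gensim.models import KeyedVectors

# Load pretrained word2vec vectors (assumes the Google News file is available locally)
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# The analogy arithmetic described above, of the form A - Man + Woman
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=1))
# If the second query returns 'nurse' rather than a gender-neutral term,
# gender associations in the training text have leaked into the embeddings.
```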
Perhaps the most useful way to guard against bias in your products, however, is to bring a diverse set of people onto your team and include them in all stages of your project. Your company may lack the resources to access a massive dataset, but you can, at the very least, employ a range of people from different backgrounds who might detect biases others are unwittingly blind to.
Jane Hoskyn has been a journalist for over 25 years, with bylines in Men's Health, the Mail on Sunday, BBC Radio and more. In between freelancing, her roles have included features editor for Computeractive and technology editor for Broadcast, and she was named IPC Media Commissioning Editor of the Year for her work at Web User. Today, she specialises in writing features about user experience (UX), security and accessibility in B2B and consumer tech. You can follow Jane's personal Twitter account at @janeskyn.