Firms full of dirty data
As much as 30 per cent of data held by companies could be “dirty” in some way, according to a BCS award winner.


Up to 30 per cent of an organisation's data could be "dirty", according to a top informatics researcher.
Delivering a lecture after winning the British Computer Society's (BCS) Roger Needham Award, Professor Wenfei Fan said that businesses and their customers could be negatively affected by "dirty data": any information that is inconsistent, inaccurate, incomplete or out of date. "Poor quality data can give us trouble. The problem is everywhere," he said.
According to Fan, Australia has 500,000 dead people with active Medicare cards, while the US Pentagon's bad data quality led that organisation to attempt to send over 200 dead soldiers back to Iraq.
"In the UK, we're doing no better," he said, saying this country has issued 81 million national insurance numbers to a population of only 60 million.
But it's not just the public sector. Fan said that in a customer database of over half a million records, 120,000 become invalid within a year. Error rates across industry range from a fairly accurate one per cent to as high as 30 per cent.
And such dirt can lead to high costs, he said. Among other examples, he cited the case of Lehman Brothers, which inaccurately entered £300 million for a £3 million trade, taking £300 billion off the FTSE 100.
It's not just a problem for high finance, but for retail, too. Fan noted an example from Dell, which sold 15,000 computers in Chile for £79 when they were actually worth £303.
Fan claimed dirty data costs US businesses as much as $611 billion (£412 billion) and US customers as much as $2.5 billion (£1.68 billion) each year. "Real life data is dirty, and dirty data is costly," Fan said.
While he admitted 100 per cent accuracy was essentially impossible, Fan called for better tools to cross-reference databases to detect incorrect data.
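As a rough illustration of what such cross-referencing might look like, the minimal Python sketch below compares a customer table against a second source and flags records that are incomplete, inconsistent or out of date, three of the categories Fan named. It is not taken from Fan's lecture: the record layout, the `find_dirty` function, the one-year staleness threshold and all of the sample data are hypothetical.

```python
from datetime import date

# Hypothetical sample data: the field names and values are invented for illustration.
customers = {
    "C001": {"name": "A. Smith", "postcode": "EH8 9AB", "updated": date(2008, 3, 1)},
    "C002": {"name": "B. Jones", "postcode": "", "updated": date(2008, 5, 12)},
    "C003": {"name": "C. Brown", "postcode": "G1 1XQ", "updated": date(2005, 1, 20)},
}
# A second, independently maintained source to cross-reference against.
billing = {
    "C001": {"postcode": "EH8 9AB"},
    "C002": {"postcode": "M1 2AB"},
    "C003": {"postcode": "G2 4JR"},
}


def find_dirty(customers, billing, today, max_age_days=365):
    """Return {customer_id: [problems]} for records that fail any check."""
    problems = {}
    for cid, rec in customers.items():
        found = []
        # Incomplete: a required field is empty.
        if not rec["postcode"]:
            found.append("incomplete")
        # Inconsistent: the two sources disagree on the same field.
        other = billing.get(cid)
        if other and rec["postcode"] and rec["postcode"] != other["postcode"]:
            found.append("inconsistent")
        # Out of date: not updated within the (assumed) staleness threshold.
        if (today - rec["updated"]).days > max_age_days:
            found.append("out of date")
        if found:
            problems[cid] = found
    return problems


report = find_dirty(customers, billing, today=date(2008, 11, 1))
for cid, issues in report.items():
    print(cid, ", ".join(issues))
```

A check like this can only show that two sources disagree, not which one is right, which is one reason complete accuracy remains out of reach.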
Fan is chair of web data management in the School of Informatics at the University of Edinburgh. The Roger Needham award is given by the BCS and Microsoft to a UK-based researcher within ten years of their PhD for their contribution to computer science.
The entertaining lecture can be viewed at http://emea25537091.emea.acrobat.com/bcs_needham_2008; click 'option eight' to go directly to Fan's lecture.
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.