AWS launches Textract tool capable of reading millions of files in a few hours
The machine learning-powered tool promises to be the most accurate for scraping data


AWS has said that its Textract tool, designed to extract and translate data between files, is now generally available for all customers.
The tool, which is a machine learning-driven feature of its cloud platform, lets customers autonomously extract data from documents and accurately convert it into a usable format, such as exporting contractual data into database forms.
The fully managed tool requires no machine learning knowledge to use and works with virtually any document. Industries that work with specific file types, such as financial services, insurance and healthcare, will also be able to plug these into the tool.
Textract aims to expedite the laborious data entry process that is also often inaccurate when using other third-party software. Amazon claims it can accurately analyse millions of documents in "just a few hours".
"Many companies extract text and data from files such as contracts, expense reports, mortgage guarantees, fund prospectuses, tax documents, hospital claims, and patient forms through manual data entry or simple OCR software," the company said.
"This is a time-consuming and often inaccurate process that produces an output requiring extensive post-processing before it can be put in a format that is usable by other applications," it added.
Textract takes data from scanned files stored in Amazon S3 buckets, reads them and returns data in JSON text annotated with the page number, section, form labels, and data types.
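As a rough illustration of that flow, the Python sketch below groups the text lines of a Textract-style JSON response by page number. The sample response is hand-written for this example and is not real service output. Its field names (Blocks, BlockType, Page, Text) follow the block structure in AWS's public Textract documentation. The fetch_blocks helper is a hypothetical wrapper around boto3's detect_document_text call; it needs the boto3 package and AWS credentials, so the example does not call it.

```python
from collections import defaultdict

# Hand-written sample shaped like a Textract response (illustrative only,
# not real service output).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Page": 1},
        {"BlockType": "LINE", "Id": "l1", "Page": 1, "Text": "Expense report"},
        {"BlockType": "LINE", "Id": "l2", "Page": 1, "Text": "Total: 120.00"},
        {"BlockType": "PAGE", "Id": "p2", "Page": 2},
        {"BlockType": "LINE", "Id": "l3", "Page": 2, "Text": "Approved by: J. Smith"},
    ]
}


def fetch_blocks(bucket, key):
    """Hypothetical helper: run Textract on a document stored in S3.

    Requires boto3 and AWS credentials; not called in this sketch.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("textract")
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["Blocks"]


def lines_by_page(blocks):
    """Collect the text of every LINE block, keyed by page number."""
    pages = defaultdict(list)
    for block in blocks:
        if block.get("BlockType") == "LINE":
            # Assume page 1 when a block carries no Page field.
            pages[block.get("Page", 1)].append(block["Text"])
    return dict(pages)


print(lines_by_page(SAMPLE_RESPONSE["Blocks"]))
```

With real credentials, the same grouping could in principle be applied to the output of fetch_blocks("my-bucket", "scan.png"), where the bucket and file names are placeholders.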
PwC is already using the tool for its pharmaceutical clients, whose work commonly involves Food and Drug Administration (FDA) forms that would otherwise take hours to complete, according to Siddhartha Bhattacharya, director lead, healthcare AI at PwC.
"Previously, people would manually review, edit, and process these forms, each one taking hours," he said. "Amazon Textract has proven to be the most efficient and accurate OCR solution available for these forms, extracting all of the relevant information for review and processing, and reducing time spent from hours down to minutes."
The Met Office is another organisation that plans to implement Textract, making use of old weather records.
"We hope to use Amazon Textract to digitise millions of historical weather observations from document archives," said Philip Brohan, climate scientist at the Met Office. "Making these observations available to science will improve our understanding of climate variability and change."

Connor Jones has been at the forefront of global cyber security news coverage for the past few years, breaking developments on major stories such as LockBit’s ransomware attack on Royal Mail International, and many others. He has also made sporadic appearances on the ITPro Podcast discussing topics from home desk setups all the way to hacking systems using prosthetic limbs. He has a master’s degree in Magazine Journalism from the University of Sheffield, and has previously written for the likes of Red Bull Esports and UNILAD tech during his career that started in 2015.