Conservation International uses big data analytics to help the environment


The Problem

  • Huge databases with millions of records
  • Actionable data hard to extract
  • Processing times too slow, taking weeks or months

The Solution

  • HP Vertica platform with powerful analysis tools
  • Predictions and correlations easy to derive
  • Processing times slashed to hours or days

Cities around the world produce terabytes and petabytes of data every minute of every day, but how much data do you think a rainforest produces?

A rainforest has no computers or servers, no networking tools or complex machinery, yet keeping a handle on all the data produced by 17 rainforest sites has been more than a challenge for the biodiversity charity Conservation International.

Working in 20 countries across all of the planet’s continents, Conservation International (CI) was established 25 years ago to empower societies to build a world that is sustainable for both people and animals. The non-profit organisation collects data on biodiversity, climate and land use in the tropical zones of the globe.

Ten years of collection produced huge databases holding more than 200 million images, with 500,000 more added every year. The result was a quagmire of data so thick that CI analysts found it tough to extract actionable data, let alone process it once it had been identified.

HP approached CI in 2012, looking for partners to broaden the use of its Vertica analytics platform. The two “running into each other” was very fortunate, says Jorge Ahumada, executive director of the Tropical Ecology Assessment and Monitoring (TEAM) Network at Conservation International. “HP has a history of striving to make society better.”

Colin Mahoney, general manager of HP Vertica, agrees: “HP has a history of social innovation and sustainability ... This is one of the areas that [we] really shine.” HP Vertica, says Mahoney, is a purpose-built platform designed to handle the challenges posed by a database of CI’s size: the “big data equivalent of analytical steroids”, in the words of HP managing director Jonathon Dove.

The software is usually delivered to customers as a service at prices ranging from $5,000 (£2,920) to $50,000 (£29,200) per TB, but Conservation International, as a non-profit organisation and an HP partner, is supplied with Vertica free of charge.

Benefits

Before big data analysis was introduced to environmental science, says Mahoney, “teams of scientists would extract a small sample from an environment and extrapolate outwards, often making inaccurate conclusions.” Now, thanks to camera traps that take pictures of animals, satellite imagery and the increasing detail of the information logged, “we are able to keep score almost in real time and literally watch what’s happening,” he adds.

The big change once CI began to implement Vertica, says Ahumada, was in processing time. The software extracts an index number from the database, much like the indices used on the stock exchange.

Before Vertica was introduced, properly extracting actionable data for just one species would take several hours, and for a single site it would take a few days.

For an entire network, adds Ahumada, it could take up to a month. Now, he says, Vertica enables CI to crunch the numbers on all 570 species it monitors in “just a few hours ... I’ve had people ask for all the information on a species and it being complete by the next morning.” With the data identified and extracted quickly, CI can then use it to predict possible changes in biodiversity at any of its 17 tropical sites.

According to Ahumada, the implementation of Vertica had few teething troubles. CI has since “played with the algorithms” to tailor the program to its needs and has created specific models for the individual species under study. “It’s been a really smooth transition and has helped us develop quickly,” he said.

The Future

Having eliminated “one of the major bottlenecks to processing the data”, Conservation International is hoping to tailor its data use towards more specific methods of collection.

Face recognition software is on the cards, with the organisation hoping to create a program that logs an animal’s species as soon as it is captured by a camera trap.

To that end, HP is providing its Autonomy system, too. “The environment is one of those places where there isn’t a lot of big data compared to industry, corporations and finance,” adds Ahumada. “CI, working with the Wildlife Preservation Society in particular, is looking to collect camera trap data worldwide – from the Smithsonian and everyone else that does these types of traps. We want to set up a federation of sorts so that data sets can be correlated worldwide; something HP has been instrumental in helping us work towards.”

Satellite image processing is another area where CI wants to see improvement. The organisation currently uses satellite images to work out how land use is changing in the world’s biomes, but that is difficult when so many rainforests are covered in cloud. It is therefore looking to extend its partnership with HP to power data-crunching software that can “burn away” cloud cover to reveal the usable image underneath.

“My hope is that Conservation International gets more and more data in the system faster and faster,” Mahoney said. “Whether it’s data in the Internet of Things, satellite data that can burn through cloud cover or even data that is just out there around building and construction sites. HP wants to combine that information and create a matrix of simulations and correlations that ultimately will allow CI and society to realise its goals in sustainability.”