Google says the web hits a trillion pages

published 28 July 2008

The web has hit a new milestone, as Google announces that its indexing system has found a trillion unique URLs.

In its blog, the web search giant said that its search engineers "stopped in awe" when they realised how big the web had become, after the index hit the trillion mark (that's 1,000,000,000,000). The web is growing by several billion pages every day, Google added.

Google's first index, in 1998, found 26 million pages; the billion-page mark was passed in 2000. Back then, Google processed everything in batches, it said. Now the processing runs constantly, with the entire set reprocessed several times a day.

"This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States," Google software engineers Jesse Alpert and Nissan Hajaj wrote in the blog.

Google said it has found more than one trillion pages, but that many are simply auto-generated copies. Indeed, the search giant doesn't index all of the trillion unique URLs, as some of the pages are too similar to each other, it said.

"So how many unique pages does the web really contain? We don't know; we don't have time to look at them all," they wrote, adding that "the size of the web really depends on your definition of what's a useful page, and there is no exact answer."

Google's trillion-page announcement comes as a rival called Cuil, founded by former Google employees, hits the web with claims that it can index more of the web, faster and more cheaply, than Google. At its launch, Cuil said it had indexed 120 billion pages.