Wayback is Hugh

How the Wayback Machine Works

In the Wayback Machine, currently there are 10 billion Web pages, collected over five years. That amounts to 100 terabytes, which is 100 million megabytes. So if a book is a megabyte, which is about what it is, and the Library of Congress has 20 million books, that’s 20 terabytes. This is 100 terabytes. At that size, this is the largest database ever built. It’s larger than Walmart’s, American Express’, the IRS. It’s the largest database ever built. And it’s receiving queries — because every page request when people are surfing around is a query to this database — at the rate of 200 queries per second.
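The arithmetic in the quote can be checked with a short sketch. It assumes decimal units (1 terabyte = 1,000,000 megabytes) and takes the quote's round figures at face value, including the one-megabyte-per-book estimate. The variable names are illustrative and do not come from the interview.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal (SI) units, so 1 TB = 1,000,000 MB and 1 MB = 1,000,000 bytes.
MB_PER_TB = 1_000_000
BYTES_PER_MB = 1_000_000
SECONDS_PER_DAY = 86_400

# Figures taken directly from the quote.
archive_tb = 100                 # size of the archive in terabytes
archive_pages = 10_000_000_000   # 10 billion Web pages
loc_books = 20_000_000           # books in the Library of Congress
book_mb = 1                      # rough size of one book, per the quote
queries_per_second = 200

# 100 TB expressed in megabytes.
archive_mb = archive_tb * MB_PER_TB

# Library of Congress estimate in terabytes.
loc_tb = loc_books * book_mb / MB_PER_TB

# How many Libraries of Congress fit in the archive.
ratio = archive_tb / loc_tb

# Implied average size of one archived page, in bytes.
avg_page_bytes = archive_mb * BYTES_PER_MB / archive_pages

# Query load over a full day at the quoted rate.
queries_per_day = queries_per_second * SECONDS_PER_DAY

print(f"Archive: {archive_mb:,} MB")
print(f"Library of Congress: {loc_tb:g} TB")
print(f"Archive / Library of Congress: {ratio:g}x")
print(f"Average page size: {avg_page_bytes:,.0f} bytes")
print(f"Queries per day: {queries_per_day:,}")
```

Under these assumptions the archive is five times the Library of Congress estimate, the average archived page is about 10 kilobytes, and 200 queries per second works out to roughly 17 million queries per day.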

One reply on “Wayback is Hugh”

  1. Great interview… Interesting way of doing parallel computing — instead of using some PSI, MOSIX, etc, which require you to write programs keeping that nature in mind, just write a batch processing system in perl…

Comments are closed.
