What Is Google's Bigtable System?



In the world of cloud computing, one essential ingredient is a database that can accommodate a very large number of users on an on-demand basis. Many Web applications use databases, typically of the SQL variety, for various tasks. What's needed is a database that can be carved into subsets for different users, is distributed across many servers and is highly responsive. It should also be able to accommodate a virtually infinite variety of data tables.

Google has had a proprietary database, called Bigtable, since early 2005. Bigtable is the basis of Google's search technology, as well as many other applications such as Google Finance, Google Maps and Google Earth. Bigtable was developed with very high speed, flexibility and extremely high scalability in mind. A Bigtable database can be petabytes in size and span thousands of distributed servers.

In April 2008, Google announced that it was making Bigtable available to outside developers as part of Google App Engine, the company's cloud-computing platform. The only other large company that offers a database for cloud computing is Amazon.com Inc., so Google's entry into the market is a pretty big deal.

Understanding Bigtable's architecture is a job for Ph.D.s. Google has released one highly technical document describing Bigtable's plumbing, and it is recommended for potential developers who want to understand the database's technical details.




Basic Architecture of Bigtable

Bigtable is described as a fast and extremely scalable DBMS (database management system). It is based on the proprietary Google File System, which gives Bigtable the ability to scale across hundreds or thousands of commodity servers that collectively can store petabytes of data.

Each table is a multidimensional sparse map. The table consists of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than … "
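
As a rough illustration of that data model, the following Python sketch treats a table as a sparse map keyed by row, column and time stamp. It is a toy written for this article, not Google's implementation or API: the class name SparseTable, its method names and the sample row key are all assumptions made for the example.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a sparse (row, column, timestamp) -> value map."""

    def __init__(self):
        # Only cells that were actually written take up space (hence "sparse").
        self._cells = defaultdict(dict)  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        """Store one version of a cell under the given time stamp."""
        self._cells[(row, column)][timestamp] = value

    def latest(self, row, column, n=1):
        """Return up to n (timestamp, value) pairs, newest first."""
        versions = self._cells.get((row, column), {})
        return sorted(versions.items(), reverse=True)[:n]

    def delete_older_than(self, row, column, cutoff):
        """Drop every version whose time stamp is below the cutoff."""
        versions = self._cells.get((row, column), {})
        for ts in [t for t in versions if t < cutoff]:
            del versions[ts]


# Hypothetical usage: three versions of one Web page.
table = SparseTable()
page = "com.example.www/index.html"
for ts, html in [(1, "<html>v1</html>"), (2, "<html>v2</html>"), (3, "<html>v3</html>")]:
    table.put(page, "contents", ts, html)

newest_two = table.latest(page, "contents", n=2)      # "select 2 versions"
table.delete_older_than(page, "contents", cutoff=3)   # "delete older cells"
remaining = table.latest(page, "contents", n=10)
```

Keeping every version in a per-cell dictionary makes both time-stamp operations short to write. A real system would need persistence, sorted storage and garbage-collection policies that this sketch leaves out.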

In order to manage the huge tables, Bigtable splits tables at row boundaries and saves the pieces as tablets. Each tablet is around 200MB, and each server holds about 100 tablets. This setup allows tablets from a single table to be spread among many machines. It also allows for fine-grained load balancing: if one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to a machine that is less busy. Also, if a machine goes down, its tablets can be spread across many other machines so that the performance impact on any given machine is minimal.
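
The splitting and load-balancing ideas can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Bigtable's actual algorithm: the function names are invented, tablets are cut by row count rather than by size in megabytes, and the balancing rule (move the hottest tablet from the busiest server to the least busy one) is just one plausible policy.

```python
import bisect


def split_into_tablets(sorted_rows, rows_per_tablet):
    """Cut a sorted list of row keys into contiguous ranges (tablets)."""
    return [sorted_rows[i:i + rows_per_tablet]
            for i in range(0, len(sorted_rows), rows_per_tablet)]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose row range would hold row_key."""
    end_keys = [tablet[-1] for tablet in tablets]
    idx = bisect.bisect_left(end_keys, row_key)
    return min(idx, len(tablets) - 1)  # keys past the end go to the last tablet


def rebalance(assignment, load):
    """Move the hottest tablet from the busiest server to the least busy one.

    assignment maps server name -> list of tablet ids;
    load maps tablet id -> recent query count.
    Returns (tablet, source, destination), or None if nothing was moved.
    """
    totals = {s: sum(load[t] for t in ts) for s, ts in assignment.items()}
    busiest = max(totals, key=totals.get)
    idlest = min(totals, key=totals.get)
    if busiest == idlest or not assignment[busiest]:
        return None
    hot = max(assignment[busiest], key=lambda t: load[t])
    assignment[busiest].remove(hot)
    assignment[idlest].append(hot)
    return hot, busiest, idlest


# Hypothetical usage: seven rows, three rows per tablet, two servers.
tablets = split_into_tablets(["a", "b", "c", "d", "e", "f", "g"], 3)
tablet_for_e = find_tablet(tablets, "e")

assignment = {"server-1": [0, 1], "server-2": [2]}
load = {0: 900, 1: 50, 2: 40}
move = rebalance(assignment, load)
```

Because each tablet covers a contiguous range of sorted row keys, finding the right tablet is a binary search over the tablets' last keys, and moving a tablet only changes which server is responsible for that range.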

Tablets are stored as immutable SSTables plus a tail of logs (one log per machine). When a machine's system memory is full, it compresses some tablets using Google's proprietary compression techniques, such as BMDiff and Zippy. Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.
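
To make the storage scheme more concrete, here is a toy Python model of immutable sorted tables plus a log, tidied up by minor and major compactions. It is a loose sketch, not Google's code: the class ToyTablet, the memory_limit parameter and the deletion marker are assumptions made for illustration, compression (BMDiff, Zippy) is left out entirely, and the log is never trimmed.

```python
class ToyTablet:
    """Toy model: a write log, an in-memory buffer and immutable sorted tables."""

    TOMBSTONE = object()  # hypothetical marker recording that a key was deleted

    def __init__(self, memory_limit=2):
        self.log = []        # append-only record of every write
        self.memtable = {}   # recent writes held in memory
        self.sstables = []   # immutable sorted tuples of (key, value), oldest first
        self.memory_limit = memory_limit

    def write(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memory_limit:
            self.minor_compaction()  # memory is "full"

    def delete(self, key):
        self.write(key, self.TOMBSTONE)

    def minor_compaction(self):
        """Freeze the in-memory buffer into a new immutable sorted table."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def major_compaction(self):
        """Merge every table into one and drop deleted entries to recover space."""
        self.minor_compaction()
        merged = {}
        for table in self.sstables:  # newer tables overwrite older ones
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not self.TOMBSTONE}
        self.sstables = [tuple(sorted(live.items()))]

    def read(self, key):
        """Check memory first, then the tables from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = None
            for table in reversed(self.sstables):
                found = dict(table)
                if key in found:
                    value = found[key]
                    break
        return None if value is self.TOMBSTONE else value


# Hypothetical usage.
t = ToyTablet(memory_limit=2)
t.write("a", 1)
t.write("b", 2)      # buffer full: first immutable table is written
t.write("a", 10)
t.delete("b")        # buffer full again: second immutable table is written
tables_before = len(t.sstables)
t.major_compaction()
```

Since the sorted tables are never modified after they are written, an update or a delete is just a newer entry that shadows an older one. Only the major compaction actually discards stale data, which is why it is the step that frees disk space.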

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. The clients get a pointer to a META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
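
The three-tiered lookup can be pictured with a small Python sketch. Everything here is hypothetical: the server names, the row ranges, the Client class and its per-row cache are assumptions chosen to keep the example short, and real clients would cache and pre-fetch whole ranges of locations rather than individual rows.

```python
import bisect


def lookup(index, row_key):
    """index is a sorted list of (end_row_key, target) pairs.

    Return the target of the first entry whose end key is >= row_key.
    """
    end_keys = [end for end, _ in index]
    pos = bisect.bisect_left(end_keys, row_key)
    if pos == len(index):
        raise KeyError(row_key)
    return index[pos][1]


# Hypothetical layout: one META0 table pointing at two META1 tablets,
# each of which maps row ranges to the servers holding the data tablets.
META1_A = [("g", "server-1"), ("m", "server-2")]
META1_B = [("t", "server-3"), ("~", "server-4")]
META0 = [("m", META1_A), ("~", META1_B)]


class Client:
    """Client that starts from a pointer to META0 and caches what it finds."""

    def __init__(self, meta0):
        self.meta0 = meta0
        self.cache = {}       # row key -> tablet location
        self.meta_reads = 0   # how many metadata lookups were needed

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]        # no metadata traffic at all
        meta1 = lookup(self.meta0, row_key)   # tier 1: META0 -> META1 tablet
        self.meta_reads += 1
        location = lookup(meta1, row_key)     # tier 2: META1 -> data tablet
        self.meta_reads += 1
        self.cache[row_key] = location        # tier 3 is the data tablet itself
        return location


# Hypothetical usage.
client = Client(META0)
first = client.locate("h")
second = client.locate("h")   # served from the cache
other = client.locate("p")
```

The first lookup for a row costs two metadata reads, while a repeat lookup costs none. That is the effect caching is meant to have on the META0 and META1 tiers.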

Bigtable's Release

Bigtable was released in May 2008 as part of Google App Engine. As is typical with Google offerings, Bigtable is free to use, and the service is described as "in beta," even though Google has been using Bigtable internally for more than three years. The first 10,000 developers who signed up for the Google App Engine service received 500MB of storage and enough computing power and bandwidth to handle 5 million page views per month, all for free.

The system allows developers to use Google's tremendous infrastructure, which enables applications to handle very large spikes in traffic that would otherwise require extensive revisions to database architecture. Google App Engine allows developers to concentrate on their applications, while Google handles maintenance chores such as load balancing and replication.

Google has opened Bigtable to the development community in order to further its vision of cloud computing and the on-demand paradigm of computing-resources sales.

Misgivings about Bigtable

Developers have been generally positive about Bigtable and Google App Engine, as well as competitor Amazon Web Services. After all, Google App Engine gives developers access to very powerful Web platforms at no cost.

"Companies like Google and Amazon have a tonne of bandwidth that they can load share really well," managing director of Web-development company Western Civilisation Pty. Ltd. John Allsopp told ZDNet Australia. "As a developer, when you launch something, you might get a big hit on it, so you really want a system that can provide the bandwidth when you need it."

But developers are nonetheless leery of the proprietary Bigtable platform, which locks applications into the Google stable. The same is true of Amazon Web Services and other cloud-computing services.

The danger is that cloud-computing vendors may decide to discontinue services upon which Web applications depend, and there will be no way to move those applications to other platforms.

Google has announced a pricing structure for Google App Engine and Bigtable, and it contains pleasant surprises for start-ups and other cost-conscious developers. It seems that a 4TB database application on Google App Engine costs one-tenth of what it would on Amazon Web Services. Google App Engine's price structure is on par with Amazon.com's S3 (Simple Storage Service).