
Google Remakes Online Empire With 'Colossus'


More than a decade ago, Google built a new foundation for its search engine. It was called the Google File System -- GFS, for short -- and it ran across a sweeping army of computer servers, turning an entire data center into something that behaved a lot like a single machine. As Google crawled the world's webpages, grabbing data for use in its search engine, it could spread this massive collection of data over all those servers, before using the chips on these machines to crunch everything into a single, searchable index.

GFS was so successful, it soon reinvented the rest of the web. After Google released research papers describing GFS and a sister software platform called MapReduce -- the piece that crunches the data -- Yahoo, Facebook, and others built their own version of the Google foundation. It was called Hadoop, and this open source platform is now driving a revolution across the world of business software as well.

But Google no longer uses GFS. Two years ago, the company moved its search to a new software foundation based on a revamped file system known as Colossus, and Urs Hölzle -- the man who oversees Google's worldwide network of data centers -- tells Wired that Colossus now underpins virtually all of Google's web services, from Gmail, Google Docs, and YouTube to the Google Cloud Storage service the company offers to third-party developers.

Whereas GFS was built for batch operations -- i.e., operations that happen in the background before they're actually applied to a live website -- Colossus is specifically built for "realtime" services, where the processing happens almost instantly. In the past, for instance, Google would use GFS and MapReduce to build a new search index every few days and -- as the system matured -- every few hours. But with Colossus and its new search infrastructure -- known as Caffeine -- Google needn't rebuild the index from scratch. It can constantly update the existing index with new information in real time.
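The difference between rebuilding an index and updating one in place can be illustrated with a toy inverted index. This is only a sketch of the general idea, not a description of how Caffeine works; the function names, page names, and page text are all hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Batch style: rebuild the whole inverted index from every page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def update_index(index, url, old_text, new_text):
    """Incremental style: touch only the entries affected by one page."""
    old_words = set(old_text.lower().split())
    new_words = set(new_text.lower().split())
    for word in old_words - new_words:
        index[word].discard(url)
        if not index[word]:
            del index[word]  # drop words that no page contains anymore
    for word in new_words - old_words:
        index[word].add(url)


# Hypothetical crawl data.
pages = {"a.html": "google file system", "b.html": "file index"}
index = build_index(pages)

# One page changes; patch the existing index instead of rebuilding it.
update_index(index, "b.html", pages["b.html"], "search index")
pages["b.html"] = "search index"

# The patched index matches what a full rebuild would produce.
assert index == build_index(pages)
```

The batch version does work proportional to the whole collection every time, while the incremental version does work proportional to the one page that changed.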

The move to Colossus foretells a similar move across the rest of the web -- and beyond -- as is so often the case with the hardware and software that underpin Google's massively popular web services. Because its services are used by so many people -- and it's juggling so much data -- Google is often forced to solve very large problems before the rest of the world does, and others then follow. Colossus is already echoed by recent changes to Hadoop, a platform now used by everyone from Facebook to Twitter and eBay.

So that it's better suited for realtime applications, Colossus eliminates a "single point of failure" that plagued the original Google File System. With GFS, a "master node" -- or master server -- oversaw data that was spread across an army of "chunkservers." These chunkservers stored chunks of data, each about 64 megabytes in size. The problem was that if the master node went down, the whole system went down -- at least temporarily. Colossus solved this problem by adding multiple master nodes.
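The single point of failure can be made concrete with a toy model. The sketch below assumes that each master holds a full copy of the file-to-chunkserver map and that clients simply try the masters in turn; the article does not say how Colossus actually coordinates its masters, so the class, the function, and all server and file names are hypothetical.

```python
class Master:
    """Toy metadata server: maps each file to the chunkservers holding it."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.chunk_locations = {}  # filename -> list of chunkserver names

    def locate(self, filename):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.chunk_locations[filename]


def locate_with_failover(masters, filename):
    """Ask each master in turn; fail only if every one of them is down."""
    for master in masters:
        try:
            return master.locate(filename)
        except ConnectionError:
            continue
    raise ConnectionError("no master available")


# Assumption: both masters hold the same (hypothetical) metadata.
locations = {"crawl.dat": ["chunkserver-1", "chunkserver-7"]}
primary, replica = Master("master-a"), Master("master-b")
primary.chunk_locations = dict(locations)
replica.chunk_locations = dict(locations)

primary.alive = False  # simulate the master node going down

# Single master: the lookup fails even though the chunkservers are healthy.
try:
    locate_with_failover([primary], "crawl.dat")
    single_master_ok = True
except ConnectionError:
    single_master_ok = False

# Multiple masters: a surviving master still answers.
found = locate_with_failover([primary, replica], "crawl.dat")
```

With one master, the data is still sitting on the chunkservers, but nothing can find it until the master returns. With a second master, the same lookup succeeds.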

"A single point of failure may not have been a disaster for batch-oriented applications," Googler Sean Quinlan said, just before Colossus was rolled out, "but it was certainly unacceptable for latency-sensitive applications, such as video serving."

The new file system also reduces the size of the data chunks to 1MB. Together with the addition of multiple master nodes, this lets Google store far more files across a far larger number of machines.
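A quick calculation shows what the smaller chunk size means for the bookkeeping. The 64MB and 1MB figures come from the description above; the 1 GB file is a hypothetical example, and the helper function is only an illustration.

```python
MB = 1024 * 1024


def chunk_count(file_size_bytes, chunk_size_bytes):
    """Number of fixed-size chunks needed to hold a file (rounded up)."""
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


file_size = 1024 * MB  # a hypothetical 1 GB file

gfs_chunks = chunk_count(file_size, 64 * MB)      # 64MB chunks, as in GFS
colossus_chunks = chunk_count(file_size, 1 * MB)  # 1MB chunks, as in Colossus

print(gfs_chunks, colossus_chunks)  # 16 1024
```

The same file breaks into 64 times as many pieces, so there are far more chunk locations to track. One plausible reading is that this is why the smaller chunks go hand in hand with spreading the metadata work across multiple master nodes.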

Hölzle calls Colossus "similar to GFS -- but done better after ten years of experience."

With its search engine, Google has not only dropped GFS. It has dropped MapReduce. Rather than using MapReduce to build a new index every so often, it uses a new platform called "Caffeine" that operates more like a database, where you can read and write data whenever you like.

In similar fashion, Hadoop developers are working to eliminate single points of failure and tweak the platform for use with realtime services. A company called MapR has built a new proprietary version of Hadoop that includes an entirely new file system, while others have worked to remove single points of failure in the open source version of the platform. And in much the same way Google uses distributed databases atop Colossus, Hadoop dovetails with a database called HBase that's better suited to realtime services.

Jan Gelin -- vice president of technical operations for the Rubicon Project, a realtime trading platform for online ads -- recently moved his service to MapR's proprietary version of Hadoop in part because it eliminated a "single point of failure" that plagued earlier open source versions of the platform. As with GFS, the original incarnation of Hadoop used a single machine -- known as the name node -- to oversee all other servers in a cluster, and if that one machine went down, the entire process would stop.

"We had a lot of those issues," Gelin says. "We have roughly a petabyte of data inside of Hadoop, and it was always nerve-wracking when the name node didn't check-point and you’re wondering if you’re going to lose all your storage or all the pointers to where your data is.

"That’s OK if you’re doing research stuff, but if you’re depending on your data in the way we’re going to be now, it's not."

During a recent event in Silicon Valley, Mike Olson -- the CEO of Cloudera, another Hadoop outfit -- said that this problem has also been fixed in the open source version of Hadoop.

Though Google has not open sourced the code behind Colossus, outside developers can still make use of the file system. As Hölzle points out, Colossus underpins Google Cloud Storage, the online storage service Google offers to developers across the globe in much the same way Amazon offers its S3 storage service.

Cade Metz is a former WIRED senior staff writer covering Google, Facebook, artificial intelligence, bitcoin, data centers, computer chips, programming languages, and other ways the world is changing.