Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
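As a rough illustration of what these figures imply, the short Python sketch below derives a few ratios from them: about 1.19 captures per unique URL and about 78 unique URLs per host. The constants are copied from the list above; the function name and the choice of ratios are illustrative assumptions, not part of the original crawl report.

```python
from datetime import date

# Figures copied from the data set summary above.
CAPTURES = 2_713_676_341
UNIQUE_URLS = 2_273_840_159
HOSTS = 29_032_069
CRAWL_START = date(2011, 3, 9)
CRAWL_END = date(2011, 12, 23)


def crawl_ratios() -> dict:
    """Return simple derived ratios (hypothetical helper for illustration)."""
    days = (CRAWL_END - CRAWL_START).days
    return {
        "crawl_days": days,
        "captures_per_url": CAPTURES / UNIQUE_URLS,
        "urls_per_host": UNIQUE_URLS / HOSTS,
        "captures_per_day": CAPTURES / days,
    }


if __name__ == "__main__":
    for name, value in crawl_ratios().items():
        print(f"{name}: {value:,.2f}")
```

These are averages only; the distribution of URLs across hosts is likely to be far from uniform.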
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
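For readers unfamiliar with what respecting robots.txt involves, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Heritrix's implementation (Heritrix is written in Java); the user agent name, the sample rules, and the helper function are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text.

    Hypothetical helper: a real crawler would fetch and cache
    robots.txt per host rather than receive it as a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        print(url, is_allowed(SAMPLE_ROBOTS, "example-crawler", url))
```

A crawler that honors these rules would fetch the first URL and skip the second.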
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, since the URLs for these resources were added into queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so looking at results by country will be somewhat skewed.
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.

The World of Megastars


Just three contestants remain on American Idol after Wednesday’s elimination.
Keep reading to find out who survived Tuesday’s rock show, and which of the top four (Danny, Allison, Adam or Kris) went out like Daughtry.
Idol is now officially heading to an all-male finale. Allison Iraheta, the 17-year-old phenomenon with a vocal strength that could topple frail trees, was voted off, leaving Adam Lambert, Danny Gokey and Kris Allen to be saluted with hometown parades.
What went wrong? Her performance of a Janis Joplin classic wasn’t her best, but her duet with Adam had enough rock sizzle that Simon thought she might be safe. Danny, meanwhile, pleased no one with his cover of Aerosmith’s “Dream On.” Even his aunt, he admitted, had muted the television when he bungled the high notes.
Allison was often faulted for not showing enough confidence or fire as a stage personality, but there were always confidence and fire in her singing, weren’t there? At any rate, the voting audience has always been ambivalent about her, perhaps because of her youth, her hair and her style, which has been compared to Kelly Clarkson’s but is in fact much more raw.
No one left has her fresh, if rough, vitality. Shed a tear for Allison!

Left to right: Kris Allen, Allison Iraheta, Adam Lambert
And, man, did the hour drag. A full 35 minutes were gone before Ryan Seacrest started the elimination procedure. (The order of names was random, he said; there was no bottom two announced.) The first to safety was a surprise: Kris cupped his hands over his face in shock and relief. His singing hasn’t consistently wowed the judges (this week, in particular), but the fact remains that of all the contestants he photographs best in a close-up. These things matter. Then Adam joined him, and then Danny.
It seems inconceivable that Adam won’t be in the final two. So who goes next — Kris or Danny?
The night’s guest performers included No Doubt, former Idol contestant Chris Daughtry and none other than Paula Abdul. She and a chorus of male dancers ran through fog and red-fabric flames for her new single “I’m Just Here for the Music.” At the end, one of the dancers, pretending to be paparazzi, shouted, “Will you be back for next season?” She did not answer.
Seven singers performed on Tuesday’s Idol episode and the judges had a few choice words for some of them. But the results that matter (America votes, don’t forget!) are in, and it makes for season 8’s most surprising results show yet.
Keep reading to find out who’s singing next week and who faced the judges.
It was a night of dramatic reversals in the desperate lower depths of the competition, as the much-hyped Judges’ Save rule was finally put into play. Matt Giraud, told by Ryan that he’d been eliminated by popular vote, sang an encore of “Have You Ever Really Loved a Woman.” He actually sang it worse than the night before, probably undone by (in Ryan’s words) “all the pressure in the world.”
Simon weighed in with bleak honesty: “I don’t see that you have really any chance of winning.” Paula and Kara, meanwhile, were out of their seats behind Simon, practically shouting in protest.
Then, Simon told Matt he was safe. As Matt cried in relief, the other singers huddled round and hugged him. At that point, Simon reminded the contestants that two, not one, will be eliminated next week. The other judges, after the cameras were off, congratulated a teary-eyed Matt while Simon hurried off stage.
You have to wonder whether Matt and Lil Rounds, who joined him in the bottom two, are simply heading for a double guillotine. The judges have been dissatisfied with both of them as they keep trying to settle on a style that could make them as dependable as, say, Adam Lambert, or even Allison Iraheta.
Anoop Desai, despite a performance of “(Everything I Do) I Do It for You” that earned some of Kara’s kindest words of the night, was also in the bottom three. Simon, who’d doubtless been hampered by the two-judges-per-singer rule, took the opportunity to say Anoop deserved to be there (and, contradicting Randy, he also praised Kris Allen as “excellent”).
But Anoop may find that disco suits his slick pop groove, and maybe Lil, with her love for soul and R&B, will too. Not so sure about Matt, though. – Tom Gliatto
by People
NASHVILLE, Tenn. – Jessica Simpson's courtship with country music seems to have had a shorter shelf life than her marriage.
After lackluster sales for her country debut, "Do You Know," Simpson and her Nashville record label have parted ways, leaving many wondering what's next for the 28-year-old entertainer.
"Right now it seems like she's taken a break from recording. There is nothing else on the books," said Ian Drew, senior music editor at Us Weekly magazine.
A spokeswoman for the one-time pop princess says Simpson remains part of the Sony Music Group on the Epic label, but is no longer working with the company's country division, Sony Music Nashville.
"She was on loan to Sony Nashville for her country album," said Lauren Auslander.
As for her future in country music? "We don't know yet," she said.
"Do You Know" started strong but faded fast. The lead single, "Come on Over," a flirtatious, steel guitar-laced slice of country pop, peaked at No. 18 last summer and the album debuted at No. 1. But the second single, "Remember That," stalled at No. 42, and the third, "Pray Out Loud," failed to chart.
To date, the disc, Simpson's fifth studio release, has sold around 178,000 copies, a long way from her 3 million-selling 2003 disc, "In This Skin."
"Everywhere I saw her around the U.S. at different radio station events she was always well-received," said Lon Helton, editor and publisher of the industry trade publication "Country Aircheck." "For whatever reason, the music did not resonate."
Simpson came to country after her 2006 pop outing, "A Public Affair," fell flat. The Texas-born blonde touted the move as a return to her roots. She performed on the Grand Ole Opry, signed autographs at the Country Music Association's annual festival, and toured with country's multiplatinum trio Rascal Flatts.
But she got more publicity for her life outside of music, most of it far from positive. She was ridiculed when it seemed as if she had gained a few pounds, and the status of her romance with Dallas Cowboys quarterback Tony Romo was constantly scrutinized.
She was also criticized for a few erratic concert performances. At a February show in Michigan, Simpson apologized to fans after she forgot the lyrics to a song and asked her band to start over on another.
Some detractors viewed her country career as a calculated attempt to follow other pop stars who have found success on country radio.
"Working the country market is very different. You really have to work it at country. You have to spend your life on the road building an audience and she didn't really put the work in," Drew observed. "She walked the walk and talked the talk, but she didn't have the street cred that she needed to make it work."
But others say Simpson shouldn't bail too soon. She may just need more time to find an audience.
"It doesn't seem like she was even on the country music scene long enough to prove what she is capable of doing for this industry. She never got the chance," said Neely Yates, music director for country station 96.3 in Lubbock, Texas.
Helton wondered whether the singer was a victim of bad timing. Pop rockers Darius Rucker and Jewel were crossing over to country about the same time, which he called unusual in country music.
"What was the ability of the market to absorb and focus on more than one pop singer at a time coming over?" he asked.
The question now is whether Simpson will keep her record deal. After two disappointments, Epic may be ready to move on without her.
"She's never really sold a lot of records except for the album out at the height of 'Newlyweds,'" said Drew, referring to her popular reality TV show, "Newlyweds: Nick and Jessica," which chronicled her ill-fated marriage to Nick Lachey. "Other than that, she's never been able to sell much of anything."
But in a recent interview, Rascal Flatts' Gary Levox said Simpson is in a no-win situation with her critics: "She's in a spot where whatever she does, they pick her apart. They need to just leave her alone and just let her sing."
"She's a wonderfully gifted singer," added bandmate Jay DeMarcus. "All the other stuff overshadows what she's really about and it's unfortunate, because there's more to her there than just tabloid fodder."
by JOHN GEROME, AP Entertainment
