Web-wide crawl using the initial seed list and crawler configuration from March 2011. This crawl used the new HQ software for distributed crawling, developed by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, because the URLs for these resources were added to queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so looking at results by country will be somewhat skewed.
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.

According to most reasonable criteria, Beyoncé's second album B'Day is a success, outpacing her solo debut Dangerously in Love. Here, Beyoncé delivers precisely what many listeners have always wanted from her: a short, tight, and energetic set that's heavy on upbeat numbers and funk affectations, and light on the balladry and melisma.
B'Day captures the r&b singer at her warmest and most in-the-moment: There's a certain ramshackle messiness to these grooves, elliptically orbiting the classic pop song in a manner more reminiscent of Amerie's "1 Thing" than Beyoncé's sonically similar "Crazy in Love". Beyoncé sounds more relaxed as a singer, expanding on the Tina Turner resemblances she's been toying with recently, her performances growing ever-more instinctive and unpredictable in their appropriations of soul hollering. Most radically, the siren-assisted caterwaul of second single "Ring the Alarm" sounds genuinely (and marvelously) incoherent, her voice thrillingly sharp with anxiety and paranoia.
As she remains in soul-mama character throughout, her newfound expressiveness fits so hand-in-glove with Richcraft or Neptunes-style funk drum patterns and surging horns that even when she departs from this style sonically-- such as on the percussive, Diwali-esque jam "Get Me Bodied", or the stiffly blaring "Upgrade U"-- the shift feels negligible, and you can still hear the ghosts of horn sections. Beyoncé's lyrics are also funnier and more idiosyncratic than ever: "I can do for you what Martin did for the people," she boasts on "Upgrade U"'s extreme-makeover hard sell, and I suspect she knows she's the only r&b singer who could deliver the line with a straight face.
So far so good, but what prevents this from being the classic pop album the above would suggest is that, well, Beyoncé simply isn't making classic pop anymore. By resolving the criticisms of her earlier work (too strident, too deliberate, too driven) Beyoncé has weakened her perfect pop technique. B'Day lacks the precision with which her earlier hits were crafted-- the alluring poise of "Baby Boy" is nowhere in evidence, and the glittering impregnability of the great Destiny's Child singles feels even more distant. B'Day sounds like an entire album of third and fourth singles, which is still better than an album of filler, but in a genre so overwhelmingly defined by its hit singles, a "Crazy in Love" or a "Baby Boy" can punch above its weight-- the consistency of "Déjà Vu" in this regard becomes a double-edged sword.
Most of all, though, Beyoncé just sounds too real here: It was her pitch-perfect plasticity which gave much of her earlier work its majestic aura, as if she had transcended ordinary goals in a narcissistic drive for perfection. Having voluntarily stepped down from her pedestal, she now struggles to inspire the same sense of awe: Her songs emote as intensely as before, but their emotions are all too human.
Ironically perhaps, this switch delivers its biggest pay-off, and B'Day's best song, with the ballad-of-sorts "Irreplaceable". It's as if, having lost the Midas touch of gleaming pop perfection, Beyoncé has opened up the possibility of stumbling on brilliance by accident. "You must not know 'bout me/ I can have another you in a minute/ Matter-fact he'll be here in a minute," she boasts to a swiftly exiting lover, in a hopelessly unconvincing attempt at callous indifference. Before, Beyoncé's approach to heartbreak was always literal, her voice and her words declaiming her feelings with a studied earnestness that at times was difficult to believe, let alone connect with. "Irreplaceable" is the first song in which Beyoncé lies to herself, and the way her voice perfectly betrays that lie (revealing a giveaway tremble in the stiff upper lip of the lyrics) simultaneously renders it her most sophisticated and her most honest performance to date.
—Tim Finney, September 7, 2006
© 2011 Pitchfork Media Inc. All rights reserved.