Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
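
As a rough illustration of what these figures imply, the short Python sketch below derives a few ratios from them. The variable names are hypothetical, and it assumes that every capture beyond the first for a given URL counts as a repeat; the summary above does not say how unique URLs were counted, so treat the results as back-of-the-envelope estimates only.

```python
from datetime import date

# Figures copied from the data set summary above.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069
crawl_start = date(2011, 3, 9)
crawl_end = date(2011, 12, 23)

# Length of the crawl in whole days (end date minus start date).
crawl_days = (crawl_end - crawl_start).days

# Assumption: a "repeat" is any capture beyond the first for a URL.
repeat_captures = captures - unique_urls

# Simple averages; they say nothing about how skewed the distribution is.
captures_per_url = captures / unique_urls
captures_per_host = captures / hosts
captures_per_day = captures / crawl_days

print(f"Crawl length:       {crawl_days} days")
print(f"Repeat captures:    {repeat_captures:,}")
print(f"Captures per URL:   {captures_per_url:.2f}")
print(f"Captures per host:  {captures_per_host:.1f}")
print(f"Captures per day:   {captures_per_day:,.0f}")
```

Under these assumptions the crawl ran for 289 days, with roughly 1.19 captures per unique URL and about 93 captures per host on average.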
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, since the URLs for these resources were added to queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so results broken down by country will be somewhat skewed.
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available “warts and all” for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.

M.O.N.H Moveable Museum

The Museum of Natural History in NY has an educational exhibition in the form of a mobile gallery. It is called the Moveable Museum. There are five Moveable Museums in total, each with its own theme:

1. Dinosaurs: Ancient Fossils, New Discoveries
2. Discovering the Universe
3. Structures and Culture (about the homes and lives of three modern nomadic peoples)
4. Paleontology of Dinosaurs

On Wed. Nov. 4 I got a tour of two of the trucks: Discovering the Universe and Paleontology of Dinosaurs. Discovering the Universe is for older students between 10 and 16; Paleontology of Dinosaurs is for students between 5 and 10. I forgot my camera that day but managed to take some cell phone pictures. I have included some photos from the museum archives as well.

Here are some interview questions and notes from the tour:

1. When was the first museum created, and who is responsible for the design of the truck?
The first truck was created in 1993, but there have been mobile museums associated with the museum since the 50’s. They used to drive around in trucks and cars with artifacts to different parts of NY. It takes about 2 years to make one museum. The educators are mostly responsible for the design within the museum. Scientists play a big role as well.

2. How much time do kids spend in the truck? Are there additional activities they do after leaving the truck?
We visit schools and get every class for one hour. Half an hour is spent in the truck; the other half hour is spent in the classroom with slides and lectures. They often want to spend more time in the truck, but can’t because the next class has to come in. Teachers usually do follow-up activities with the students related to what they did in the truck.

3. Do you ever open the truck to the public? If so, are people interested?
Yes. We visit summer camps, community events and street fairs. We let 15 or fewer people in the truck at one time. They are very interested. They interact with the stations that have buttons first, then move around to other parts.

4. Is there a linear path people have to follow to get the most out of the museum? If so, do people follow it?
Almost everyone goes to the infrared camera first. The truck is designed so people move counterclockwise, but it is not essential that they do. The students get worksheets that direct them from station to station, which forces them to move through the exhibit linearly. The public moves around randomly and does not spend as much time reading every panel.

5. What role do the educators play in the overall experience?
The educators do presentations in the classrooms with slides and artifacts. They are in the truck as well, but leave the students to find things on their own unless they need help. The educators are mostly scientists. They did not get degrees in teaching.

6. Is there more success educating students with a mobile museum than there is in a stationary one?
The benefit of the moveable museum is that it is on a smaller scale. There is less success in the MONH with educating students because there is much more to see and do. Students can focus more easily in the moveable museum. We can customize the moveable museum experience to fit the needs of the audience. Sometimes we visit special populations, and we can adjust any part of the experience to the population. Many times teachers will warn us of a difficult child, but most of the time the student is well behaved in the truck. The new learning environment creates a new perspective. Education is a challenge the large museum is always struggling with because it is so big, but the moveable museum isolates sections of the bigger museum.
Copyright © 2011 ColoriumLaboratorium - All Rights Reserved