Also: James Eric and Erin Vogel cover Wilco's Yankee Hotel Foxtrot.

An MP3 blog dedicated to cover songs, good and bad.
liza.copyright @ gmail.com
See my helpful list of Disclaimers and Reminders after the Blog Roll and Other Links.
1. MP3s on this blog are available for a short time and are here for sampling purposes only. My only goal is to share my love/hate relationship with cover songs and to turn readers on to artists of whom they might not have otherwise been aware. If you are the creator or copyright owner of a song or anything else that might be posted here, please CONTACT ME by sending an e-mail (see above) if you wish to have it removed.
2. PLEASE don't direct link to any individual tracks. If you like an artist you hear here (either the original artist or the one performing the cover), you should definitely do your best to take the knowledge you gain here and go out and BUY THEIR ALBUMS.
3. I very rarely, if ever, repost songs. If you missed something, I apologize, but I get lots of e-mails from people asking me to repost things. If I actually reposted all the tracks people wanted me to, I'd never get a chance to post anything new. That said, if there is A TRACK OR TWO you feel you desperately need, feel free to drop me a line at liza.copyright @ gmail.com (sans spaces) and I'll do my best to get it to you. But please keep in mind that it might not be instantaneous, as I get approximately 600 e-mails a week.
4. If you read my updates via an RSS feed aggregator or via a LiveJournal feed, there will be times when old posts will suddenly appear with non-working links, which happens as a result of my going back to an older entry to remove the links or to update or edit the post in any way. Again, I apologize if you missed these posts initially, but I cannot repost the songs. Lately I've been keeping songs up for two weeks to a month and I feel that should be plenty of time for you to get the songs. If you don't read your feeds often and regularly miss tracks, I suggest you bookmark the site and check it once a week or so.
5. I do not generally take requests nor do I guarantee that songs submitted to me will appear here. You're welcome to send me a track or two, but they might not hit the blog. Also, I get LOTS of e-mail, so submissions sent via sites like YouSendIt or Dropload.com often expire by the time I get to them. Please attach your files to the actual e-mail if you want to ensure I get them. Similarly, I'd really appreciate it if those of you sending MP3s to my inbox would check to see whether or not I've already posted the tracks first. There is a search bar in the upper left corner of the site that should assist you in said endeavor. Granted, chances are good that even if I've not posted it, I already have it, but at least there's a chance I don't.
6. If you're looking for a few suggestions on beginning your own cover hunts, here is an old entry that might prove helpful.
7. If you are a member or agent of a band, please keep in mind that I ONLY POST COVER SONGS. Similarly, I do not "review" albums. I will most likely delete and ignore any communication from you that shows you do not understand this.
Thanks for listening,
Liza.
