Hi Mark,

Actually, the sitemaps.org protocol allows a sitemap index to include references to multiple child sitemaps: http://www.sitemaps.org/protocol.html#index. That's what we did at my former employer:

http://digitalcollections.library.gsu.edu/sitemap/sitemap.xml

And thus the robots.txt only needs to include a single sitemap:

http://digitalcollections.library.gsu.edu/robots.txt

When we add extra collections, they just go into the sitemap.xml, so we are not continuously updating the robots.txt.

Chad

On Fri, Feb 1, 2013 at 11:33 AM, Sullivan, Mark V <[log in to unmask]> wrote:

> Jason,
>
> You may want to allow people just to give you the robots.txt file that
> references the sitemap. I also register the sitemaps individually with the
> big search engines for our site, but I found that very large sitemaps
> aren't processed very well. So, for our site I think I limited the number
> of items per sitemap to 40,000, which results in ten sitemaps for the
> digital objects and an additional sitemap for all the collections.
>
> http://ufdc.ufl.edu/robots.txt
>
> Or else perhaps provide more boxes, so we can include all the sitemaps
> utilized in our systems.
>
> Cheers!
>
> Mark
>
>
> Mark V Sullivan
> Digital Development and Web Coordinator
> Technology and Support Services
> University of Florida Libraries
> 352-273-2907 (office)
> 352-682-9692 (mobile)
> [log in to unmask]
>
>
>
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Jason
> Ronallo [[log in to unmask]]
> Sent: Friday, February 01, 2013 11:14 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] digital collections sitemaps
>
> Hi,
>
> I've seen registries for digital collections that make their metadata
> available through OAI-PMH, but I have yet to see a listing of digital
> collections that just make their resources available on the Web the
> way the Web works [1]. Sitemaps are the main mechanism for listing Web
> resources for automated crawlers [2]. Knowing about all of these
> various sitemaps could have many uses for research and for improving
> the discoverability of digital collections on the open Web [3].
>
> So I thought I'd put up a quick form to start collecting digital
> collections sitemaps. There is one required field, for the sitemap URL
> itself. Please take a few seconds to add any digital collections
> sitemaps you know about--they don't necessarily have to be yours.
>
> https://docs.google.com/spreadsheet/viewform?formkey=dE1JMDRIcXJMSzJ0YVlRaWdtVnhLcmc6MQ#gid=0
>
> At this point I'll make the data available to anyone who asks for it.
>
> Thank you,
>
> Jason
>
> [1] At least I don't recall seeing such a sitemap registry site or
> service. If you know of an existing registry of digital collections
> sitemaps, please let me know about it!
> [2] http://www.sitemaps.org/ For more information on robots, see
> http://wiki.code4lib.org/index.php/Robots_Are_Our_Friends
> [3] For instance, you can see how I've started to investigate whether
> digital collections are being crawled by the Common Crawl:
> http://jronallo.github.com/blog/common-crawl-url-index/
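
P.S. For anyone who hasn't seen the index format from the protocol page linked above: the single sitemap.xml referenced from robots.txt is just a sitemapindex document pointing at the child sitemaps. Roughly like this (the example.org URLs are placeholders, not our real paths):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://example.org/sitemaps/collections.xml</loc>
      </sitemap>
      <sitemap>
        <loc>http://example.org/sitemaps/objects-1.xml</loc>
      </sitemap>
    </sitemapindex>

And robots.txt then only needs the one line:

    Sitemap: http://example.org/sitemap.xml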
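
And if it helps anyone implementing the kind of cap Mark describes, here's a rough Python sketch of splitting a large URL list into child sitemaps of at most 40,000 entries each, plus the index file that robots.txt points to. This is just an illustration, not what either of our sites actually runs; the host and file names are made up:

    # Sketch: write child sitemaps capped at 40,000 URLs apiece,
    # then write a sitemap index referencing them.
    from xml.sax.saxutils import escape

    MAX_URLS = 40000             # Mark's cap; the protocol itself allows 50,000
    BASE = "http://example.org"  # placeholder host

    def write_sitemaps(urls):
        # Split the full URL list into chunks of at most MAX_URLS.
        chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
        for n, chunk in enumerate(chunks, start=1):
            with open("sitemap%d.xml" % n, "w") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in chunk:
                    f.write("  <url><loc>%s</loc></url>\n" % escape(url))
                f.write("</urlset>\n")
        # The index is the single file referenced from robots.txt, so
        # adding collections never means touching robots.txt again.
        with open("sitemap.xml", "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for n in range(1, len(chunks) + 1):
                f.write("  <sitemap><loc>%s/sitemap%d.xml</loc></sitemap>\n" % (BASE, n))
            f.write("</sitemapindex>\n")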