Thanks for the tips. I got it working with an XSLT stylesheet, which I have
attached for those who are interested. You can generate a test XML file from
the command line:

java -jar /path/to/saxon9.jar -s http://id.loc.gov/authorities/feed/page/1/ -xsl:/path/to/update-lcsh.xsl date=2010-04-15 > test.xml

where date=2010-04-15 is a parameter that you can change. It is passed to the
stylesheet, and Saxon steps through the pages of the feed, extracting those
entries created after 2010-04-14. I found it to be pretty fast. I can easily
integrate this into an Orbeon pipeline to keep my Solr index of LCSH terms up
to date.

Ethan

On Fri, May 14, 2010 at 10:45 AM, Kevin Ford wrote:

> Hard-coded. There's currently no way to pass a type of "count" parameter.
>
> Cordially,
> Kevin
>
> >>> Ethan Gruber 05/14/10 9:58 AM >>>
> Thanks for the help. It should be doable. Do you know if it's possible to
> control the number of entries per page, or is that locked?
>
> Ethan
>
> On Thu, May 13, 2010 at 6:11 PM, Ed Summers wrote:
>
> > As Kevin said, I think you can use the Atom feed to page backwards
> > through time. Basically this amounts to programmatically following the
> > <link rel="next"> links in the feed, applying creates, updates and
> > deletes as you go until you make it to Feb. 15, 2010.
> >
> > Currently this would involve walking from:
> >
> > http://id.loc.gov/authorities/feed/
> >
> > to:
> >
> > http://id.loc.gov/authorities/feed/page/2/
> >
> > all the way to:
> >
> > http://id.loc.gov/authorities/feed/page/96/
> >
> > Then in a month's time or whatever you can run the same process again.
> > I think you can either walk through the feed pages until a known last
> > harvest date, or until you see a record with an atom:id and
> > atom:updated you already know about. I think the latter could be a bit
> > simpler, assuming you are keeping track of what you have.
> >
> > Ever since reading the OAI-ORE specs on Atom [1] I've become a bit
> > taken with the idea of using Atom syndication as a drop-in replacement
> > for OAI-PMH, which is the spec that most people in the library
> > community reach for when they want to do metadata synchronization. The
> > advantage of Atom is that it fits so nicely into the syndication world
> > and its ecosystem of tools and services.
> >
> > //Ed
> >
> > [1] http://www.openarchives.org/ore/1.0/atom
> >
> > On Thu, May 13, 2010 at 4:53 PM, Kevin Ford wrote:
> >
> > > The short answer to your question is "no," there's no way to query
> > > terms based on last modification date. However, and this feature
> > > needs publication on the website, there is an Atom feed that exposes
> > > the change activities for the subject headings:
> > >
> > > http://id.loc.gov/authorities/feed/
> > >
> > > You can page through it (feed/page/1, feed/page/2).
> > >
> > > There is also a page that shows when each load was performed:
> > >
> > > http://id.loc.gov/authorities/loads/
> > >
> > > It too has an Atom feed (http://id.loc.gov/authorities/loads/feed).
> > >
> > > HTH,
> > > Kevin
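
For anyone who would rather not use XSLT, the paging approach discussed in this thread (follow the rel="next" links and stop at the last harvest date) could be sketched in Python roughly as follows. This is not the attached stylesheet, only an illustration under some assumptions: entries are taken to be ordered newest first, each entry is taken to carry atom:id and atom:updated, and the names parse_page, harvest and fetch_url are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch_url(url):
    """Download one feed page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def parse_page(xml_text, page_url, since):
    """Return (fresh entries, next-page URL or None, reached_cutoff).

    `since` is an ISO date such as "2010-04-15"; entries dated on or after
    it are kept. Assumption: entries are listed newest first, so the first
    older (or undated) entry means everything after it is older too.
    """
    root = ET.fromstring(xml_text)
    fresh, reached_cutoff = [], False
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated", default="")
        # ISO 8601 dates sort correctly as plain strings.
        if updated[:10] < since:
            reached_cutoff = True
            break
        fresh.append((entry.findtext(ATOM + "id"), updated))
    next_url = None
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "next" and link.get("href"):
            # Resolve the href in case the feed uses relative links.
            next_url = urljoin(page_url, link.get("href"))
            break
    return fresh, next_url, reached_cutoff


def harvest(start_url, since, fetch=fetch_url):
    """Walk the feed page by page until the cutoff date or the last page."""
    url, results = start_url, []
    while url:
        fresh, next_url, reached_cutoff = parse_page(fetch(url), url, since)
        results.extend(fresh)
        url = None if reached_cutoff else next_url
    return results
```

Calling harvest("http://id.loc.gov/authorities/feed/", "2010-04-15") would then return a list of (id, updated) pairs to apply to a local index. The fetch argument is there only so the paging logic can be tried against saved copies of the pages without hitting the server.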