Thanks for the tips. I got it working with an XSLT stylesheet, which I have
attached for those who are interested.
You can generate a test XML file with the following command line:
java -jar /path/to/saxon9.jar \
  -s:http://id.loc.gov/authorities/feed/page/1/ \
  -xsl:/path/to/update-lcsh.xsl date=2010-04-15 > test.xml
where date=2010-04-15 is a parameter that you can change. It is passed to
the stylesheet and Saxon steps through the pages of the feed, extracting
those entries created after 2010-04-14. I found it to be pretty fast. I
can easily integrate this into an Orbeon pipeline to keep my Solr index of
LCSH terms up to date.
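
For anyone who would rather not go through Saxon, the same paging idea can be
sketched in Python. This is only a rough illustration and not the attached
stylesheet: the function names are made up, and it assumes that the feed lists
entries newest first, that each page carries a rel="next" link to the next
older page, and that atom:updated is the date to compare against.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch(url):
    """Download one feed page and return its raw bytes."""
    with urlopen(url) as resp:
        return resp.read()


def entries_since(start_url, cutoff, fetch_page=fetch):
    """Yield (id, updated) for entries dated on or after cutoff.

    cutoff is an ISO date string such as "2010-04-15". Assumes the feed is
    in reverse chronological order, so paging stops at the first older entry.
    """
    url = start_url
    while url:
        feed = ET.fromstring(fetch_page(url))
        for entry in feed.findall(ATOM + "entry"):
            updated = entry.findtext(ATOM + "updated", default="")
            # Compare only the YYYY-MM-DD part of the timestamp.
            if updated[:10] < cutoff:
                return
            yield entry.findtext(ATOM + "id"), updated
        # Follow the rel="next" link, if present, to the next (older) page.
        next_url = None
        for link in feed.findall(ATOM + "link"):
            if link.get("rel") == "next":
                next_url = urljoin(url, link.get("href"))
                break
        url = next_url
```

Calling entries_since("http://id.loc.gov/authorities/feed/", "2010-04-15")
would walk the pages until it reaches an entry older than the cutoff. Passing
a different fetch_page function makes it easy to try against saved pages.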
Ethan
On Fri, May 14, 2010 at 10:45 AM, Kevin Ford wrote:
> Hard-coded. There's currently no way to pass a type of "count" parameter.
>
> Cordially,
> Kevin
>
>
> >>> Ethan Gruber 05/14/10 9:58 AM >>>
> Thanks for the help. It should be doable. Do you know if it's possible to
> control the number of entries per page, or is that locked?
>
> Ethan
>
> On Thu, May 13, 2010 at 6:11 PM, Ed Summers wrote:
>
> > As Kevin said, I think you can use the Atom feed to page backwards
> > through time. Basically this amounts to programmatically following the
> > <link rel="next"> links in the feed, applying creates, updates and
> > deletes as you go until you make it to Feb. 15, 2010.
> >
> > Currently this would involve walking from:
> >
> > http://id.loc.gov/authorities/feed/
> >
> > to:
> >
> > http://id.loc.gov/authorities/feed/page/2/
> >
> > all the way to:
> >
> > http://id.loc.gov/authorities/feed/page/96/
> >
> > Then in a month's time or whatever, you can run the same process again.
> > I think you can either walk through the feed pages until a known last
> > harvest date, or until you see a record with an atom:id and
> > atom:updated you already know about. I think the latter could be a bit
> > simpler, assuming you are keeping track of what you have.
> >
> > Ever since reading the OAI-ORE specs on Atom [1] I've become a bit
> > taken with the idea of using Atom syndication as a drop in replacement
> > for OAI-PMH--which is the spec that most people in the library
> > community reach for when they want to do metadata synchronization. The
> > advantage of Atom is that it fits into the syndication world so
> > nicely, and its ecosystem of tools and services.
> >
> > //Ed
> >
> > [1] http://www.openarchives.org/ore/1.0/atom
> >
> >
> > On Thu, May 13, 2010 at 4:53 PM, Kevin Ford wrote:
> > > The short answer to your question is "no," there's no way to query
> > > terms based on last modification date. However, and this feature needs
> > > publication on the website, there is an Atom feed that exposes the
> > > change activities for the subject headings:
> > >
> > > http://id.loc.gov/authorities/feed/
> > >
> > > You can page through it (feed/page/1, feed/page/2).
> > >
> > > There is also a page that shows when each load was performed:
> > >
> > > http://id.loc.gov/authorities/loads/
> > >
> > > It too has an Atom feed (http://id.loc.gov/authorities/loads/feed).
> > >
> > > HTH,
> > > Kevin
> >
>