Two alternative workarounds could be:
David Walker's RSS Creator:
If you have any sort of metasearch engine you might be able to do something
similar (or if not, you could try SFU's dbWiz2 or, when they release it,
Or CiteULike's TOC Service:
You may always run into problems with the DOM approach for the reasons Andy
stated, but if you used a library like Beautiful/Rubyful Soup, your success
rate at scraping the sites might improve.
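
As a rough illustration of the tolerant-parser idea, here is a minimal sketch
that pulls article links out of sloppy HTML. It uses Python's standard-library
html.parser rather than Beautiful Soup, purely so it runs without extra
installs; both accept markup that a strict XML DOM would reject. The sample
markup, the class="article" convention, and the names TocLinkParser and
scrape_toc are assumptions made up for the example, not taken from any real
journal site.

```python
from html.parser import HTMLParser


class TocLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a class="article"> links."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None   # href of the article link we are inside, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Hypothetical convention: article links carry class="article".
        if tag == "a" and attrs.get("class") == "article":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.entries.append((title, self._href))
            self._href = None


def scrape_toc(html):
    """Return the list of (title, href) pairs found in an HTML string."""
    parser = TocLinkParser()
    parser.feed(html)
    parser.close()
    return parser.entries


# Deliberately invalid markup: unclosed <li>, stray <br>, no closing tags.
SAMPLE = """
<html><body>
<ul>
<li><a class="article" href="/v1/n1/a1">First   article<br></a>
<li><a class="article" href="/v1/n1/a2">Second &amp; last article</a>
<li><a href="/about">About this journal</a>
</ul>
"""

for title, href in scrape_toc(SAMPLE):
    print(title, href)
```

Each site would still need its own rule for spotting article links, so this
reduces parse failures but not the per-site work.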
On 5/30/06, Houghton,Andrew <[log in to unmask]> wrote:
> > From: Code for Libraries [mailto:[log in to unmask]] On
> > Behalf Of Mang Sun
> > Sent: 30 May, 2006 09:32
> > To: [log in to unmask]
> > Subject: [CODE4LIB] Monthly newsletter of Table of Content?
> > We don't like to manually copy and paste TOCs from Journals'
> > sites into a webpage. However, most of them don't provide RSS
> > while some provide newsletter services.
> > We are wondering what kinds of techniques could be handy?
> > DOM, SAX? Or is there any automation tool for this task? Some
> > different ideas?
> I think you will need to use a number of techniques. Using an
> XML DOM will probably work least often. Many Web sites put out
> HTML that is frequently not valid, and the few sites that do
> put out XHTML may not produce valid XML either. You may find some
> commonality with the sites you are looking at, but it's hard
> to say without looking at your list of sites.
> What would be nice to have is an OpenURL 1.0 service that
> returns an RSS feed for each journal whose TOC you want to
> provide. That way anyone could use your OpenURL service to get
> an RSS feed of the journal's TOC.
> I proposed a service like the one you are trying to build to our
> FirstSearch development group in November of 2003. They were in
> a better position to construct the service since many times they
> receive the journal's TOC from the publisher in electronic form.
> Unfortunately, there were other business priorities at the time
> and the proposal met a silent death. The following was from the
> "Business Problem:
> The local library staff wants to provide a service to their patrons
> to push journal article information to them. Here is an overview
> of how the process would work. A library patron subscribes, in
> their RSS aggregator, to an RSS document for a journal, e.g.
> Technology Review, on their local library portal. Each period,
> e.g. week or month, the RSS document contains the table of contents
> (TOC) entries for the journal. The patron clicks on a journal
> article link that allows them to view the article's full text
> through the local library's portal."
> Implicit in the statement: "The patron clicks on a journal article
> link that allows them to view the article's full text through the
> local library's portal." is a link to an OpenURL resolver. However,
> an OpenURL 1.0 service could also be used to serve the RSS document
> to the library patron.
> Andrew Houghton, OCLC Online Computer Library Center, Inc.
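
The idea in the quoted message, an RSS feed of TOC entries whose links go
through an OpenURL resolver, can be sketched roughly as follows. This is only
an illustration under stated assumptions: the resolver address, the article
records, and the ISSN are placeholders, and the function names openurl_for
and toc_to_rss are invented for the example. The query keys follow the
OpenURL 1.0 (Z39.88-2004) key/encoded-value journal format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical resolver base URL; substitute your library's own resolver.
RESOLVER = "https://resolver.example.edu/openurl"


def openurl_for(article):
    """Build an OpenURL 1.0 KEV link for one journal article record."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.jtitle", article["journal"]),
        ("rft.atitle", article["title"]),
        ("rft.issn", article["issn"]),
        ("rft.volume", article["volume"]),
        ("rft.issue", article["issue"]),
        ("rft.spage", article["spage"]),
    ]
    return RESOLVER + "?" + urlencode(params)


def toc_to_rss(journal, articles):
    """Return an RSS 2.0 document (as a string) for one issue's TOC."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = journal + " table of contents"
    ET.SubElement(channel, "link").text = RESOLVER
    ET.SubElement(channel, "description").text = "Latest issue of " + journal
    for article in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        # The item link points at the resolver, not at the publisher.
        ET.SubElement(item, "link").text = openurl_for(article)
    return ET.tostring(rss, encoding="unicode")


# Placeholder records; every value here is made up for the example.
ARTICLES = [
    {"journal": "Technology Review", "title": "Sample article one",
     "issn": "0000-0000", "volume": "1", "issue": "1", "spage": "10"},
    {"journal": "Technology Review", "title": "Sample article two",
     "issn": "0000-0000", "volume": "1", "issue": "1", "spage": "22"},
]

feed = toc_to_rss("Technology Review", ARTICLES)
print(feed)
```

Because each item link is an OpenURL, the patron's click lands on the local
resolver, which can then route them to whatever full-text source the library
licenses.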