LISTSERV 16.5 - CODE4LIB Archives

Have a link to your server?

Hopefully your blog software has a fairly flexible feed mechanism that lets you specify date ranges or the like.

Then just do something like
wget url_to_rss
That will save the rss file locally.

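Similarly, lynx -source url_to_rss > feed.xml will do the same. (Use -source rather than -dump here: -dump prints lynx's rendered text, not the raw document.)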

(You'll want to be careful about overwriting files).
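
One way to guard against overwriting, as a rough sketch only: derive a distinct local file name from each feed URL and hand it to wget -O. The function names feed_filename and fetch_feed and the example.org URL below are made up for illustration; adjust the naming scheme to taste.

```shell
#!/bin/sh
# Hypothetical helper: turn a feed URL into a flat file name.
# Drops the scheme, replaces every character that is not a letter,
# digit, dot, or hyphen with "_", then appends ".xml".
feed_filename() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-z]*://||' -e 's|[^A-Za-z0-9.-]|_|g' -e 's|$|.xml|'
}

# Hypothetical wrapper: fetch one URL into its own file,
# refusing to clobber a file that is already there.
fetch_feed() {
    out=$(feed_filename "$1")
    if [ -e "$out" ]; then
        echo "skipping $1: $out already exists" >&2
    else
        wget -q -O "$out" "$1"
    fi
}
```

With these defined, fetch_feed http://example.org/blog/2008/05/feed would write to example.org_blog_2008_05_feed.xml, and a second run would leave that file alone.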

The other thing you may have to do is crawl/filter your website to get the link to each post, then request the rss2 version of each one.

I don't have any actual samples now, but I might be able to give you some after work.

I can sketch a hypothetical scenario, though. Let's say there's a well-known blog system that will always give you an rss version of a document if you merely add /feed to the end of the url. It also has links on the front page to monthly archives.

In a Unixish environment I might do something like
lynx -dump url_to_front_page | grep http | sed -e 's/^[^h]*h/h/' -e 's/ *$/\/feed/' > temp.txt
(edit temp.txt to include just the monthly archives)
then cat temp.txt | xargs wget -r -l 1
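
The link-munging step can be tried on its own, without touching the network. Here is a small sketch that also does the "keep only the monthly archives" edit automatically. It assumes lynx-style numbered link lines on standard input and a hypothetical blog whose monthly archive urls end in /YYYY/MM; the function name archive_feed_urls and that url pattern are illustrative guesses, not taken from any particular blog system.

```shell
#!/bin/sh
# Hypothetical filter: read "lynx -dump" output on stdin and print one
# feed url per monthly-archive link.
#  1. keep lines that mention http
#  2. strip the leading link number, trailing spaces, and trailing slashes
#  3. keep only urls that end in /YYYY/MM (assumed archive url shape)
#  4. append /feed and drop duplicates
archive_feed_urls() {
    grep 'http' |
        sed -e 's/^[^h]*h/h/' -e 's/ *$//' -e 's|/*$||' |
        grep -E '/[0-9]{4}/[0-9]{2}$' |
        sed -e 's|$|/feed|' |
        sort -u
}
```

Usage would then look something like lynx -dump http://example.org/blog/ | archive_feed_urls > temp.txt, followed by the xargs wget step on temp.txt. Stripping the trailing slash before appending /feed avoids a doubled slash when the archive links already end in one.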

Of course, without knowing how much you can actually get out of your system as rss, it's a gamble.

(Sorry, this probably isn't very helpful, but if you give me your actual url I might be able to offer something more meaningful tonight.)

Jon Gorman


---- Original message ----
>Date: Tue, 6 May 2008 11:23:29 -0400
>From: "The Ford Library at Fuqua" <[log in to unmask]>
>Subject: Re: [CODE4LIB] Exporting RSS Source from a Blog
>To: [log in to unmask]
>Cc: [log in to unmask]
>
>   Hi John,
>
>   Thanks for the quick response. I tried accessing the
>   feed with lynx to no avail. Its been quite awhile
>   since I worked w/ lynx. I'll take a quick look at
>   wget as well and see if its deployed here and
>   usable.
>
>   Can you spare a few moments to send an example of
>   your "quick and dirty" method?
>
>   Feel free to do this on- or off-list if you have the
>   time.
>   Thanks again!
>   --
>   Carlton Brown
>   Associate Director & IT Services Manager
>   Ford Library - Fuqua School of Business
>   Duke University
>
>   On Tue, May 6, 2008 at 10:11 AM, Jonathan Gorman
>   <[log in to unmask]> wrote:
>
>     The quick and dirty way I've done something
>     similar in the past is to download individual rss
>     pages by running something like wget. Other
>     command-line browsers/spiders could do something
>     similar.
>
>     After all, the mechanisms for pulling rss feeds
>     are really at the base the same mechanisms for
>     pulling web pages of any type.
>
>     Jon Gorman
>     ---- Original message ----
>     >Date: Tue, 6 May 2008 10:01:48 -0400
>     >From: The Ford Library at Fuqua <[log in to unmask]>
>     >Subject: [CODE4LIB] Exporting RSS Source from a Blog
>     >To: [log in to unmask]
>     >
>     > Hello All,
>     >
>     >We're attempting to migrate our java-based Blojsom blog to the more
>     >user-friendly WordPress software. WordPress has built import wizards for
>     >many popular blog platforms; but there isn't one for Blojsom which is
>     >different from *bloxsom* which does have an import wizard. Blojsom does have
>     >an export blog plugin; but the data is not in RSS 2.0 and would require more
>     >Perl than I know to convert.
>     >
>     >WP can import data in RSS 2.0, and I can grab the RSS source of some posts
>     >by simply viewing/copying the source in my browser. But I need to migrate
>     >more than the limited number of posts that can be extracted by viewing the
>     >RSS source in the browser.
>     >
>     >Does anyone know of a tool or hack to extract - export the entire contents,
>     >or a large fixed number of posts from a blog as RSS 2.0? Google Reader and
>     >some others will grab a large number of posts; but I can't view the RSS
>     >source.
>     >
>     >I've done considerable googling already and the few scripts/tools I've
>     >located call for PHP or Ruby -- neither of which are deployed in our
>     >environment.
>     >
>     >Thanks in advance for any tips or pointers.
>     >
>     >--
>     >Carlton Brown
>     >Associate Director & IT Services Manager
>     >Ford Library - Fuqua School of Business
>     >Duke University