LISTSERV 16.5 - CODE4LIB Archives

Hi Jean,

> So, I was wondering if any one had an example Python or Perl script for
> reading RDF/XML, Turtle, or N-triples file.  A simple/partial example
> would be fine.

I worked on a Perl script for reading RDF during last year's OCLC Developer House event. 

I used the Perl "RDF::Helper" module since it claimed to "Provide a consistent, high-level API for working with RDF with Perl" [1].  There was a bit of a learning curve and I was not able to find much in the way of RDF::Helper code examples on the interwebs.

For the OCLC Developer House project, we were extracting, parsing, and displaying a library's hours from institutional data in the OCLC WorldCat Registry [2].  I've attached a Perl "proof-of-concept" script and a couple of screenshots showing output.  The script file has an additional ".txt" file extension for safe travels through email.  Running the script requires installing non-core Perl module(s) and specifying a path to a CA root certs file (for HTTPS GETs).
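
If Python is more convenient than Perl, here is a minimal sketch for the N-Triples format, which is the easiest of the three to read because each line holds one complete triple.  This is not taken from the attached script.  It uses only the standard library, the function name read_ntriples is my own, and the regular expression is a simplification that does not cover every corner of the N-Triples grammar (it leaves escape sequences inside literals undecoded, for example).

```python
import re

# Simplified pattern for one N-Triples statement: subject, predicate, object, dot.
# Assumption: well-formed input with one triple per line.
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'   # subject: IRI or blank node
    r'(<[^>]*>)\s+'             # predicate: IRI
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'  # object
    r'\s*\.\s*$'                # terminating dot
)


def read_ntriples(lines):
    """Yield (subject, predicate, object) string tuples from N-Triples lines.

    `lines` can be any iterable of strings, including an open file handle,
    so a large file is processed one line at a time.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(stripped)
        if match is None:
            raise ValueError(f"not a valid N-Triples line: {stripped!r}")
        yield match.groups()
```

To try it, pass an open file handle, e.g. read_ntriples(open("countries.nt", encoding="utf-8")), where "countries.nt" is a made-up name for whichever N-Triples file you downloaded.  For RDF/XML or Turtle, a real RDF library is a better bet than hand-rolled parsing.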

The other Developer House "Registry Hours" project team members worked on a PHP script to do essentially the same thing (although more elegantly and with more functionality).  Their code is available on Github [3].
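
Kyle's suggestion further down the thread (a SAX routine to pull out just the elements you care about from the big RDF/XML files) could look roughly like this in Python.  Again, this is only a sketch using the standard library's xml.sax module.  The class and function names are mine, and skos:prefLabel is just an assumed example of an element of interest; substitute whatever namespace and element name the file you download actually uses.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# SKOS namespace; used here only as an example of an element to extract.
SKOS = "http://www.w3.org/2004/02/skos/core#"


class ElementTextCollector(ContentHandler):
    """Pass the text of each element matching (namespace, local name) to a callback."""

    def __init__(self, namespace, local_name, callback):
        super().__init__()
        self.target = (namespace, local_name)
        self.callback = callback
        self._buffer = None  # None means we are not inside a target element

    def startElementNS(self, name, qname, attrs):
        if name == self.target:
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElementNS(self, name, qname):
        if name == self.target and self._buffer is not None:
            self.callback("".join(self._buffer))
            self._buffer = None


def stream_element_text(source, namespace, local_name, callback):
    """Stream-parse `source` (a file name or file object) without loading it whole."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report names as (uri, localname)
    parser.setContentHandler(ElementTextCollector(namespace, local_name, callback))
    parser.parse(source)
```

Something like stream_element_text("names.rdf", SKOS, "prefLabel", print) would then print each matching label as it is encountered ("names.rdf" being a placeholder file name), which keeps memory use flat even on a multigigabyte file.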

Good luck!

- Michael Doran

[1] http://search.cpan.org/dist/RDF-Helper/

[2] Examples of data for UTA Libraries:
    https://worldcat.org/wcr/normal-hours/data/2928
    https://worldcat.org/wcr/special-hours/data/2928

[3] https://github.com/oclc-developer-house/wclibhours

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries On Behalf Of Jean Roth
> Sent: Tuesday, September 30, 2014 9:14 AM
> Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or N-triples Files
> 
> Thank you so much for the reply.
> 
> I have not investigated the LCNAF data set thoroughly.  However, my
> default/ideal is to read in all variables from a dataset.
> 
> So, I was wondering if any one had an example Python or Perl script for
> reading RDF/XML, Turtle, or N-triples file.  A simple/partial example
> would be fine.
> 
> Thanks,
> 
> Jean
> 
> On Mon, 29 Sep 2014, Kyle Banerjee wrote:
> 
> KB> The best way to handle them depends on what you want to do. You
> KB> need to actually download the NAF files rather than countries or
> KB> other small files as different kinds of data will be organized
> KB> differently. Just don't try to read multigigabyte files in a text
> KB> editor :)
> KB>
> KB> If you start with one of the giant XML files, the first thing
> KB> you'll probably want to do is extract just the elements that are
> KB> interesting to you. A short string parsing or SAX routine in your
> KB> language of choice should let you get the information in a format
> KB> you like.
> KB>
> KB> If you download the linked data files and you're interested in
> KB> actual headings (as opposed to traversing relationships), grep and
> KB> sed in combination with the join utility are handy for extracting
> KB> the elements you want and flattening the relationships into
> KB> something more convenient to work with. But there are plenty of
> KB> other tools that you could also use.
> KB>
> KB> If you don't already have a convenient environment to work on, I'm
> KB> a fan of virtualbox. You can drag and drop things into and out of
> KB> your regular desktop or even access it directly. That way you can
> KB> view/manipulate files with the linux utilities without having to
> KB> deal with a bunch of clunky file transfer operations involving
> KB> another machine. Very handy for when you have to deal with
> KB> multigigabyte files.
> KB>
> KB> kyle
> KB>
> KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth wrote:
> KB>
> KB> > Thank you!  It looks like the files are available as RDF/XML,
> KB> > Turtle, or N-triples files.
> KB> >
> KB> > Any examples or suggestions for reading any of these formats?
> KB> >
> KB> > The MARC Countries file is small, 31-79 kb.  I assume a script
> KB> > that would read a small file like that would at least be a start
> KB> > for the LCNAF