Hi Jean,

> So, I was wondering if any one had an example Python or Perl script for
> reading RDF/XML, Turtle, or N-triples file. A simple/partial example
> would be fine.

I worked on a Perl script for reading RDF during last year's OCLC Developer House event. I used the Perl "RDF::Helper" module since it claimed to "Provide a consistent, high-level API for working with RDF with Perl" [1]. There was a bit of a learning curve and I was not able to find much in the way of RDF::Helper code examples on the interwebs.

For the OCLC Developer House project, we were extracting, parsing, and displaying a library's hours from institutional data in the OCLC WorldCat Registry [2].

I've attached a Perl "proof-of-concept" script and a couple of screenshots showing output. The script file has an additional ".txt" file extension for safe travels through email. The script requires non-core Perl module(s), and you need to specify a path to a CA root certs file (for HTTPS gets).

The other Developer House "Registry Hours" project team members worked on a PHP script to do essentially the same thing (although more elegantly and with more functionality). Their code is available on GitHub [3].

Good luck!

- Michael Doran

[1] http://search.cpan.org/dist/RDF-Helper/

[2] Examples of data for UTA Libraries:
    https://worldcat.org/wcr/normal-hours/data/2928
    https://worldcat.org/wcr/special-hours/data/2928

[3] https://github.com/oclc-developer-house/wclibhours

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [log in to unmask]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jean Roth
> Sent: Tuesday, September 30, 2014 9:14 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or
> N-triples Files
>
> Thank you so much for the reply.
>
> I have not investigated the LCNAF data set thoroughly.
> However, my default/ideal is to read in all variables from a dataset.
>
> So, I was wondering if any one had an example Python or Perl script for
> reading RDF/XML, Turtle, or N-triples file. A simple/partial example
> would be fine.
>
> Thanks,
>
> Jean
>
> On Mon, 29 Sep 2014, Kyle Banerjee wrote:
>
> KB> The best way to handle them depends on what you want to do. You need to
> KB> actually download the NAF files rather than countries or other small files
> KB> as different kinds of data will be organized differently. Just don't try to
> KB> read multigigabyte files in a text editor :)
> KB>
> KB> If you start with one of the giant XML files, the first thing you'll
> KB> probably want to do is extract just the elements that are interesting to
> KB> you. A short string parsing or SAX routine in your language of choice
> KB> should let you get the information in a format you like.
> KB>
> KB> If you download the linked data files and you're interested in actual
> KB> headings (as opposed to traversing relationships), grep and sed in
> KB> combination with the join utility are handy for extracting the elements you
> KB> want and flattening the relationships into something more convenient to
> KB> work with. But there are plenty of other tools that you could also use.
> KB>
> KB> If you don't already have a convenient environment to work on, I'm a fan
> KB> of virtualbox. You can drag and drop things into and out of your regular
> KB> desktop or even access it directly. That way you can view/manipulate files
> KB> with the linux utilities without having to deal with a bunch of clunky file
> KB> transfer operations involving another machine. Very handy for when you have
> KB> to deal with multigigabyte files.
> KB>
> KB> kyle
> KB>
> KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth <[log in to unmask]> wrote:
> KB>
> KB> > Thank you! It looks like the files are available as RDF/XML, Turtle, or
> KB> > N-triples files.
> KB> >
> KB> > Any examples or suggestions for reading any of these formats?
> KB> >
> KB> > The MARC Countries file is small, 31-79 kb. I assume a script that
> KB> > would read a small file like that would at least be a start for the
> KB> > LCNAF
> KB> >
> KB> >
> KB>
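
On the Python side of the question above, here is a minimal sketch of reading N-Triples line by line with only the standard library. It is not the attached Perl script and not anyone's production code: the regular expression covers only a simplified subset of the N-Triples grammar, and the names `Triple`, `read_ntriples`, and `SAMPLE`, as well as the example.org URIs, are made up for illustration. A full parser (for example the rdflib library) is the safer choice for real data, including RDF/XML and Turtle.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    """One statement; terms are kept in their raw N-Triples spelling."""
    subject: str
    predicate: str
    obj: str


# Simplified term pattern (an assumption, not the full grammar):
# an IRI in <>, a blank node like _:b1, or a double-quoted literal
# with an optional @lang tag or ^^<datatype> suffix.
_TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
_LINE = re.compile(r'^\s*' + _TERM + r'\s+' + _TERM + r'\s+' + _TERM + r'\s*\.\s*$')


def read_ntriples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield a Triple for each line that matches the simplified pattern."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        match = _LINE.match(line)
        if match is None:
            continue  # skip anything this sketch cannot handle
        yield Triple(*match.groups())


# Hypothetical sample data, not taken from the LC files.
SAMPLE = '''\
# two made-up statements about one resource
<http://example.org/countries/us> <http://www.w3.org/2004/02/skos/core#prefLabel> "United States"@en .
<http://example.org/countries/us> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
'''

triples = list(read_ntriples(SAMPLE.splitlines()))
for t in triples:
    print(t.subject, t.predicate, t.obj, sep="\t")
```

Because `read_ntriples` accepts any iterable of lines, an open file handle can be passed in directly, so a large file is processed one line at a time rather than loaded whole.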