LISTSERV 16.5 - CODE4LIB Archives

Hi Jean,
I've found rdflib (https://github.com/RDFLib/rdflib) on the Python side exceedingly simple to work with. For example, to load the current BIBFRAME vocabulary as an RDF graph using a Python shell:

>>> import rdflib
>>> bf_vocab = rdflib.Graph().parse('http://bibframe.org/vocab/')
>>> len(bf_vocab)  # Total number of triples
1683
>>> set(bf_vocab.subjects())  # A set of all unique subjects in the graph


This module supports RDF/XML, Turtle, and N-Triples, with various options for retrieving and manipulating the graph's subjects, predicates, and objects. I would advise installing the JSON-LD extension (https://github.com/RDFLib/rdflib-jsonld) as well.

Jeremy Nelson
Metadata and Systems Librarian
Colorado College

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Jean Roth
Sent: Tuesday, September 30, 2014 8:14 AM
To: [log in to unmask]
Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or N-triples Files

Thank you so much for the reply.

I have not investigated the LCNAF data set thoroughly.  However, my default/ideal is to read in all variables from a dataset.  

So, I was wondering if anyone had an example Python or Perl script for reading RDF/XML, Turtle, or N-triples files.  A simple/partial example would be fine.

Thanks,

Jean

On Mon, 29 Sep 2014, Kyle Banerjee wrote:

KB> The best way to handle them depends on what you want to do. You need 
KB> to actually download the NAF files rather than countries or other 
KB> small files as different kinds of data will be organized 
KB> differently. Just don't try to read multigigabyte files in a text 
KB> editor :)
KB> 
KB> If you start with one of the giant XML files, the first thing you'll 
KB> probably want to do is extract just the elements that are 
KB> interesting to you. A short string parsing or SAX routine in your 
KB> language of choice should let you get the information in a format you like.
KB> 
KB> If you download the linked data files and you're interested in 
KB> actual headings (as opposed to traversing relationships), grep and 
KB> sed in combination with the join utility are handy for extracting 
KB> the elements you want and flattening the relationships into 
KB> something more convenient to work with. But there are plenty of other tools that you could also use.
KB> 
KB> If you don't already have a convenient environment to work on, I'm a  
KB> fan of virtualbox. You can drag and drop things into and out of your 
KB> regular desktop or even access it directly. That way you can 
KB> view/manipulate files with the linux utilities without having to 
KB> deal with a bunch of clunky file transfer operations involving 
KB> another machine. Very handy for when you have to deal with multigigabyte files.
KB> 
KB> kyle
KB> 
KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth <[log in to unmask]> wrote:
KB> 
KB> > Thank you!  It looks like the files are available as  RDF/XML, 
KB> > Turtle, or N-triples files.
KB> >
KB> > Any examples or suggestions for reading any of these formats?
KB> >
KB> > The MARC Countries file is small, 31-79 kb.  I assume a script 
KB> > that would read a small file like that would at least be a start 
KB> > for the LCNAF
KB> >
KB> >
KB>