Hi Jean,
I've found rdflib (https://github.com/RDFLib/rdflib) on the Python side exceedingly simple to work with. For example, to load the current BIBFRAME vocabulary as an RDF graph using a Python shell:
>>> import rdflib
>>> bf_vocab = rdflib.Graph().parse('http://bibframe.org/vocab/')
>>> len(bf_vocab) # Total number of triples
1683
>>> set(bf_vocab.subjects()) # A set of all unique subjects in the graph (iterating the graph itself yields whole triples)
This module supports RDF/XML, Turtle, and N-Triples, with various options for retrieving and manipulating the graph's subjects, predicates, and objects. I would advise installing the JSON-LD extension (https://github.com/RDFLib/rdflib-jsonld) as well.
Jeremy Nelson
Metadata and Systems Librarian
Colorado College
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Jean Roth
Sent: Tuesday, September 30, 2014 8:14 AM
To: [log in to unmask]
Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or N-triples Files
Thank you so much for the reply.
I have not investigated the LCNAF data set thoroughly. However, my default/ideal is to read in all variables from a dataset.
So, I was wondering if anyone had an example Python or Perl script for reading RDF/XML, Turtle, or N-triples files. A simple/partial example would be fine.
Thanks,
Jean
On Mon, 29 Sep 2014, Kyle Banerjee wrote:
KB> The best way to handle them depends on what you want to do. You need
KB> to actually download the NAF files rather than countries or other
KB> small files as different kinds of data will be organized
KB> differently. Just don't try to read multigigabyte files in a text
KB> editor :)
KB>
KB> If you start with one of the giant XML files, the first thing you'll
KB> probably want to do is extract just the elements that are
KB> interesting to you. A short string parsing or SAX routine in your
KB> language of choice should let you get the information in a format you like.
KB>
KB> If you download the linked data files and you're interested in
KB> actual headings (as opposed to traversing relationships), grep and
KB> sed in combination with the join utility are handy for extracting
KB> the elements you want and flattening the relationships into
KB> something more convenient to work with. But there are plenty of other tools that you could also use.
KB>
KB> If you don't already have a convenient environment to work on, I'm a
KB> fan of virtualbox. You can drag and drop things into and out of your
KB> regular desktop or even access it directly. That way you can
KB> view/manipulate files with the linux utilities without having to
KB> deal with a bunch of clunky file transfer operations involving
KB> another machine. Very handy for when you have to deal with multigigabyte files.
KB>
KB> kyle
KB>
KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth <[log in to unmask]> wrote:
KB>
KB> > Thank you! It looks like the files are available as RDF/XML,
KB> > Turtle, or N-triples files.
KB> >
KB> > Any examples or suggestions for reading any of these formats?
KB> >
KB> > The MARC Countries file is small, 31-79 kb. I assume a script
KB> > that would read a small file like that would at least be a start
KB> > for the LCNAF.
KB> >
KB> >
KB>