Thank you so much for the reply.
I have not investigated the LCNAF dataset thoroughly. However, my
default preference is to read in every variable from a dataset.
So, I was wondering if anyone had an example Python or Perl script for
reading RDF/XML, Turtle, or N-Triples files. A simple/partial example
would be fine.
Thanks,
Jean
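
For reference, here is a minimal sketch of the kind of script asked about above, assuming the N-Triples serialization (one "subject predicate object ." statement per line). It uses only the Python standard library. The names read_ntriples, group_by_subject, and SAMPLE are made up for illustration, the sample lines are hypothetical and not taken from the LC files, and the regular expression covers only the common cases rather than the full N-Triples grammar.

```python
import io
import re
from collections import defaultdict

# Simplified pattern for one N-Triples statement (an assumption, not the full grammar).
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'      # subject: IRI or blank node
    r'(<[^>]*>)\s+'                # predicate: IRI
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'  # object
    r'\s*\.\s*$'                   # terminating dot
)


def read_ntriples(lines):
    """Yield (subject, predicate, object) strings from an iterable of lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot handle
        yield match.group(1), match.group(2), match.group(3)


def group_by_subject(triples):
    """Collect every (predicate, object) pair under its subject."""
    grouped = defaultdict(list)
    for subj, pred, obj in triples:
        grouped[subj].append((pred, obj))
    return dict(grouped)


# Hypothetical sample data, not taken from the LC files.
SAMPLE = '''# two statements about one made-up resource
<http://example.org/countries/fr> <http://www.w3.org/2004/02/skos/core#prefLabel> "France"@en .
<http://example.org/countries/fr> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
'''

if __name__ == "__main__":
    for subject, pairs in group_by_subject(read_ntriples(io.StringIO(SAMPLE))).items():
        print(subject)
        for predicate, obj in pairs:
            print("   ", predicate, obj)
```

For a real file, pass an open file handle instead of the StringIO object, for example `with open(path, encoding="utf-8") as fh:` followed by a loop over `read_ntriples(fh)`. Because the reader is a generator, it processes one line at a time and never loads the whole file, although group_by_subject does hold everything in memory and would need replacing for the largest files. For RDF/XML or Turtle, a full parser such as the third-party rdflib package is probably a better starting point than a hand-written pattern.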
On Mon, 29 Sep 2014, Kyle Banerjee wrote:
KB> The best way to handle them depends on what you want to do. You need to
KB> actually download the NAF files rather than countries or other small files
KB> as different kinds of data will be organized differently. Just don't try to
KB> read multigigabyte files in a text editor :)
KB>
KB> If you start with one of the giant XML files, the first thing you'll
KB> probably want to do is extract just the elements that are interesting to
KB> you. A short string parsing or SAX routine in your language of choice
KB> should let you get the information in a format you like.
KB>
KB> If you download the linked data files and you're interested in actual
KB> headings (as opposed to traversing relationships), grep and sed in
KB> combination with the join utility are handy for extracting the elements you
KB> want and flattening the relationships into something more convenient to
KB> work with. But there are plenty of other tools that you could also use.
KB>
KB> If you don't already have a convenient environment to work on, I'm a fan
KB> of VirtualBox. You can drag and drop things into and out of your regular
KB> desktop or even access it directly. That way you can view/manipulate files
KB> with the Linux utilities without having to deal with a bunch of clunky file
KB> transfer operations involving another machine. Very handy for when you have
KB> to deal with multigigabyte files.
KB>
KB> kyle
KB>
KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth <[log in to unmask]> wrote:
KB>
KB> > Thank you! It looks like the files are available as RDF/XML, Turtle, or
KB> > N-triples files.
KB> >
KB> > Any examples or suggestions for reading any of these formats?
KB> >
KB> > The MARC Countries file is small, 31-79 KB. I assume a script that
KB> > would read a small file like that would at least be a start for the LCNAF
KB> >
KB> >
KB>