The best way to handle them depends on what you want to do. You need to
actually download the NAF files rather than practicing on countries or
other small files, since different kinds of data are organized differently.
Just don't try to read multigigabyte files in a text editor :)

If you start with one of the giant XML files, the first thing you'll
probably want to do is extract just the elements that are interesting to
you. A short string parsing or SAX routine in your language of choice
should let you get the information in a format you like.
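
As a rough illustration of the SAX approach, here is a minimal Python
sketch. It streams through the file and hands the text of every element
with a given local name to a callback, so the whole file never has to fit
in memory. The function name extract_text and the element name used in the
usage note are assumptions for illustration; check the element names in
the file you actually downloaded.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TextCollector(ContentHandler):
    """Pass the text of each element named `wanted` to `callback`."""

    def __init__(self, wanted, callback):
        super().__init__()
        self.wanted = wanted        # local element name, without prefix
        self.callback = callback    # called once per matching element
        self._buffer = None         # None means "not inside a match"

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is (namespace_uri, local_name).
        if name[1] == self.wanted:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElementNS(self, name, qname):
        if name[1] == self.wanted and self._buffer is not None:
            self.callback("".join(self._buffer).strip())
            self._buffer = None


def extract_text(source, wanted, callback):
    """Stream `source` (a path or binary file object) through the handler."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(TextCollector(wanted, callback))
    parser.parse(source)
```

For example, extract_text("names.rdf", "authoritativeLabel", print) would
print one label per line, where both the file name and the element name
are hypothetical placeholders.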

If you download the linked data files and you're interested in actual
headings (as opposed to traversing relationships), grep and sed in
combination with the join utility are handy for extracting the elements you
want and flattening the relationships into something more convenient to
work with. But there are plenty of other tools that you could also use.
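
The same filter-and-flatten idea can also be sketched in Python instead of
grep and sed. The snippet below reads N-Triples one line at a time and
keeps only the statements with a chosen predicate and a literal object.
The function name and the predicate URI you pass in are assumptions, and
the pattern is deliberately simplified: it skips blank-node subjects and
leaves backslash escapes in the literal untouched.

```python
import re

# One N-Triples statement of the form:
#   <subject> <predicate> "literal"[@lang | ^^<datatype>] .
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'"((?:[^"\\]|\\.)*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?\s*\.\s*$'
)


def literal_pairs(lines, predicate):
    """Yield (subject_uri, literal) for statements using `predicate`."""
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == predicate:
            yield match.group(1), match.group(3)
```

Passing an open file as lines keeps memory use flat, and writing each pair
out as subject, tab, literal gives a flat file that sort and join can work
with afterwards.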

If you don't already have a convenient environment to work in, I'm a fan
of VirtualBox. You can drag and drop things into and out of your regular
desktop or even access it directly. That way you can view/manipulate files
with the Linux utilities without having to deal with a bunch of clunky file
transfer operations involving another machine. Very handy for when you have
to deal with multigigabyte files.

kyle

On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth <[log in to unmask]> wrote:

> Thank you!  It looks like the files are available as  RDF/XML, Turtle, or
> N-triples files.
>
> Any examples or suggestions for reading any of these formats?
>
> The MARC Countries file is small, 31-79 kb.  I assume a script that
> would read a small file like that would at least be a start for the LCNAF
>
>