If you DO have to deal with raw MARC data, I think you will find that
the BT/NT relationships you are hoping to use are _much_ more
incomplete and contradictory than you hope, I'm afraid.  See Ed
Summers's post at:
http://inkdroid.org/journal/2008/01/23/lcsh-thesauri-and-skos/
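
If you want to see this for yourself: in MARC 21 authority records,
the BT links live in the 5XX see-also tracing fields, where a
subfield $w beginning with 'g' marks the traced heading as broader.
A quick sketch using pymarc (the filename is just a placeholder for
whatever authority file you manage to get):

# Sketch: list broader-term (BT) tracings in LCSH authority records.
# In a 550 see-also tracing, subfield $w starting with 'g' means the
# traced heading is broader; 'h' would mean narrower.
from pymarc import MARCReader

with open('lcsh-authorities.mrc', 'rb') as fh:  # hypothetical file
    for record in MARCReader(fh):
        heading = record['150']  # topical heading, if present
        if heading is None:
            continue
        for tracing in record.get_fields('550'):
            w = tracing['w'] or ''
            if w.startswith('g'):
                print(heading['a'], '-> BT ->', tracing['a'])

You will quickly notice how many headings have no 550 tracings at
all, which is exactly the gap Ed's post describes.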

People trying to do similar research have often chosen to use Dewey
Decimal Classification instead of LCSH, precisely because the
hierarchical relationships in DDC are so much better (and conveniently
apparent directly from the DDC number, without any need to consult
authority files). Others (see NCSU's Endeca implementation) have used
Library of Congress Classification, which has a slightly better
hierarchy than LCSH, but not nearly as good as DDC's, and one not
apparent from the LCC number itself. In general, I think you are going
to find that the 'controlled vocabularies' in use here are not as good
as you wish they were for these purposes.
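
To illustrate what I mean about DDC: the broader classes of a DDC
number are just prefixes of it, so you can walk up the hierarchy with
nothing but string truncation.  A toy sketch (the class captions in
the comments are illustrative examples, not an official DDC table):

# Toy sketch: DDC broader classes are truncations of the number, so
# no authority file is needed to climb the hierarchy.
def ddc_ancestors(number):
    """Yield the broader classes of a DDC number like '516.3'."""
    whole = number.split('.')[0]  # drop any decimal extension
    yield whole[0] + '00'   # main class, e.g. 500 Science
    yield whole[:2] + '0'   # division,   e.g. 510 Mathematics
    yield whole             # section,    e.g. 516 Geometry

print(list(ddc_ancestors('516.3')))  # ['500', '510', '516']

Try getting that out of an LCSH heading string.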

(You may also be interested in this article in the premier issue of the
Code4Lib Journal: http://journal.code4lib.org/articles/23 ;  in
particular, see http://journal.code4lib.org/articles/23#problem2 ;
http://journal.code4lib.org/articles/23#problem5 ;
http://journal.code4lib.org/articles/23#problem7 ;
http://journal.code4lib.org/articles/23#problem9 )

If you do find yourself working with MARC, you will likely find it
convenient to first turn it into MODS, an XML format derived from
MARC that 'normalizes' some of the things you would otherwise have to
normalize yourself. You should be able to find existing software to
convert MARC to MODS for you.
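
For instance, the Library of Congress publishes an XSLT stylesheet
(MARC21slim2MODS, on the MODS web site) that does the conversion.
One way to drive it, assuming you have pymarc and lxml and a local
copy of the stylesheet (the filenames below are placeholders):

# Sketch: MARC -> MARCXML -> MODS via LC's MARC21slim2MODS stylesheet.
from lxml import etree
from pymarc import MARCReader, record_to_xml

transform = etree.XSLT(etree.parse('MARC21slim2MODS3.xsl'))

with open('bibs.mrc', 'rb') as fh:  # hypothetical bib record file
    for record in MARCReader(fh):
        # namespace=True so the MARCXML matches what the stylesheet
        # expects (the http://www.loc.gov/MARC21/slim namespace)
        marcxml = etree.fromstring(record_to_xml(record, namespace=True))
        print(str(transform(marcxml)))

I believe MarcEdit will also do this conversion for you, if you would
rather not script it yourself.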

Charles-Antoine Julien wrote:
> A kind fellow on NGC4Lib suggested I mention this here.
>
>
>
> I'm developing a 3D "fly-through" interface for an LCSH-organized
> collection, but I'm having difficulty finding a library willing to
> "give" me a subset of their data (i.e., subject headings (broad to
> narrow terms) and the bib records to which they have been assigned).
> They just don't see why they should help me.  The value added for
> them isn't clear, since this is experimental and I have no wish to
> turn this into a business (I like to build and test
> solutions...selling them isn't my cup of tea).
>
>
>
> I'm planning to import the data into Access or SQL Server (depending
> on how much I get) and partly normalize the bib records so the
> subject terms for each item are in a separate one-to-many table.  I
> also need the authority data to establish where each subject term
> (and its associated bib records) resides in the broad-to-narrow term
> hierarchy...this is more useful in the sciences, which seem to go
> 4-6 levels deep.
>
>
>
> Jonathan Rochkind (kind fellow in question) suggested the following
>
>
>
> -I could access data directly through Z39.50...
>
> -I could "take" LC subject authority data in MARC format from a certain
> grey-area-legal source
>
> -I could take bib records (and their associated LCSH terms) from
> http://simile.mit.edu/wiki/Dataset_Collection
>
> In particular, the "Barton" collection:
> http://simile.mit.edu/rdf-test-data/barton/compressed/
> That will be in the MODS format, which will actually be easier to
> work with than library-standard MARC.
>
> Or http://www.archive.org/details/marc_records_scriblio_net
>
>
>
> Obviously I'm not looking forward to parsing MARC data, although
> I've heard there are scripts for this.
>
>
>
> Additional suggestions and/or comments would be greatly appreciated.
>
>
>
> Thanks a bunch,
>
>
>
> Charles-Antoine Julien
>
> Ph.D. Candidate
>
> School of Information Studies
>
> McGill University
>
>
>
>

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu