I have written a couple of blog postings as well as bunches o' hacks surrounding VUFind, EAD files, harvesting content, and text mining that may be of interest to us coders:
1. EAD files - The first posting and its set of Perl scripts describe how I am currently indexing MARC records and, more importantly, EAD files in VUFind. The process involves harvesting EAD files from remote locations, transforming them into HTML, indexing them at the container level, and providing access to the index. [1, 2]
2. Internet Archive content - The second posting describes how I mirrored content from the Internet Archive, munged the mirrored MARC records, indexed them, and provided a rudimentary text mining interface against the locally cached full text. [3, 4]
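To give a rough idea of what "indexing at the container level" can look like, here is a minimal sketch in Python. It is not the Perl code from the postings. The sample EAD fragment, the function name container_records, and the output field names (id, collection, container, title) are all made up for illustration and are not VUFind's actual schema. It also assumes an EAD file without XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD fragment (no namespace), for illustration only.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">3</container>
            <unittitle>Letters, 1901-1905</unittitle>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">4</container>
            <unittitle>Letters, 1906-1910</unittitle>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def container_records(ead_xml):
    """Return one index-ready dict per <did> that names at least one <container>."""
    root = ET.fromstring(ead_xml)
    eadid = (root.findtext("eadheader/eadid") or "").strip()
    collection = (root.findtext("archdesc/did/unittitle") or "").strip()

    records = []
    for did in root.iter("did"):
        containers = did.findall("container")
        if not containers:
            # Collection- and series-level descriptions have no container; skip them.
            continue
        # Build a label such as "box 1, folder 3" from the container elements.
        label = ", ".join(
            f"{c.get('type', 'container')} {(c.text or '').strip()}"
            for c in containers
        )
        records.append({
            "id": f"{eadid}-{len(records) + 1}",  # hypothetical identifier scheme
            "collection": collection,
            "container": label,
            "title": (did.findtext("unittitle") or "").strip(),
        })
    return records


if __name__ == "__main__":
    for record in container_records(SAMPLE_EAD):
        print(record)
```

Each dict could then be sent to whatever indexer is in use. Real-world EAD files are messier than this sample (namespaces, unnumbered c elements, missing titles), so treat the above as a starting point only.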
There are lots of cool (as well as "kewl") possibilities here.
[1] indexing EAD in VUFind - http://bit.ly/cIu0lG
[2] EAD record in VUFind - http://bit.ly/9Z7GUg
[3] Internet Archive content - http://bit.ly/dbzYyX
[4] harvested record with text mining - http://bit.ly/ahjLf2
--
Eric "@isitfriday" Morgan