At first I was wondering why you were complaining about the lack of
metadata, since you were providing full-text searching, but then when I
saw the search results I felt your pain. Without metadata, the document
titles are often a very poor substitute. But I'm wondering if we still
couldn't do it in such a way as to better organize results, such as by
journal. For example, I did a somewhat related project a while back
where I crawled all the UC web sites (mostly to prove that it could be
done to skeptics), and provided a staged set of results. First the user
gets back which sites had hits (along with a count of the number of
matching pages), then when they click on a web site they see all the
documents from that site. To see this in action, go to
<http://sunsite.berkeley.edu/uclibs/> (but keep in mind it is a
long-dead index with lots of 404s).
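
For anyone who wants the staged idea spelled out, here is a minimal Python sketch. It is not the code behind the UC index; the function names and the example URLs are made up for illustration, and it assumes the search engine has already returned a flat list of matching page URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def group_hits_by_site(hit_urls):
    """Stage 1: return (site, hit count) pairs, busiest site first."""
    counts = Counter(urlparse(url).netloc.lower() for url in hit_urls)
    # Sort by descending count, then by site name so ties are stable.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def hits_for_site(hit_urls, site):
    """Stage 2: return every matching page that lives on one site."""
    wanted = site.lower()
    return [url for url in hit_urls if urlparse(url).netloc.lower() == wanted]


# Hypothetical search hits (placeholder hosts, not real campus sites).
hits = [
    "http://library.campus-a.example/guides/a.html",
    "http://library.campus-b.example/news/b.html",
    "http://library.campus-a.example/guides/c.html",
]

for site, count in group_hits_by_site(hits):
    print(site, count)
```

The same two functions would work for journals instead of web sites if each journal lives on its own host; otherwise the grouping key would have to come from a per-journal URL prefix.
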
There may be other ways to leverage more information out of what we're
indexing. For example, a number of journals have sections, such as "In
Brief" from D-Lib Magazine and "NEWS FROM OTHER JOURNALS SECTION" from
Libres. SWISH-E is designed to be able to take documents to be indexed
from a Perl program. I could envision a relatively simple
infrastructure driven by a database or set of profiles, one per
journal. When indexing a particular journal, if the indexer ran into
"In Brief", the profile would tell it to split that section on the
<div> tag and index the resulting pieces as separate files, treating
the contents of the <h3> tag as the <title>. This would vastly improve
search, retrieval, and results display. It would of course take more
work to both set up and maintain, but the result would be better.
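
To make the splitting step concrete, here is a rough sketch in Python (the real thing would presumably be Perl feeding SWISH-E). Everything named here is hypothetical: the PROFILES table, the SectionSplitter class, and the sample markup are assumptions for illustration, and I have not checked that any journal's pages are actually laid out this way.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical per-journal profiles: which tag to split on, which tag holds the title.
PROFILES = {
    "D-Lib Magazine": {"section": "In Brief", "split_tag": "div", "title_tag": "h3"},
}


class SectionSplitter(HTMLParser):
    """Collect each outermost split_tag element as one piece, titled by its first title_tag."""

    def __init__(self, split_tag, title_tag):
        super().__init__(convert_charrefs=True)
        self.split_tag, self.title_tag = split_tag, title_tag
        self.pieces = []          # finished pieces: {"title": ..., "body": ...}
        self._depth = 0           # nesting depth of split_tag elements
        self._in_title = False
        self._title = None
        self._title_parts, self._text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == self.split_tag:
            self._depth += 1
            if self._depth == 1:  # a new outermost piece begins
                self._in_title, self._title = False, None
                self._title_parts, self._text = [], []
        elif tag == self.title_tag and self._depth >= 1 and self._title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == self.title_tag and self._in_title:
            self._in_title = False
            self._title = " ".join("".join(self._title_parts).split())
        elif tag == self.split_tag and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:  # the outermost piece is complete
                body = " ".join(" ".join(self._text).split())
                self.pieces.append({"title": self._title or "Untitled", "body": body})

    def handle_data(self, data):
        if self._depth >= 1:
            (self._title_parts if self._in_title else self._text).append(data)


def split_sections(html_text, profile):
    parser = SectionSplitter(profile["split_tag"], profile["title_tag"])
    parser.feed(html_text)
    parser.close()
    return parser.pieces


def as_document(piece):
    """Wrap one piece as a small standalone HTML file, heading text as <title>."""
    return "<html><head><title>%s</title></head><body>%s</body></html>" % (
        escape(piece["title"]), escape(piece["body"]))


# Made-up sample markup standing in for an "In Brief" page.
sample = """
<h2>In Brief</h2>
<div><h3>First item</h3><p>Text about the first item.</p></div>
<div><h3>Second item</h3><p>Text about the second item.</p></div>
"""
pieces = split_sections(sample, PROFILES["D-Lib Magazine"])
```

Each result of as_document() could then be handed to the indexer as if it were a separate file, so the search results would show "First item" rather than the title of the whole page.
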
Roy
On Mar 2, 2004, at 5:01 AM, Eric Lease Morgan wrote:
> Eric wrote:
>
>> What do y'all think of this idea, a full-text index to the content of
>> open access journals?
>>
>> The phrase "open access journals" seems to be gaining popularity to
>> denote freely available scholarly journal content. There is a
>> directory of such content in the Directory of Open Access Journals:
>>
>> http://www.doaj.org/
>>
>> What if someone, like us, were to mirror and/or crawl the content of
>> these open access journals and index the content? Wouldn't that begin
>> to demonstrate to the scholarly community that if they publish in
>> these titles, then access to them will be assured? Wouldn't such a
>> project increase access to these titles, and help improve scholarly
>> communication?
>
> In my copious spare time I hacked together an application indexing
> selected library-related electronic serials from the Directory of Open
> Access Journals. From the half-baked system's About statement:
>
> DOAJ Index is YAMSI (Yet Another Mr. Serials
> Implementation). Its goal is to create a full-text index
> to (library related) scholarly literature in order to
> facilitate scholarly communication. I believe that if the
> library profession demonstrates the ability to collect,
> organize, archive, index, and provide access to freely
> available scholarly materials, then freely available
> scholarly materials will more likely be available. (This
> may very well be a circular argument.)
>
> http://dewey.library.nd.edu/morgan/doaji/?cmd=about
>
> The About statement goes on to briefly describe how the system is glued
> together, and how it can be improved.
>
> What would be really cool would be a full-text index of all open access
> journal literature. Unfortunately, the metadata in these titles, even
> the library-related ones, is not very strong. This lack of metadata
> also makes the search interface rather weak. GIGO.
>
> --
> Eric Lease Morgan
> Head, Digital Access and Information Architecture Department
> University Libraries of Notre Dame
>
> (574) 631-8604
>