A year or so ago a couple of students looked into this for LibX. There
are a number of systems that people have published about, although
some are not available and none worked very well or were easy to get
to work. The systems also varied in their computational complexity,
with some not suitable for interactive use. Google for "libx citation
sensing", or generally for citation extraction, automatic record
boundary detection or extraction. (Unfortunately, pubs.dlib.vt.edu
appears to be down at the moment - otherwise, Suresh Menon's report
contains a useful bibliography of work. I'll ping them.)
For citations that contain item titles (which is true for a majority
of citation styles, but definitely not all), LibX's magic button uses
Scholar as a hidden backend to produce an actionable OpenURL. Combined
with a similarity analysis, this "magic button" functionality
produces a usable OpenURL in (on average) 81% of cases for a set of
400 randomly chosen citations from 4 widely read journals from 4
different areas published in 2006 [1]. With some fixes, we could
probably get this number up to 90%. Obviously, this approach only
works for individual use; Google would object to large-scale batch
use.
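
To make the idea concrete, here is a rough Python sketch of the matching
step. It is not LibX's actual code. It assumes the search backend has
already returned a list of candidate records (the Scholar lookup itself
is left out), and it uses a simple word-overlap score as a stand-in for
the similarity analysis. The function names, the 0.8 threshold, the
record field names, and the resolver address are all made up for
illustration; only the OpenURL keys (url_ver, rft_val_fmt, rft.*) follow
the Z39.88-2004 KEV journal format.

```python
import re
from urllib.parse import urlencode


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_score(citation, title):
    """Fraction of the candidate title's words that also occur in the citation."""
    title_words = tokens(title)
    if not title_words:
        return 0.0
    cited = set(tokens(citation))
    return sum(word in cited for word in title_words) / len(title_words)


def best_candidate(citation, candidates, threshold=0.8):
    """Return the best-scoring candidate, or None if nothing reaches the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    scored = [(title_score(citation, c["atitle"]), c) for c in candidates]
    if not scored:
        return None
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score >= threshold else None


def to_openurl(resolver_base, record):
    """Build an OpenURL 1.0 (KEV) link for a journal article record."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key in ("atitle", "jtitle", "aulast", "date"):
        if record.get(key):
            params["rft." + key] = record[key]
    return resolver_base + "?" + urlencode(params)


if __name__ == "__main__":
    pasted = ("Annette Bailey and Godmar Back, Retrieving Known Items "
              "with LibX. The Serials Librarian, 2007.")
    # Hypothetical candidates, as a search backend might return them.
    found = [
        {"atitle": "An Unrelated Article", "jtitle": "Some Journal"},
        {"atitle": "Retrieving Known Items with LibX",
         "jtitle": "The Serials Librarian", "aulast": "Bailey", "date": "2007"},
    ]
    match = best_candidate(pasted, found)
    if match is not None:
        # resolver.example.edu is a placeholder, not a real resolver.
        print(to_openurl("https://resolver.example.edu/openurl", match))
```

A word-overlap score is cheap enough for interactive use, but it ignores
word order and will be fooled by short or generic titles, so a real
implementation would likely need something more careful.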
- Godmar
[1] Annette Bailey and Godmar Back, Retrieving Known Items with LibX.
The Serials Librarian, 2007. To appear.
On 7/17/07, Jonathan Rochkind <[log in to unmask]> wrote:
> Does anyone have any decent open source code to parse a citation? I'm
> talking about a completely narrative citation like someone might
> cut-and-paste from a bibliography or web page. I realize there are a
> number of different formats this could be in (not to mention the human
> error problems that always occur from human entered free text)--but
> thinking about it, I suspect that with some work you could get something
> that worked reasonably well (if not perfect). So I'm wondering if anyone
> has done this work.
>
> (One of the commercial legal products--I forget if it's Lexis or
> West--does this with legal citations--a more limited domain--quite
> well. I'm not sure if any of the commercial bibliographic citation
> management software does this?)
>
> The goal, as you can probably guess, is a box that the user can paste a
> citation into; make an OpenURL out of it; show the user where to get the
> citation. I'm pretty confident something useful could be created here,
> with enough time put into it. But sadly, it's probably more time than
> anyone has individually. Unless someone's done it already?
>
> Hopefully,
> Jonathan
>