Godmar Back wrote:
> A year or so ago a couple of students looked into this for LibX. There
> are a number of systems that people have published about, although
> some are not available and none worked very well or were easy to get
> to work. The systems also varied in their computational complexity,
> with some not suitable for interactive use. Google for "libx citation
> sensing", or generally for citation extraction, automatic record
> boundary detection or extraction. (Unfortunately, pubs.dlib.vt.edu
> appears to be down at the moment - otherwise, Suresh Menon's report
> contains a useful bibliography of work. I'll ping them.)
I've tested ParaTools, but after it choked on most of its own examples,
I tried looking elsewhere.
Inera's eXtyles refXpress claims to do this. You can see it in action
at: <http://www.crossref.org/SimpleTextQuery/>. It did better than
ParaTools but still missed a lot of things I thought would have been
obvious.
Inera said most of the issues I picked out were a problem with
CrossRef's implementation, but the cost of the product was so great that
I didn't explore further.
There was an interesting paper at JCDL 2007 on an unsupervised way of
doing this that had promising results
<http://doi.acm.org/10.1145/1255175.1255219> but I haven't found any of
their code online.
> For citations that contain item titles (which is true for a majority,
> but definitely not all citation styles) LibX's magic button uses
> Scholar as a hidden backend to produce an actionable OpenURL. Combined
> with a similarity analysis, this "magic button" functionality
> produces a usable OpenURL in (on average) 81% of cases for a set of
> 400 randomly chosen citations from 4 widely read journals from 4
> different areas published in 2006. With some fixes, we could
> probably get this number up to 90%. Obviously, this approach only
> works for individual use, Google would object for large scale batch
Agreed that a lookup against something like Google Scholar, Web of
Science, or a set of federated search targets may yield better
results. We've discussed it but haven't done any testing.
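
To make the "lookup plus similarity analysis" idea a bit more concrete,
here is a rough Python sketch of how candidate titles returned by some
search backend might be checked against the pasted citation. This is
not LibX's actual similarity analysis (I don't know its details); the
token-overlap measure, the function names, and the 0.8 threshold are
all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_coverage(citation, title):
    """Fraction of the candidate title's tokens that occur in the citation.

    Assumption: a simple token-overlap measure, not LibX's real algorithm.
    """
    title_tokens = tokens(title)
    if not title_tokens:
        return 0.0
    citation_tokens = set(tokens(citation))
    hits = sum(1 for tok in title_tokens if tok in citation_tokens)
    return hits / len(title_tokens)


def best_match(citation, candidates, threshold=0.8):
    """Return the best-covered candidate title, or None if none is close.

    The threshold is an arbitrary illustrative value and would need tuning.
    """
    best_title, best_score = None, 0.0
    for title in candidates:
        score = title_coverage(citation, title)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


citation = ("Annette Bailey and Godmar Back, Retrieving Known Items "
            "with LibX. The Serials Librarian, 2007.")
# Hypothetical titles, standing in for what a search backend might return.
candidates = [
    "Retrieving known items with LibX",
    "Known unknowns in serials management",
]
print(best_match(citation, candidates))
```

The candidate list would come from whatever backend is queried. Measuring
how much of the candidate title appears in the citation, rather than the
reverse, keeps the author names and journal title in the pasted text from
dragging the score down.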
> - Godmar
>  Annette Bailey and Godmar Back, Retrieving Known Items with LibX.
> The Serials Librarian, 2007. To appear.
> On 7/17/07, Jonathan Rochkind wrote:
>> Does anyone have any decent open source code to parse a citation? I'm
>> talking about a completely narrative citation like someone might
>> cut-and-paste from a bibliography or web page. I realize there are a
>> number of different formats this could be in (not to mention the human
>> error problems that always occur from human entered free text)--but
>> thinking about it, I suspect that with some work you could get something
>> that worked reasonably well (if not perfect). So I'm wondering if anyone
>> has done this work.
>> (One of the commercial legal products--I forget if it's Lexis or
>> West--does this with legal citations--a more limited domain--quite
>> well. I'm not sure if any of the commercial bibliographic citation
>> management software does this?)
>> The goal, as you can probably guess, is a box that the user can paste a
>> citation into; make an OpenURL out of it; show the user where to get the
>> citation. I'm pretty confident something useful could be created here,
>> with enough time put into it. But sadly, it's probably more time than
>> anyone has individually. Unless someone's done it already?
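
For the last step Jonathan describes (making an OpenURL once the
citation fields are known), a minimal Python sketch might look like the
following. It assumes the hard part, the parsing, has already produced a
dict of fields. The resolver address is a placeholder, the function name
is made up for illustration, and the keys follow the OpenURL 1.0
(Z39.88-2004) KEV journal format as I understand it.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your own link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def make_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV query string for a journal article.

    `fields` is assumed to be the output of some citation parser, e.g.
    {"atitle": ..., "jtitle": ..., "date": ...}. Missing fields are skipped.
    """
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key in ("atitle", "jtitle", "aulast", "date", "volume", "issue", "spage"):
        value = fields.get(key)
        if value:
            params.append(("rft." + key, value))
    # urlencode percent-encodes values and turns spaces into '+'.
    return base + "?" + urlencode(params)


url = make_openurl({
    "atitle": "Retrieving Known Items with LibX",
    "jtitle": "The Serials Librarian",
    "aulast": "Bailey",
    "date": "2007",
})
print(url)
```

Fields the parser could not find are simply left out, so a partially
parsed citation still produces a link that the resolver can try.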