We've looked at this pretty extensively, and we're fairly certain
there's nothing downloadable that does a "good enough" job. However,
it's by no means impossible -- it seems to be undergrad thesis-level
work in Singapore:
http://wing.comp.nus.edu.sg/parsCit/
There used to be a paper describing this approach (essentially
treating citation parsing as a natural language processing task and
using a maximum entropy algorithm) online... the page even cites
it... but it seems to be gone now.
FWIW, it didn't look too difficult.
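
For the OpenURL end of the box Jonathan describes below, here is a
minimal sketch in Python. To be clear about what is assumed: this is
NOT the maximum entropy approach from the parsCit page; it swaps in a
deliberately naive regular expression that only handles one
"Author, A. (Year). Title. Journal, Vol(Issue), Start-End." style. The
resolver base URL and the function names are made up for illustration.
The rft.* keys are the standard OpenURL 1.0 (Z39.88-2004) journal keys.

```python
import re
from urllib.parse import urlencode

# Hypothetical link resolver address; substitute your own.
RESOLVER_BASE = "http://resolver.example.edu/openurl"

# Naive pattern for ONE citation style only (an assumption for this sketch):
#   Author, A. (Year). Article title. Journal Title, Volume(Issue), Start-End.
CITATION_RE = re.compile(
    r"^(?P<aulast>[^,]+),\s*(?P<auinit>[^(]+?)\s*"
    r"\((?P<date>\d{4})\)\.\s*"
    r"(?P<atitle>.+?)\.\s+"
    r"(?P<jtitle>.+?),\s*"
    r"(?P<volume>\d+)(?:\((?P<issue>\d+)\))?,\s*"
    r"(?P<spage>\d+)(?:\s*-\s*(?P<epage>\d+))?\.?\s*$"
)


def parse_citation(text):
    """Return a dict of citation fields, or None if the pattern does not match."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    # Drop optional groups that did not participate in the match.
    return {key: value.strip() for key, value in match.groupdict().items() if value}


def to_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 key/encoded-value query for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    params += [("rft." + key, value) for key, value in fields.items()]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    sample = ("Smith, J. (2005). Parsing free text citations. "
              "Journal of Examples, 12(3), 45-67.")
    fields = parse_citation(sample)
    if fields is None:
        print("Could not parse citation")
    else:
        print(to_openurl(fields))
```

Anything the pattern doesn't recognize comes back as None, which is
exactly where a trained model would have to take over from a regex.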
-Nate
On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:
> Does anyone have any decent open source code to parse a citation? I'm
> talking about a completely narrative citation like someone might
> cut-and-paste from a bibliography or web page. I realize there are a
> number of different formats this could be in (not to mention the human
> error problems that always occur from human entered free text)--but
> thinking about it, I suspect that with some work you could get something
> that worked reasonably well (if not perfect). So I'm wondering if anyone
> has done this work.
>
> (One of the commercial legal products--I forget if it's Lexis or
> West--does this with legal citations--a more limited domain--quite
> well. I'm not sure if any of the commercial bibliographic citation
> management software does this?)
>
> The goal, as you can probably guess, is a box that the user can paste a
> citation into; make an OpenURL out of it; show the user where to get the
> citation. I'm pretty confident something useful could be created here,
> with enough time put into it. But sadly, it's probably more time than
> anyone has individually. Unless someone's done it already?
>
> Hopefully,
> Jonathan
>