We've looked at this pretty extensively, and we're pretty certain
there's nothing downloadable that does a "good enough" job. However,
it's by no means impossible -- it seems to be undergrad thesis-level
work in Singapore:

There used to be a paper describing this approach (essentially
treating citation parsing as a natural language processing task and
using a maximum entropy algorithm) online... the page even cites
it... but it seems to be gone now.
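
To give a flavor of what "maximum entropy" means here, below is a rough sketch of the general idea, not the method from that paper (which I can't check any more). It treats each whitespace-separated token as something to classify into a field, using a tiny maximum entropy (multinomial logistic regression) model trained by plain gradient ascent. The label set, the two training citations, and the feature names are all made up for illustration; a usable parser would need a few hundred hand-labelled citations and features that look at neighbouring tokens.

```python
import math
import re
from collections import defaultdict

# Hypothetical label set and toy training data, invented for this sketch.
LABELS = ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "VOLUME", "PAGES"]

TRAIN = [
    [("Smith,", "AUTHOR"), ("J.", "AUTHOR"), ("(2001).", "YEAR"),
     ("Parsing", "TITLE"), ("citations.", "TITLE"),
     ("Journal", "JOURNAL"), ("of", "JOURNAL"), ("Examples,", "JOURNAL"),
     ("12,", "VOLUME"), ("34-56.", "PAGES")],
    [("Doe,", "AUTHOR"), ("A.", "AUTHOR"), ("(1999).", "YEAR"),
     ("Entropy", "TITLE"), ("models.", "TITLE"),
     ("Annals", "JOURNAL"), ("of", "JOURNAL"), ("Testing,", "JOURNAL"),
     ("7,", "VOLUME"), ("100-120.", "PAGES")],
]


def features(token):
    """Turn one token into a list of binary feature names."""
    norm = token.lower().strip(".,()")
    feats = ["bias", "tok=" + norm]
    if any(ch.isdigit() for ch in token):
        feats.append("has_digit")
    if re.fullmatch(r"(19|20)\d\d", norm):
        feats.append("year_like")
    if "-" in token:
        feats.append("has_hyphen")
    if re.fullmatch(r"[A-Z]\.", token):
        feats.append("initial")
    return feats


def probabilities(weights, feats):
    """Softmax over labels: p(label | token) in the maxent model."""
    scores = {lab: sum(weights[(f, lab)] for f in feats) for lab in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


def train(data, epochs=100, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for citation in data:
            for token, gold in citation:
                feats = features(token)
                probs = probabilities(weights, feats)
                for lab in LABELS:
                    # Gradient: observed count minus expected count.
                    step = rate * ((1.0 if lab == gold else 0.0) - probs[lab])
                    for f in feats:
                        weights[(f, lab)] += step
    return weights


def label_citation(weights, text):
    """Label each token independently with its most probable field."""
    labelled = []
    for token in text.split():
        probs = probabilities(weights, features(token))
        labelled.append((token, max(probs, key=probs.get)))
    return labelled


WEIGHTS = train(TRAIN)
```

Calling label_citation(WEIGHTS, "some pasted citation") gives back a list of (token, field) pairs. Because every token is labelled on its own, this version has no idea that a title usually follows a year; that context is the part that would take the real work.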

FWIW, it didn't look too difficult.


On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:

> Does anyone have any decent open source code to parse a citation? I'm
> talking about a completely narrative citation like someone might
> cut-and-paste from a bibliography or web page. I realize there are a
> number of different formats this could be in (not to mention the human
> error problems that always occur from human-entered free text)--but
> thinking about it, I suspect that with some work you could get something
> that worked reasonably well (if not perfectly). So I'm wondering if
> anyone has done this work.
> (One of the commercial legal products--I forget if it's Lexis or
> West--does this with legal citations--a more limited domain--quite
> well.  I'm not sure if any of the commercial bibliographic citation
> management software does this?)
> The goal, as you can probably guess, is a box that the user can paste
> a citation into; make an OpenURL out of it; show the user where to get
> the citation.  I'm pretty confident something useful could be created
> here, with enough time put into it. But sadly, it's probably more time
> than anyone has individually. Unless someone's done it already?
> Hopefully,
> Jonathan
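
On the OpenURL end of the idea above: once the fields have been pulled out of the pasted text, building the link is the easy part. Here's a minimal sketch, assuming an article citation and OpenURL 1.0 key/value (KEV) syntax. The resolver address and the names of the input fields are hypothetical; substitute whatever your own link resolver and parser use.

```python
from urllib.parse import urlencode

# Hypothetical resolver base URL; replace with your institution's resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Maps hypothetical parser field names to OpenURL journal-format keys.
FIELD_TO_KEV = {
    "author_last": "rft.aulast",
    "author_first": "rft.aufirst",
    "year": "rft.date",
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
    "end_page": "rft.epage",
}


def make_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link from whatever fields were parsed."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for name, key in FIELD_TO_KEV.items():
        value = fields.get(name)
        if value:  # skip anything the parser could not find
            params.append((key, value))
    return base + "?" + urlencode(params)


example = make_openurl({
    "author_last": "Smith",
    "year": "2001",
    "article_title": "Parsing citations",
    "journal_title": "Journal of Examples",
    "volume": "12",
    "start_page": "34",
})
```

Missing fields are simply left out, so a half-parsed citation still produces a link and the resolver can do what it can with it.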