It's on our list of Big Problems To Solve; I'm hoping to have time to
tackle it later this year :)

-n

On Jul 18, 2007, at 12:57 PM, Jonathan Rochkind wrote:

> Ha! If it's not too difficult, then with all the time you've spent
> "looking at it extensively", how come you don't have a solution yet?
>
> Just kidding. :)
>
> Jonathan
>
> Nathan Vack wrote:
>> We've looked at this pretty extensively, and we're pretty certain
>> there's nothing downloadable that does a "good enough" job. However,
>> it's by no means impossible -- it seems to be undergrad thesis-level
>> work in Singapore:
>>
>> http://wing.comp.nus.edu.sg/parsCit/
>>
>> There used to be a paper describing this approach (essentially
>> treating citation parsing as a natural language processing task and
>> using a maximum entropy algorithm) online... the page even cites
>> it... but it seems to be gone now.
>>
>> FWIW, it didn't look too difficult.
>>
>> -Nate
>>
>> On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:
>>
>>> Does anyone have any decent open source code to parse a citation?
>>> I'm talking about a completely narrative citation like someone
>>> might cut-and-paste from a bibliography or web page. I realize
>>> there are a number of different formats this could be in (not to
>>> mention the human error problems that always occur with
>>> human-entered free text)--but thinking about it, I suspect that
>>> with some work you could get something that worked reasonably well
>>> (if not perfectly). So I'm wondering if anyone has done this work.
>>>
>>> (One of the commercial legal products--I forget if it's Lexis or
>>> West--does this with legal citations--a more limited domain--quite
>>> well. I'm not sure if any of the commercial bibliographic citation
>>> management software does this?)
>>>
>>> The goal, as you can probably guess, is a box that the user can
>>> paste a citation into; make an OpenURL out of it; show the user
>>> where to get the citation. I'm pretty confident something useful
>>> could be created here, with enough time put into it. But sadly,
>>> it's probably more time than anyone has individually. Unless
>>> someone's done it already?
>>>
>>> Hopefully,
>>> Jonathan
>>>
>>
>
> --
> Jonathan Rochkind
> Sr. Programmer/Analyst
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>
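
To make the goal in the original question concrete (paste a citation, make an OpenURL out of it), here is a minimal Python sketch. It is only an illustration under loud assumptions: the resolver address `resolver.example.edu` is hypothetical, the function names `parse_citation` and `to_openurl` are made up for this example, and the regular expression handles exactly one citation style. It is not the maximum entropy approach mentioned above, and a single regex like this would fail on most real-world free-text citations, which is the hard part the thread is about. The query keys follow the OpenURL 1.0 (Z39.88-2004) key/value journal format.

```python
import re
from urllib.parse import urlencode

# Hypothetical resolver base URL; substitute your own link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Toy pattern for ONE citation style only (an assumption, not a general parser):
#   Last, F. M. (Year). Article title. Journal Title, Vol(Issue), start-end.
CITATION_RE = re.compile(
    r"^(?P<aulast>[^,]+),\s*(?P<auinit>[^(]+?)\s*"
    r"\((?P<date>\d{4})\)\.\s*"
    r"(?P<atitle>.+?)\.\s+"
    r"(?P<jtitle>[^,]+),\s*"
    r"(?P<volume>\d+)(?:\((?P<issue>\d+)\))?,\s*"
    r"(?P<spage>\d+)\s*-\s*(?P<epage>\d+)\.?\s*$"
)


def parse_citation(text):
    """Return a dict of citation fields, or None if the toy pattern fails."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    # Drop optional groups (e.g. issue) that did not take part in the match.
    return {key: value.strip() for key, value in match.groupdict().items() if value}


def to_openurl(fields, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 key/value query for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Each parsed field becomes an "rft."-prefixed referent key.
    params += [("rft." + key, value) for key, value in fields.items()]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    pasted = ("Smith, J. A. (2005). Parsing free-text citations. "
              "Journal of Examples, 12(3), 45-67.")
    fields = parse_citation(pasted)
    if fields is None:
        print("Could not parse citation")
    else:
        print(to_openurl(fields))
```

Returning `None` on a failed match matters in practice: a paste box like the one described would need a fallback (for example, showing the user an editable form) whenever the parser cannot make sense of the input.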