It's on our list of Big Problems To Solve; I'm hoping to have time to
tackle it later this year :)
-n
On Jul 18, 2007, at 12:57 PM, Jonathan Rochkind wrote:
> Ha! If it's not too difficult, then with all the time you've spent
> "looking at it extensively", how come you don't have a solution yet?
>
> Just kidding. :)
>
> Jonathan
>
> Nathan Vack wrote:
>> We've looked at this pretty extensively, and we're pretty certain
>> there's nothing downloadable that does a "good enough" job. However,
>> it's by no means impossible -- it seems to be undergrad thesis-level
>> work in Singapore:
>>
>> http://wing.comp.nus.edu.sg/parsCit/
>>
>> There used to be a paper describing this approach (essentially
>> treating citation parsing as a natural language processing task and
>> using a maximum entropy algorithm) online... the page even cites
>> it... but it seems to be gone now.
>>
>> FWIW, it didn't look too difficult.
>>
>> -Nate
>>
>> On Jul 17, 2007, at 6:16 PM, Jonathan Rochkind wrote:
>>
>>> Does anyone have any decent open source code to parse a citation?
>>> I'm
>>> talking about a completely narrative citation like someone might
>>> cut-and-paste from a bibliography or web page. I realize there are a
>>> number of different formats this could be in (not to mention the
>>> human
>>> error problems that always occur from human entered free text)--but
>>> thinking about it, I suspect that with some work you could get
>>> something
>>> that worked reasonably well (if not perfect). So I'm wondering if
>>> anyone
>>> has done this work.
>>>
>>> (One of the commercial legal products--I forget if it's Lexis or
>>> West--does this with legal citations--a more limited domain--quite
>>> well. I'm not sure if any of the commercial bibliographic citation
>>> management software does this?)
>>>
>>> The goal, as you can probably guess, is a box that the user can
>>> paste a
>>> citation into; make an OpenURL out of it; show the user where to
>>> get the
>>> citation. I'm pretty confident something useful could be created
>>> here,
>>> with enough time put into it. But sadly, it's probably more time
>>> than
>>> anyone has individually. Unless someone's done it already?
>>>
>>> Hopefully,
>>> Jonathan
>>>
>>
>
> --
> Jonathan Rochkind
> Sr. Programmer/Analyst
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>