This is now the third open source citation parser I know of. A welcome change from a year ago, when I needed one and didn't know of any! But I can't help thinking that people should be cooperating more instead of reinventing the wheel. I'm also curious whether anyone has looked at all three and can compare them and make a recommendation.
The other two I know about are:
ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/
I've been keeping track because I have a use for this, although I haven't had time to try any of them yet.
Does anyone want to compare and contrast these three projects? It might make a good, very short article or review for the Code4Lib Journal.
Jonathan
>>> jean rainwater 09/12/08 2:25 PM >>>
Please help us beta test "FreeCite", a new citation parser for
non-structured bibliographic data. FreeCite is the result of
collaboration between the Brown University Library and Public Display,
a Providence-based software company founded by and employing many
Brown grads. Public Display's core business is information
extraction. Partial funding for this project was provided by the
Andrew W. Mellon Foundation.
FreeCite is implemented in Ruby on Rails and uses the CRF++ library
implementation of conditional random fields. The model is trained on
the CORA dataset with lexical augmentation from the Directory of
Research and Researchers at Brown (DRR-B). The API and code are
available at: http://freecite.library.brown.edu.
Jean Rainwater
Co-Leader, Integrated Technology Services
Brown University Library
Providence, RI 02912
401.863.9031