Thanks for pointing out these other parsing tools. I've added them to
the list on our website (see under heading "Other Citation Tools" at
http://freecite.library.brown.edu/).

Citation metadata extraction is a difficult open problem whose
potential solutions are based on continually developing technologies.
So I think it's important that we approach this task from many diverse
angles. If our project makes a little headway here, ParsCit makes some
headway there, and five other groups make their own advancements,
hopefully we'll be able to pool our findings into a viable
application.

> Anyone want to compare and contrast these three projects? Might make a good very
> short article/review for the Code4Lib Journal if you wanted to.

Agreed. I'd love to see this. Another idea might be to write an
application that takes the output of multiple parsers and assembles
the best answer.
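
To make that last idea a bit more concrete, here is a minimal sketch of
one way it could work: a field-by-field majority vote. It assumes each
parser's output has already been converted to a plain dict of field
name to string. The function name merge_citations and the field names
are made up for illustration; none of this reflects the actual output
format of FreeCite, ParsCit, or the CDL extractor.

```python
from collections import Counter


def merge_citations(parses):
    """Combine several parsers' outputs with a per-field majority vote.

    `parses` is a list of dicts (hypothetical format: field name -> string).
    Ties go to the value that was seen first.
    """
    merged = {}
    fields = sorted({field for parse in parses for field in parse})
    for field in fields:
        votes = Counter()
        first_seen = {}
        for parse in parses:
            value = parse.get(field)
            if not value:
                continue  # this parser had nothing to say about the field
            # Normalise case and whitespace so trivial differences
            # still count as the same vote.
            key = " ".join(value.lower().split())
            votes[key] += 1
            first_seen.setdefault(key, value.strip())
        if votes:
            winner, _count = votes.most_common(1)[0]
            merged[field] = first_seen[winner]
    return merged


if __name__ == "__main__":
    outputs = [
        {"title": "A Study", "year": "2008"},
        {"title": "a study ", "year": "2007"},
        {"year": "2008", "author": "Smith"},
    ]
    print(merge_citations(outputs))
```

A real version would probably want to weight each parser by how
reliable it is for a given field rather than counting every vote
equally, but that would need evaluation data we don't have yet.
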
On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind <[log in to unmask]> wrote:
> This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. Also curious if anyone has looked at all three and can compare and contrast and make a recommendation.
>
> The other two I know about are:
>
> ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
> A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/
>
> I've been keeping track because I have a use for this, although haven't had time to make use of any of them yet.
>
> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>
> Jonathan
>
>
>>>> jean rainwater <[log in to unmask]> 09/12/08 2:25 PM >>>
> Please help us beta test "FreeCite", a new citation parser for
> non-structured bibliographic data. FreeCite is the result of
> collaboration between the Brown University Library and Public Display,
> a Providence-based software company founded by and employing many
> Brown grads. Public Display's core business is information
> extraction. Partial funding for this project was provided by the
> Andrew W. Mellon Foundation.
>
> FreeCite is implemented in Ruby on Rails and uses the CRF++ library
> implementation of conditional random fields. The model is trained on
> the CORA dataset with lexical augmentation from the Directory of
> Research and Researchers at Brown (DRR-B). The API and code are
> available at: http://freecite.library.brown.edu.
>
> Jean Rainwater
> Co-Leader, Integrated Technology Services
> Brown University Library
> Providence, RI 02912
> 401.863.9031
> [log in to unmask]
>