Thanks Min, this is a great project that I keep trying to find time to investigate further. Don't apologize for keeping us updated; please keep doing so!
Do you know if any of the recent changes have improved detection of volume/issue/page# information? For what I want to use it for, reasonably accurate parsing of volume/issue/page# is needed, and so far, whenever I've looked at demos, this seems to be something that all of these machine-learning-type approaches handle pretty poorly. (I wonder if you aren't including much of this in your training data because volume/issue/page# isn't necessary for your purposes?)
I have also wondered whether it would make sense to start with a machine-learning-type approach, but then supplement it with formal rule-based parsing that tries to pick out vol/issue/page# according to common patterns.
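A rough sketch of what such a rule-based second pass could look like, in Python. Everything here is an assumption for illustration: the regular expressions cover only a few common layouts, and the function names and the `ml_fields` dictionary are hypothetical and not part of ParsCit or any other parser.

```python
import re

# Hypothetical patterns for a few common layouts; real citation styles vary far more.
# Labelled form: "vol. 7, no. 2, pp. 101-110"
VOLUME = re.compile(r"\bvol(?:ume)?\.?\s*(?P<volume>\d+)", re.IGNORECASE)
ISSUE = re.compile(r"\b(?:no|num|issue)\.?\s*(?P<issue>\d+)", re.IGNORECASE)
PAGES = re.compile(
    r"\bpp?\.?\s*(?P<first_page>\d+)\s*[-\u2013]\s*(?P<last_page>\d+)",
    re.IGNORECASE,
)
# Compact form: "12(3):45-67" or "12:45-67"
COMPACT = re.compile(
    r"(?P<volume>\d+)\s*(?:\((?P<issue>[^)]+)\))?\s*[:,]\s*"
    r"(?P<first_page>\d+)\s*[-\u2013]\s*(?P<last_page>\d+)"
)


def rule_based_fields(citation):
    """Return whichever of volume/issue/first_page/last_page the rules find."""
    found = {}
    # Labelled patterns are tried first; the compact pattern only fills gaps.
    for pattern in (VOLUME, ISSUE, PAGES, COMPACT):
        match = pattern.search(citation)
        if match:
            for key, value in match.groupdict().items():
                if value is not None:
                    found.setdefault(key, value)
    return found


def supplement(ml_fields, citation):
    """Fill fields the (hypothetical) ML parser left empty; never overwrite it."""
    merged = dict(ml_fields)
    for key, value in rule_based_fields(citation).items():
        if not merged.get(key):
            merged[key] = value
    return merged
```

In this sketch the rules only fill in fields that the statistical parser left empty, so they cannot make an already-correct parse worse. Whether the rules should instead override the model for these particular fields would need testing against annotated data.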
I don't have much time to work on this myself, but if anyone working on these various citation parsing efforts could improve volume/issue/page# parsing to a reasonable level, it would make the libraries useful for a much greater range of applications.
Jonathan
>>> Min-Yen Kan 11/13/08 8:30 PM >>>
Dear all:
(Sorry to resurrect an old thread...)
We've seen the release of several new freely available reference
string parsers in recent months.
The ParsCit team has also been updating the ParsCit package, and is
happy to announce a new version that improves on classification
accuracy, and adds training data in Italian, German and French and for
a different discipline of humanities. We've updated the classification
model to reflect these changes, which should be as easy to use as the
original ParsCit.
You can either download a copy of ParsCit for your own use, or use it
through a web services interface. We welcome your feedback and hope
that, if you use ParsCit or any other freely available reference string
parsing tool, you can contribute annotated data to help make these
models more robust.
ParsCit is available from: http://wing.comp.nus.edu.sg/parsCit/
Current Distribution: http://wing.comp.nus.edu.sg/parsCit/parscit-080917.zip
ParsCit is a joint collaboration between Pennsylvania State University
(the folks who brought you CiteSeerX) and the National University of
Singapore.
Cheers,
Min
P.S. Integration with other freely available parsing systems is
hopefully in the works too. If you have something to contribute, we'll
be happy to commit some bandwidth to getting it integrated with
ParsCit.