The training data is not currently under source control (it's in the
database), but the trained model is.
Admittedly, that's a bit of a downside to my fork (although the original
also checks the model into git), since your own trained model would always
conflict with mine if you train one.
-Ross.
On Monday, October 17, 2011, Jonathan Rochkind wrote:
> When you say you've added to the training data, have you shared your
> additions back with Brown, or is your new improved training data only in
> your fork? Or is it only held locally by you and isn't even in your GitHub
> fork? Please clarify, thanks!
>
> On 10/13/2011 8:52 PM, Ross Singer wrote:
>>
>> Yeah, we've been doing a lot with (and putting a lot of updates into)
>> FreeCite. We only use the webservice (although we don't use the
>> OpenURL context object and instead added a JSON response). It works
>> pretty well (not always great, but certainly better than nothing) -
>> especially for giving us something "good enough" to throw against some
>> OpenLibrary and Crossref data to look for matches. Basically what
>> we're using it for is to go from a citation string to an RDF graph.
>>
>> BTW, there have been no problems with post-2000 dates (not to say that
>> there aren't plenty of other problems) - this might have been either a
>> training issue or something a later version of CRF++ worked out. We
>> also add the citations it couldn't parse correctly to its training
>> data, which might help this.
>>
>> Anyway, yeah, if anybody is interested, feel free to try it out. One
>> thing my fork does is remove the PostgreSQL dependency, if that's an
>> issue for anybody. It's kind of handy to be able to just use SQLite
>> or MySQL or whatever to try it out.
>>
>> -Ross.
>>
>> On Thu, Oct 13, 2011 at 7:42 PM, Avram Lyon wrote:
>>>
>>> On Thu, Oct 13, 2011 at 2:33 PM, Will Kurt wrote:
>>>>
>>>> I always think that Brown's FreeCite API is underutilized.
>>>> http://freecite.library.brown.edu/
>>>> It's far from perfect, but I'm sure more use could be made of it.
>>>>
>>>> A few months back I threw together a copy/paste citation look-up with it:
>>>> CiteBox
>>>> http://willkurt.github.com/CiteBox/
>>>>
>>>> Of course I don't think anyone is really making use of it, but I've
>>>> also done nothing to really promote it either ;)
>>>
>>> The FreeCite parser had major issues for a while with post-2000 dates,
>>> and I believe the installation at Brown still does, but, to judge by
>>> the GitHub activity (most active fork here:
>>> https://github.com/rsinger/free_cite/), some enterprising folks have
>>> picked it up after a period of apparent dormancy. This is great to
>>> see, and vital to any project that hopes to use its API for anything
>>> serious.
>>>
>>> By the way, the rarely-used XML representation of OpenURL
>>> ContextObjects that FreeCite produces is supported by Zotero as a
>>> full-fledged input format, a fact that might come in handy if you're
>>> hoping to have your API produce something that Zotero users can
>>> import.
>>>
>>> Avram
>>>
>>> UCLA Slavic, Zotero community dev
>>>
>