
If someone else were getting started and didn't want to assemble their 
own training data -- do you think it would be useful for them to 
aggregate your training data _and_ Brown's training data together and 
generate a new model? Was there a particular reason you chose to start 
over from scratch rather than using Brown's training data and adding to it?

Forgive me if this is a stupid question; I'm still trying to learn about 
this stuff.

And I'm starting to figure out how I'm going to deal with it when I get 
around to using FreeCite, which I surely will. Would it maybe make sense 
to separate the training data and trained model into a separate library, 
so people could pick and choose which already-built trained model they 
want to use, or build their own, without dealing with repo conflicts?
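(For anyone else following along who wants to try this: below is a minimal sketch of calling a FreeCite-style web service from Python. The endpoint path and the `citation` parameter are assumptions based on Brown's public instance, and the JSON `Accept` header assumes Ross's fork; adjust both for your own deployment.)

```python
# Hedged sketch: POSTing one citation string to a FreeCite-style service.
# FREECITE_URL and the "citation" form field are assumptions modeled on
# Brown's public instance; the JSON Accept header assumes Ross's fork.
import json
import urllib.parse
import urllib.request

FREECITE_URL = "http://freecite.library.brown.edu/citations/create"  # assumed endpoint


def build_request(citation: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one citation string."""
    data = urllib.parse.urlencode({"citation": citation}).encode("utf-8")
    return urllib.request.Request(
        FREECITE_URL,
        data=data,
        headers={"Accept": "application/json"},  # ask for the JSON response
    )


def parse_citation(citation: str) -> dict:
    """Send the request and decode the JSON response (network required)."""
    with urllib.request.urlopen(build_request(citation)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Inspect the request without hitting the network.
    req = build_request("Doe, J. An Example Citation. Journal of Examples, 2001.")
    print(req.full_url, req.get_header("Accept"))
```

From there, mapping the parsed fields onto an RDF graph (as Ross describes) is up to whatever vocabulary you're targeting.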

> The training data is not currently under source control (it's in the 
> database), but the trained model is.
>
> That's, admittedly, a bit of a downside to my fork (although the model 
> being checked into git is true of the original, as well) since you'd 
> always be in conflict with my trained model if you train your own.
>
> -Ross.
>
> On Monday, October 17, 2011, Jonathan Rochkind <[log in to unmask]> wrote:
> > When you say you've added to the training data, have you shared your 
> additions back with Brown, or is your new, improved training data only 
> in your fork? Or is it only held locally by you and not even in your 
> GitHub fork?  Please clarify, thanks!
> >
> > On 10/13/2011 8:52 PM, Ross Singer wrote:
> >>
> >> Yeah, we've been doing a lot with (and putting a lot of updates into)
> >> FreeCite.  We only use the webservice (although we don't use the
> >> OpenURL context object and instead added a JSON response).  It works
> >> pretty well (not always great, but certainly better than nothing) -
> >> especially for giving us something "good enough" to throw against some
> >> OpenLibrary and Crossref data to look for matches.  Basically what
> >> we're using it for is to go from a citation string to an RDF graph.
> >>
> >> BTW, there have been no problems with post-2000 dates (not to say that
> >> there aren't plenty of other problems) - this might have been either a
> >> training issue or something a later version of CRF++ worked out.  We
> >> also add the citations it couldn't parse correctly to its training
> >> data, which might help this.
> >>
> >> Anyway, yeah, if anybody is interested, feel free to try it out.  One
> >> thing my fork does is remove the PostgreSQL dependency, if that's an
> >> issue for anybody.  It's kind of handy to be able to just use SQLite
> >> or MySQL or whatever to try it out.
> >>
> >> -Ross.
> >>
> >> On Thu, Oct 13, 2011 at 7:42 PM, Avram Lyon <[log in to unmask]> wrote:
> >>>
> >>> On Thu, Oct 13, 2011 at 2:33 PM, Will Kurt <[log in to unmask]> wrote:
> >>>>
> >>>> I always think that Brown's FreeCite API is underutilized.
> >>>> http://freecite.library.brown.edu/
> >>>> It's far from perfect, but I'm sure more use could be made of it.
> >>>>
> >>>> A few months back I threw together a copy/paste citation look-up with it:
> >>>> CiteBox
> >>>> http://willkurt.github.com/CiteBox/
> >>>>
> >>>> Of course I don't think anyone is really making use of it, but I've
> >>>> also done nothing to really promote it either ;)
> >>>
> >>> The FreeCite parser had major issues for a while with post-2000 dates,
> >>> and I believe the installation at Brown still does, but, to judge by
> >>> the GitHub activity (most active fork here:
> >>> https://github.com/rsinger/free_cite/), some enterprising folks have
> >>> picked it up after a period of apparent dormancy. This is great to
> >>> see, and vital to any project that hopes to use its API for anything
> >>> serious.
> >>>
> >>> By the way, the rarely-used XML representation of OpenURL
> >>> ContextObjects that FreeCite produces is supported by Zotero as a
> >>> full-fledged input format, a fact that might come in handy if you're
> >>> hoping to have your API produce something that Zotero users can
> >>> import.
> >>>
> >>> Avram
> >>>
> >>> UCLA Slavic, Zotero community dev
> >>>
> >