If someone else were getting started and didn't want to assemble their own training data, do you think it would be useful for them to aggregate your training data _and_ Brown's training data together and generate a new model? Was there a particular reason you chose not to use Brown's training data and add on to it, but to start over from scratch?

Forgive me if this is a stupid question; I'm still trying to learn about this stuff, and starting to figure out how I'm going to deal with it when I get around to using FreeCite, which I surely will.

Would it maybe make sense to separate the training data and trained model into a separate library, so people could pick and choose which already-built trained model they want to use, or build their own, without dealing with repo conflicts?

> The training data is not currently under source control (it's in the
> database), but the trained model is.
>
> That's, admittedly, a bit of a downside to my fork (although the model
> being checked into git is true of the original, as well) since you'd
> always be in conflict with my trained model if you train your own.
>
> -Ross.
>
> On Monday, October 17, 2011, Jonathan Rochkind <[log in to unmask]> wrote:
> > When you say you've added to the training data, have you shared your
> > additions back with Brown, or your new improved training data is only
> > in your fork? Or is only held locally by you and isn't even in your
> > github fork? Please clarify, thanks!
> >
> > On 10/13/2011 8:52 PM, Ross Singer wrote:
> >>
> >> Yeah, we've been doing a lot with (and putting a lot of updates into)
> >> FreeCite. We only use the webservice (although we don't use the
> >> OpenURL context object and instead added a JSON response). It works
> >> pretty well (not always great, but certainly better than nothing) -
> >> especially for giving us something "good enough" to throw against some
> >> OpenLibrary and Crossref data to look for matches.
> >> Basically what we're using it for is to go from a citation string to
> >> an RDF graph.
> >>
> >> BTW, there have been no problems with post-2000 dates (not to say that
> >> there aren't plenty of other problems) - this might have been either a
> >> training issue or something a later version of CRF++ worked out. We
> >> also add the citations it couldn't parse correctly to its training
> >> data, which might help this.
> >>
> >> Anyway, yeah, if anybody is interested, feel free to try it out. One
> >> thing my fork does is remove the PostgreSQL dependency, if that's an
> >> issue for anybody. It's kind of handy to be able to just use SQLite
> >> or MySQL or whatever to try it out.
> >>
> >> -Ross.
> >>
> >> On Thu, Oct 13, 2011 at 7:42 PM, Avram Lyon <[log in to unmask]> wrote:
> >>>
> >>> On Thu, Oct 13, 2011 at 2:33 PM, Will Kurt <[log in to unmask]> wrote:
> >>>>
> >>>> I always think that Brown's FreeCite api is under utilized.
> >>>> http://freecite.library.brown.edu/
> >>>> It's far from perfect, but I'm sure more use could be made of it.
> >>>>
> >>>> A few months back I threw together a copy/paste citation look-up with it:
> >>>> CiteBox
> >>>> http://willkurt.github.com/CiteBox/
> >>>>
> >>>> Of course I don't think anyone is really making use of it, but I've
> >>>> also done nothing to really promote it either ;)
> >>>
> >>> The FreeCite parser had major issues for a while with post-2000 dates,
> >>> and I believe the installation at Brown still does, but, to judge by
> >>> the GitHub activity (most active fork here:
> >>> https://github.com/rsinger/free_cite/), some enterprising folks have
> >>> picked it up after a period of apparent dormancy. This is great to
> >>> see, and vital to any project that hopes to use its API for anything
> >>> serious.
> >>>
> >>> By the way, the rarely-used XML representation of OpenURL
> >>> ContextObjects that FreeCite produces is supported by Zotero as a
> >>> full-fledged input format, a fact that might come in handy if you're
> >>> hoping to have your API produce something that Zotero users can
> >>> import.
> >>>
> >>> Avram
> >>>
> >>> UCLA Slavic, Zotero community dev
> >>>