At Sat, 12 Jul 2008 10:46:06 -0400,
Godmar Back wrote:
>
> Min, Eric, and others working in this domain -
>
> have you considered designing your software as a scalable web service
> from the get-go, using such frameworks as Google App Engine? You may
> be able to use Montepython for the CRF computations
> (http://montepython.sourceforge.net/)
>
> I know Min offers a WSDL wrapper around their software, but that's
> simply a gateway to one single-machine installation, and it's not
> intended as a production service at that.
Thanks for the link to Montepython. It looks like it might be a good
tool for me to learn more about machine learning.

As for my citation metadata extractor, once the training data is
generated it would be trivial to scale; there is no shared state.
All that is really needed is an implementation of the Viterbi
algorithm, and there is one (in pure Python) on the Wikipedia page; it
is about 20 lines of code. So presumably it could be scaled on
Google App Engine pretty easily. But it could be scaled on anything
pretty easily; all you need is a load balancer and however many
servers are necessary (not many, I would think).
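
To give a sense of how small the decoding step is, here is a minimal
sketch of Viterbi decoding for a plain hidden Markov model in pure
Python. It is not the extractor's actual code: the function name
`viterbi` and its parameters (`start_p`, `trans_p`, `emit_p`, given as
nested dicts of probabilities) are assumptions made for illustration,
and a CRF decoder would plug in feature-based scores instead of these
probabilities, although the dynamic program has the same shape.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most likely state path) for observations.

    Hypothetical interface: start_p[s], trans_p[prev][s] and
    emit_p[s][obs] are plain probabilities held in dicts.
    """
    if not observations:
        return 0.0, []

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log probability of the best path ending in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path
```

The function reads only its arguments and keeps nothing between calls,
so identical copies can run on any number of servers behind a load
balancer, each with its own read-only copy of the trained parameters.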
best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3