There have been some great software recommendations in this thread that I really don't want to quibble with. What I'd like to quibble with is the software-first approach. We've all tried the software-first approach; how many of us were happy with it?

There is a standard in this area, and that standard appears to have at least two non-trivial implementations, including one from a software distributor whose name we all recognise.

SPEC: http://docs.oasis-open.org/uima/v1.0/uima-v1.0.html
APACHE UIMA: http://uima.apache.org/
GATE: http://gate.ac.uk/

Does anyone have experience using the standard or these two implementations?

cheers
stuart

-- 
Stuart Yeates
Library Technology Services
http://www.victoria.ac.nz/library/