What are some techniques y'all would suggest for maintaining state for text processing?
I have created a number of terminal- and Web-based interfaces facilitating text mining services against… texts. All of these interfaces take a piece of input denoting what text to process, for example, the id parameter in http://concordance.library.nd.edu/app/?id=3330305
This is all well and good for texts I provide, but many people want to analyze their own texts. I could create an interface allowing readers (I don't use the word "users" anymore) to supply their own texts. These texts could be referenced as one or more URLs. They could be pasted texts that I save locally. They could be PDF (or Word) files I convert to plain text.
Given one or more URLs, I would be able to implement a user-agent to get them, cache them locally, and allow them to be used by the text mining interfaces for a limited period of time.
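To make the idea concrete, here is a rough sketch of such a fetch-and-cache step in Python, using only the standard library. Everything named here is an assumption for illustration: the cache directory, the one-day lifetime, and the function names are hypothetical, and file modification time stands in for "a limited period of time".

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

CACHE_DIR = Path("cache")   # hypothetical cache location
MAX_AGE = 24 * 60 * 60      # hypothetical lifetime: one day, in seconds


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map a URL to a stable local file name derived from its SHA-256 hash."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / (digest + ".txt")


def fetch_url(url: str) -> str:
    """Retrieve a URL over the network and decode the body as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def get_text(url: str, cache_dir: Path = CACHE_DIR, max_age: float = MAX_AGE,
             fetch: Optional[Callable[[str], str]] = None) -> str:
    """Return the text for a URL, going to the Web only if no fresh copy exists."""
    path = cache_path(url, cache_dir)
    if path.exists() and time.time() - path.stat().st_mtime < max_age:
        return path.read_text(encoding="utf-8")
    text = (fetch or fetch_url)(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text


def purge_expired(cache_dir: Path = CACHE_DIR, max_age: float = MAX_AGE,
                  now: Optional[float] = None) -> int:
    """Delete cached files older than max_age; return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    if not cache_dir.exists():
        return removed
    for path in cache_dir.glob("*.txt"):
        if now - path.stat().st_mtime >= max_age:
            path.unlink()
            removed += 1
    return removed
```

Something like purge_expired() could be run from cron so the files do not stay around forever. PDF or Word conversion would need a separate step before the text is written to the cache.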
My question is, how do I maintain the state of these locally cached files? How do I reference them in my user interface and not keep the files around forever? How do I reference them so my interface does not need to get them from the Web each and every time analysis is done? Should I use a cookie to denote the local location of the reader's cached items? Should I create a random key that points to a list of locally cached text files and then include the key as a value in my HTTP GET request? I'm sure there are a number of choices. What do you suggest?
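For discussion's sake, the random-key option might look something like the Python sketch below. It is only one possible shape, not a settled design: the sessions directory, the JSON manifest layout, the "key" parameter name, and the one-day lifetime are all hypothetical.

```python
import json
import re
import secrets
import time
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlencode

SESSION_DIR = Path("sessions")   # hypothetical location for key manifests
MAX_AGE = 24 * 60 * 60           # hypothetical lifetime: one day, in seconds
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{16,64}")


def create_session(files: List[str], session_dir: Path = SESSION_DIR) -> str:
    """Store a list of cached file names under a new random key; return the key."""
    key = secrets.token_urlsafe(16)   # unguessable, safe to put in a URL
    session_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"created": time.time(), "files": list(files)}
    (session_dir / f"{key}.json").write_text(json.dumps(manifest), encoding="utf-8")
    return key


def load_session(key: str, session_dir: Path = SESSION_DIR,
                 max_age: float = MAX_AGE,
                 now: Optional[float] = None) -> Optional[List[str]]:
    """Return the file list for a key, or None if the key is bad or expired."""
    # Reject anything that is not a plain token, so a key cannot escape the directory.
    if not KEY_PATTERN.fullmatch(key):
        return None
    path = session_dir / f"{key}.json"
    if not path.exists():
        return None
    manifest = json.loads(path.read_text(encoding="utf-8"))
    now = time.time() if now is None else now
    if now - manifest["created"] >= max_age:
        path.unlink()   # expired: drop the manifest
        return None
    return manifest["files"]


def analysis_url(base: str, key: str) -> str:
    """Build a GET URL that carries the key as a query parameter."""
    return f"{base}?{urlencode({'key': key})}"
```

The same key could just as easily be stored in a cookie. Putting it in the URL makes an analysis bookmarkable and shareable, while a cookie keeps it out of the address bar; in both cases only the key travels with the request and the file list stays on the server.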