LISTSERV 16.5 - CODE4LIB Archives

On Dec 13, 2017, at 8:58 AM, Rory Litwin <[log in to unmask]> wrote:

> http://libraryjuiceacademy.com/112-digital-humanities.php


Interesting! Fun!! Good luck, sincerely.

In my experience, one of the impediments to doing “digital humanities” is the process of coercing one’s data/information into a format a computer can manipulate. To address this problem, I have hacked away & sketched out a few Web-based tools:

  * Extract plain text (http://dh.crc.nd.edu/sandbox/nlp-clients/tika-client.cgi) - Given a PDF (or just about any other file type), return plain text. The result of this process is the basis for just about everything else. Open it in a text editor. Find/replace spaces with newlines. Normalize case. Sort. Open in a spreadsheet to count & tabulate. Open in a concordance. Feed to Voyant. Etc.

  * POS client (http://dh.crc.nd.edu/sandbox/nlp-clients/pos-client.cgi) - Given a plain text file, return ngrams, parts of speech, and lemmas in a number of formats. Again, the results of this tool can be fed to spreadsheets, databases, data-cleaning tools such as OpenRefine, visualization tools such as Wordle or Tableau, or graphing tools like Gephi.

  * NER tools (http://dh.crc.nd.edu/sandbox/nlp-clients/ner-client.cgi) - Works much like the POS client: given a plain text file, return lists of the named entities it contains.
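For anybody who would rather script the "find/replace, normalize case, sort, count" steps from the first item than do them by hand, here is a minimal Python sketch using only the standard library. It also counts bigrams, the simplest kind of ngram the POS client returns. The function names (`tokenize`, `word_frequencies`, `ngrams`, `to_tsv`) are hypothetical, and the regular-expression tokenizer is a crude assumption, not what the Tika or POS clients actually do. It does no part-of-speech tagging, lemmatizing, or named-entity recognition.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a "word" is any run of ASCII letters, digits, or apostrophes;
    everything else acts as a delimiter (the editor's "replace space with newline").
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(tokens: list[str]) -> list[tuple[str, int]]:
    """Count tokens; most frequent first, ties broken alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def ngrams(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens (bigrams by default)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def to_tsv(rows) -> str:
    """Render (term, count) rows as tab-separated text for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    tokens = tokenize(sample)
    print(word_frequencies(tokens)[:3])  # [('the', 3), ('mat', 2), ('cat', 1)]
    print(Counter(ngrams(tokens)).most_common(1))  # [(('the', 'mat'), 2)]
    print(to_tsv(word_frequencies(tokens)), end="")
```

The tab-separated output can be pasted straight into a spreadsheet to count & tabulate, same as the manual route. Swapping in a real tokenizer, tagger, or lemmatizer is where the POS and NER clients come in.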

It is not possible to create a generic tool that will support the actual analysis of text, because the analysis of text is particular to each scholar. I am only able to provide the data/information. I believe it is up to the scholar to do the evaluation.

Feel free to give the tool(s) a go, but your mileage will vary.

—
Eric Morgan