On Jun 27, 2022, at 5:27 PM, Juarez, Francisco D <[log in to unmask]> wrote:

> How do I track 300 articles into a mind map?
> 
> A request came in recently asking for recommendations on software that could help with this.
> The goal is a broad and detailed assembly of journal articles into one database and sets of mind maps. They've collected articles into citation managers (e.g., EndNote) to help track a list of works. But they're looking for a fuller database to track all items by subcategory, trace the writers and their other articles, and run various queries.
> 
> Does anyone know/recommend a particular mind mapping software for the retrieval, tracking, and querying of journal articles? No preference between open source and proprietary. There are plenty of "best mind maps for 2022" lists out there, but I have zero experience with mind maps and would like to hear the community's expertise on this one.


From the WYHAHEB2LLAN Department [1], my Reader is a possible solution. Here's how it works:

  0. Ask yourself a research question; it can range from the mundane to the sublime. The question might be, "How does Ralph Waldo Emerson define being human?"

  1. Create a directory on your computer, and put as many files as you desire, of just about any type, into the directory. Mind you, since you will be doing textual analysis, it does not make sense to put things like image files or Excel spreadsheets into the directory. Scholarly journal articles in the form of PDF files, HTML files, Word documents, or plain text files make a lot of sense. Let's say the directory is named "emerson-articles", and it can easily contain hundreds, if not thousands, of files.

  2. Install a Python-based thing called the Reader Toolbox. From the command line, enter "pip install reader-toolbox", and your mileage may vary.

  3. Based on the content of the directory, use the Toolbox to create a data set from the articles. The Toolbox command will look something like this: "rdr build emerson emerson-articles".

  4. The Toolbox will go about caching the original content, converting it to plain text, doing all sorts of feature extraction against it, and saving the results as a set of delimited files and a relational database. The result is a data set -- something amenable to computation. I call these data sets "study carrels".  :-D  (A rough Python sketch of reading a carrel's files follows Step #6, below.)

  5. Use any number of applications to analyze ("read") the study carrel. For example, the delimited files can be imported into ANY database or spreadsheet application. They are also amenable to OpenRefine. Use AntConc to do concordancing against the plain texts. Use the venerable word cloud program, Wordle, to visualize frequencies. Apply your SQL skills against the whole to query the data set this way, that way, or the other way. Use MALLET to topic model the data and then visualize how ideas ebbed and flowed over time. Apply semantic indexing (word2vec) to the result to learn what words are near other words. Use word proximities -- a type of collocation -- to create a network graph, visualize the result, and learn what words are key to the corpus. (Rough Python sketches of a few of these ideas also follow Step #6, below.)

  6. The Toolbox does all of the things outlined in Step #5, but from the command line.
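
By way of illustration, here is a minimal Python sketch of "reading" a carrel's output, per Steps #4 and #5. The specific file, database, table, and column names below are assumptions on my part, not the Toolbox's documented layout; substitute whatever your carrel actually contains:

  # a minimal sketch, not the Toolbox's own API; unigrams.tsv,
  # reader.db, the tokens table, and the lemma column are all
  # assumptions -- substitute whatever your carrel actually contains
  import sqlite3
  import pandas as pd

  # load one of the tab-delimited feature files into a data frame
  unigrams = pd.read_csv('emerson/unigrams.tsv', sep='\t')
  print(unigrams.head())

  # or apply SQL against the carrel's relational database
  connection = sqlite3.connect('emerson/reader.db')
  query = '''SELECT lemma, COUNT(*) AS frequency
             FROM tokens
             GROUP BY lemma
             ORDER BY frequency DESC
             LIMIT 25'''
  print(pd.read_sql_query(query, connection))
  connection.close()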
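
Similarly, here is a sketch of the semantic-indexing idea from Step #5, using the gensim library against the carrel's plain-text files. The directory name (emerson/txt) and the query word ("nature") are, again, only assumptions:

  # a sketch of semantic indexing (word2vec) with gensim; the
  # directory of plain-text files (emerson/txt) is an assumption
  import glob
  import nltk
  from gensim.models import Word2Vec

  nltk.download('punkt', quiet=True)  # tokenizer models; first run only

  # read each plain-text file and split it into tokenized sentences
  sentences = []
  for filename in glob.glob('emerson/txt/*.txt'):
      with open(filename, encoding='utf-8') as handle:
          for sentence in nltk.sent_tokenize(handle.read()):
              sentences.append(nltk.word_tokenize(sentence.lower()))

  # train a small model and ask what words are "near" the word nature
  model = Word2Vec(sentences, vector_size=100, window=5, min_count=5)
  print(model.wv.most_similar('nature', topn=10))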

Two more things. First, since you have bibliographic metadata, you ought to be able to associate that metadata with the original content, and thus you will be able to compare & contrast things like authors or dates. The key to this process is the creation of a CSV file mapping authors, titles, and dates to file names.
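
For example, the CSV might look like the comment in the sketch below, and the joining can be done with something like pandas. The column names are my own convention, not a Toolbox requirement, and the existence of a file column in the carrel's delimited output is likewise an assumption:

  # a sketch of joining bibliographic metadata to carrel features;
  # metadata.csv, its columns (file, author, title, date), and the
  # file column in unigrams.tsv are all my own assumptions
  import pandas as pd

  # metadata.csv might look like this:
  #
  #   file,author,title,date
  #   article-001.pdf,Smith,"On Transcendentalism",2019
  #   article-002.pdf,Jones,"Emerson and Nature",2021

  metadata = pd.read_csv('metadata.csv')
  unigrams = pd.read_csv('emerson/unigrams.tsv', sep='\t')

  # join the features to the metadata on file name, and then
  # compare & contrast, say, token counts across authors
  joined = unigrams.merge(metadata, on='file')
  print(joined.groupby('author').size())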

Second, I did this work against the latest issue of Information Technology and Libraries (ITAL), and some rudimentary analysis is temporarily available on one of my websites. [2]

For more detailed information regarding the Reader, see the online (and still in progress) documentation. [3]

Fun with distant reading.

P.S. So, how does Emerson define being a man -- human? To answer the question, I used the HathiTrust Data API to download the complete works of Emerson, created a study carrel from the result, and then extracted all the sentences of the form <NOUNPHRASE><PREDICATE><NOUNPHRASE> whose first <NOUNPHRASE> contained one of the words "human", "man", "men", "woman", or "women" and whose <PREDICATE> contained a form of the verb "to be". The result is informative. [4]
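
For the curious, the following is a rough approximation of that extraction implemented with spaCy. It is a sketch of the idea only, not a description of how the Reader itself does the work:

  # a rough approximation of the <NOUNPHRASE><to be><NOUNPHRASE>
  # extraction, done with spaCy; a sketch only, not the Reader's code
  import spacy

  nlp = spacy.load('en_core_web_sm')
  KEYWORDS = {'human', 'humans', 'man', 'men', 'woman', 'women'}

  def definitions(text):
      '''Yield sentences shaped like <noun phrase> <to be> <noun phrase>.'''
      for sentence in nlp(text).sents:
          chunks = list(sentence.noun_chunks)
          if len(chunks) < 2:
              continue
          # does the first noun phrase mention humanity?
          if not KEYWORDS & {token.lower_ for token in chunks[0]}:
              continue
          # is the sentence's main verb a form of "to be"?
          if sentence.root.lemma_ == 'be':
              yield sentence.text.strip()

  for found in definitions('Man is a god in ruins. The woods are lovely.'):
      print(found)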


[1] WYHAHEB2LLAN - "When you have a hammer, everything begins to look like a nail"
[2] ITAL - http://dh.crc.nd.edu/tmp/ital-v41n02/
[3] documentation - https://reader-toolbox.readthedocs.io
[4] humans, defined - http://dh.crc.nd.edu/tmp/men-defined.txt

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

574/631-8604
https://cds.library.nd.edu