On Aug 9, 2019, at 4:28 PM, Eric Hanson <[log in to unmask]> wrote:
> The newest issue of code4Lib Journal is now available - https://journal.code4lib.org/issues/issues/issue45
It is nice to see our journal thrive.
Our journal is also great fodder for "distant reading", so I took it upon myself to "read it from afar". I was happy to see that the topic modeling seemed to work well:
http://carrels.distantreader.org/library/code4lib-45/index.htm#topic-modeling
More specifically, I requested five topics from the issue, and the following "themes" were returned, each with its most-associated article title:
1. ar, data, sh - Programming Poetry: Using a Poem Printer and
Web Programming to Build Vandal Poem of the Day
2. org, rightsstatements, copyright - Consortial
RightsStatements.org Implementation and Faceted Search for Reuse
Rights in Digital Library Materials
3. terms, library, video - Generating Geographic Terms for
Streaming Videos Using Python: A Comparative Analysis
4. sinopia, react, component - Developing Sinopia’s Linked-Data
Editor with React and Redux
5. ohsu, search, publications - Building an institutional author
search tool
The extraction of statistically significant keywords worked well too, but some of them ought to be denoted as stop words:
http, libraries, library, coding, collecting, component,
copyright, data, digitization, digitizing, falsc, functioning,
like, metadata, method, ohsu, people, poems, poetry, policy,
prints, publication, rdf, react, rightsstatement, searching,
shacl, sinopia, syndrome, term, video, web, williamson, wordpress
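Denoting a few of these as stop words is easy to try by hand. What follows is only a minimal sketch in Python, not how the Distant Reader itself does it: the function name drop_stop_words and the choice of extra stop words are my own assumptions, and which terms really belong on the list is a judgment call.

```python
# Keywords exactly as listed above.
keywords = [
    "http", "libraries", "library", "coding", "collecting", "component",
    "copyright", "data", "digitization", "digitizing", "falsc", "functioning",
    "like", "metadata", "method", "ohsu", "people", "poems", "poetry", "policy",
    "prints", "publication", "rdf", "react", "rightsstatement", "searching",
    "shacl", "sinopia", "syndrome", "term", "video", "web", "williamson",
    "wordpress",
]

# Hypothetical stop-word list (an assumption for illustration only).
extra_stop_words = {"http", "like", "libraries", "library", "people"}


def drop_stop_words(terms, stop_words):
    """Return terms in their original order, minus stop words (case-insensitive)."""
    stop = {word.lower() for word in stop_words}
    return [term for term in terms if term.lower() not in stop]


filtered = drop_stop_words(keywords, extra_stop_words)
print(", ".join(filtered))
```

In practice the same list would be applied before the keywords are computed, not after, so that the freed-up slots go to more meaningful terms.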
Fun with indexing.
--
Eric Morgan
University of Notre Dame