LISTSERV 16.5 - CODE4LIB Archives

On Nov 16, 2017, at 11:20 AM, Chris Gray wrote:

>> I’m thinking about a hands-on workshop on natural language processing & text mining, below, and your feedback is desired.  —ELM
> 
> You might be interested in something I ran across recently.  Aditya Parameswaran (http://data-people.cs.illinois.edu/) gave a talk at our campus recently about the efforts of a group he participates in that is aimed at "simplifying and improving data analytics, i.e., helping users make better use of their data".  He wrote a recent blog post for O'Reilly on "Enabling Data Science for the Majority" (https://www.oreilly.com/ideas/enabling-data-science-for-the-majority), which was the topic of the talk I heard.
> 
> He introduced 3 of the 6 projects his team has been working on: DataSpread, Zenvisage, and OrpheusDB, all aimed at what they call "HILDA" -- "human-in-the-loop data analytics".  The 3 projects listed have homes on GitHub and are linked to from Aditya's page under "Quick Project Links".  At the talk, he said they have hosted versions running and they are looking for beta testers.  There is a live demo of DataSpread at http://kite.cs.illinois.edu:8080/.


Chris, thank you for bringing this to my attention.

Parameswaran, above, outlines 5 problems with “big data”:

  1. The Excel problem: Over-reliance on spreadsheets
  2. The exploration problem: Not knowing where to look
  3. The data lake problem: Messy cesspools of data
  4. The data versioning problem: Ad-hoc management of analysis 
  5. The learning problem: Hurdles in leveraging machine learning

I can identify with many of these problems, as I suspect many of you can too. So many times I see my fellow librarians trying to make sense of a data set with only Excel. Heck, they even try to evaluate MARC in this way. Some data just does not fit into a single matrix. “Messy” data is also a perennial problem. Again, coming back to our bibliographic data, the city in a 260 field might be South Bend, IN; South Bend, Ind.; or South Bend. Moreover, parsing the data from the records often brings along punctuation. Mr. Kilgour’s name, for example, comes out as “Kilgour, Fredrick (1914-2006)”.
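
To make the "messy" data point a bit more concrete, here is a minimal Python sketch of the sort of clean-up I have in mind. Everything in it is an assumption made for illustration: the function names (normalize_place, split_name), the tiny STATE_VARIANTS lookup, the optional default_state argument, and the trailing punctuation on the sample values are all hypothetical, and real MARC data would need a much longer table and more careful rules.

```python
import re

# Hypothetical lookup of state spellings seen in the data; extend as needed.
STATE_VARIANTS = {"ind.": "IN", "ind": "IN", "in": "IN"}

# A name, optionally followed by parenthesized dates and stray punctuation.
NAME_PATTERN = re.compile(
    r"^(?P<name>[^(]+?)\s*(?:\((?P<dates>[^)]*)\))?[\s.,]*$"
)


def normalize_place(raw, default_state=None):
    """Reduce variants of a place of publication to one 'City, ST' form."""
    # Drop surrounding whitespace and trailing punctuation such as " :" or ",".
    cleaned = re.sub(r"[\s:;,]+$", "", raw.strip())
    city, _, state = cleaned.partition(",")
    city = " ".join(city.split())
    state = state.strip()
    # Map known spellings to one abbreviation; leave unknown values alone.
    state = STATE_VARIANTS.get(state.lower(), state)
    if not state and default_state:
        # Only guess a state when the caller explicitly supplies one.
        state = default_state
    return f"{city}, {state}" if state else city


def split_name(heading):
    """Separate a name from parenthesized dates and trailing punctuation."""
    match = NAME_PATTERN.match(heading.strip())
    if match is None:
        return heading, None
    return match.group("name"), match.group("dates")


if __name__ == "__main__":
    for place in ["South Bend, IN", "South Bend, Ind.", "South Bend"]:
        print(normalize_place(place, default_state="IN"))
    print(split_name("Kilgour, Fredrick (1914-2006)."))
```

Note that a bare "South Bend" is only given a state when default_state is passed in; guessing silently would trade one kind of mess for another. This is also exactly the sort of rule that is awkward to express in a spreadsheet.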

Put another way, yes, I spend a lot of my time dealing with the issues outlined above, and I believe addressing them is an opportunity for modern librarianship. Nowadays, finding information is not nearly as much of a problem to solve. Instead, I believe the more pressing problem is enabling people (readers) to use & understand the data/information they find.

—
Eric Morgan