On Nov 16, 2017, at 11:20 AM, Chris Gray wrote:
>> I’m thinking about a hands-on workshop on natural language processing & text mining, below, and your feedback is desired. —ELM
> You might be interested in something I ran across recently. Aditya Parameswaran (http://data-people.cs.illinois.edu/) gave a talk on our campus about the efforts of a group he participates in, aimed at "simplifying and improving data analytics, i.e., helping users make better use of their data". He also wrote a recent blog post for O'Reilly, "Enabling Data Science for the Majority" (https://www.oreilly.com/ideas/enabling-data-science-for-the-majority), on the same topic as the talk I heard.
> He introduced 3 of the 6 projects his team has been working on (DataSpread, Zenvisage, and OrpheusDB), all aimed at what they call "HILDA", or "human-in-the-loop data analytics". The three projects have homes on GitHub and are linked from the "Quick Project Links" section of Aditya's page. At the talk, he said they have hosted versions running and are looking for beta testers. There is a live demo of DataSpread at http://kite.cs.illinois.edu:8080/.
Chris, thank you for bringing this to my attention.
In the work cited above, Parameswaran outlines five problems with “big data”:
1. The Excel problem: Over-reliance on spreadsheets
2. The exploration problem: Not knowing where to look
3. The data lake problem: Messy cesspools of data
4. The data versioning problem: Ad-hoc management of analysis
5. The learning problem: Hurdles in leveraging machine learning
I can identify with many of these problems, as I suspect many of you can too. Time and again I see fellow librarians trying to make sense of a data set with nothing but Excel. Heck, they even try to evaluate MARC records this way. Some data just does not fit into a single matrix. “Messy” data is also a perennial problem. Again, coming back to our bibliographic data, the city in a 260 field might be recorded as South Bend, IN; South Bend, Ind.; or simply South Bend. Moreover, data parsed out of records often drags punctuation along with it: Mr. Kilgour’s name comes out not as Frederick Kilgour but as Kilgour, Frederick (1914-2006).
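To make the “messy data” point concrete, here is a minimal Python sketch of the kind of cleanup the South Bend and Kilgour examples call for. It is an illustration under stated assumptions, not a cataloging tool: the `STATE_CODES` table and the `normalize_place` and `clean_name` functions are hypothetical names made up for this example, the values are assumed to have already been extracted from their MARC fields as plain strings, and the rules cover only trailing ISBD-style punctuation (such as a final " :" or ".") and parenthesized dates like the ones shown here.

```python
import re

# Sketch only: assumes values were already extracted from MARC fields as
# plain strings. STATE_CODES, normalize_place, and clean_name are hypothetical.

# State spellings seen in catalog records, keyed in lowercase without
# trailing periods, mapped to postal codes. Extend as new variants turn up.
STATE_CODES = {
    "in": "IN",
    "ind": "IN",
    "indiana": "IN",
}

# Trailing whitespace and ISBD-style punctuation (: ; , / .) left over
# after a value is cut out of its field.
TRAILING_PUNCT = re.compile(r"[\s:;,/.]+$")

# A parenthesized date range such as "(1914-2006)" or an open "(1950-)".
DATES = re.compile(r"\s*\((\d{4})-(\d{4})?\)")


def normalize_place(value):
    """Reduce a place-of-publication string to 'City, ST' or 'City'."""
    value = TRAILING_PUNCT.sub("", value.strip())
    if "," not in value:
        # No state given; collapse internal whitespace and leave it alone.
        return " ".join(value.split())
    city, state = value.split(",", 1)
    city = " ".join(city.split())
    state = state.strip()
    # Unknown spellings are kept as written (minus trailing punctuation).
    code = STATE_CODES.get(state.rstrip(".").lower(), state)
    return f"{city}, {code}"


def clean_name(value):
    """Turn 'Surname, Forename (YYYY-YYYY).' into ('Forename Surname', 'YYYY-YYYY')."""
    dates = None
    match = DATES.search(value)
    if match:
        # Keep the dates separately instead of throwing them away.
        dates = match.group(1) + "-" + (match.group(2) or "")
        value = value[:match.start()] + value[match.end():]
    value = TRAILING_PUNCT.sub("", value.strip())
    if "," in value:
        # Invert the inverted heading: "Kilgour, Frederick" -> "Frederick Kilgour".
        surname, forename = (part.strip() for part in value.split(",", 1))
        value = f"{forename} {surname}"
    return value, dates


if __name__ == "__main__":
    variants = ["South Bend, IN", "South Bend, Ind.", "South Bend"]
    print([normalize_place(v) for v in variants])
    # → ['South Bend, IN', 'South Bend, IN', 'South Bend']
    print(clean_name("Kilgour, Frederick (1914-2006)."))
    # → ('Frederick Kilgour', '1914-2006')
```

The lookup table is deliberately explicit, and a bare “South Bend” is left alone: deciding which South Bend is meant takes knowledge the string itself does not carry, which is exactly the sort of judgment call where a person needs to stay in the loop.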
Put another way, yes, I spend a lot of my time dealing with the issues outlined above, and I believe addressing them is an opportunity for modern librarianship. Nowadays, finding information is not nearly as much of a problem to solve. Instead, I believe the more pressing problem is enabling people (readers) to use & understand the data and information they find.