Some thoughts. BTW, I'm new to the list: I'm a librarian working for a
study-abroad program in Beijing, currently building a new catalog with Koha,
and I previously did competitive intelligence for investors looking at
China's IT industries. I appreciate Matt trying to start an open-ended
conversation about innovation, and thought I'd toss my own rant into the ring.

One of the things that really struck me about libraries while studying for
my MLIS was how library systems, until the Internet era, were designed
primarily for back-end staff work rather than for patrons, and how they were
built and maintained by third parties who aren't practicing or even trained
librarians (and who charge a pretty penny for it). A profession that
outsourced these skill sets is now playing catch-up, rebuilding them through
groups like CODE4LIB, so we may be behind the curve on innovation for a long
time.

I'm not sure how much "Big Data" really comes into play for most libraries.
You might need terabytes of cloud storage for a digital preservation
project, but since the bulk of that would be the digitized
images/videos/recordings themselves, each with a metadata record, you don't
necessarily have a very large or complex data structure. How many library
projects are "beyond the ability of commonly used software tools to
capture, curate, manage, and process the data within a tolerable elapsed
time"? I'm honestly not sure, and I wonder about that nebulous definition.
What is "commonly used"? Hadoop? On the other hand, preserving "Big Data"
(say, from the Large Hadron Collider) and creating discovery tools for
future researchers is something librarians could potentially be involved
in. But if CERN has already built the database and discovery tools before
the data reaches the library, did we miss the game? Do Big Data projects
say to themselves at the planning stage, "We need a librarian"? Should
they? If so, are we ready?

Then there's the privacy issue. Even before Snowden, the ALA Code of Ethics
bumped up against the power of crunching user data for recommendation
systems and the like. Even if you adequately anonymize your data and use
it only in aggregate, collecting it goes against the grain of traditional
library culture. Any discussion of retaining user social profiles, search
history, or activity tracking means talking about patrons' right to
anonymity.

The goal I've been fixated on for library software development is to
deliver staff- and patron-friendly open-source cataloging, discovery, and
curation tools that take back control of our systems from closed corporate
vendors, provide a user experience that matches or exceeds the expectations
set in the marketplace, and remain committed to the ethical standards and
social contract that libraries have traditionally upheld in our society.
When you consider that most of the professional news industry delivers
information discovery services using Drupal, Django, or WordPress, why
can't there be robust ecosystems like these for libraries?

Hope I didn't bore anyone.

Dave Lyons
Digital Librarian
The Beijing Center for Chinese Studies