thanks so much for your post, Alex; i hadn't had a chance to
consider Wolfram|Alpha (WA) seriously until you posted the link
to the talk (and i had the time to actually watch it).

On 5/3/09 6:13 PM, Alexander Johannesen wrote:
 > http://www.youtube.com/watch?v=5TIOH80Qg7Q
 > Organisations and people are slowly turning into data
 > producers, not book producers.

when i think of data producers, i think of CRC Press and the
like, companies that compile and publish scientific data.
certainly much of this data is now born-digital or being
converted to digital formats (or put on the web), rather than
only being published in books. but these organizations and
people are still producing data, and those that produce books
are in a rapidly changing space (aren't we all).

imo, the advent of WA will likely result in the production of
_more_ books, not fewer, and will almost certainly benefit
libraries and learners.

after watching Mr. Wolfram's talk, i realize that most of the
responses to Wolfram Alpha on the net appear to be missing the
point. more specifically,

* WA consists of curated (computable) data + algorithms (5M+
   lines of Mathematica) + (inverted) linguistic analysis[1] +
   automated presentation.

* afaict WA does not attempt to compete with Google or Wikipedia
   or open source/public science; they are all complementary and
   compatible!

* WA is admirably unique in its effort to make quality data
   useful, rather than merely organizing/regurgitating heaps of
   folk data and net garbage.

* the value added by WA is that it makes (so-called) public data
   "computable", in the NKS[2] sense, as executable Mathematica
   code.

as mentioned in the talk, Wolfram engineers take data from
proprietary, widely accepted, peer-reviewed sources (probably
familiar to any research librarian) and transform it into
datasets computable in the WA environment[3].
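
to make "computable" concrete, here is a rough sketch in
Mathematica itself. ElementData and its property names are real
Mathematica functions; the gold example and unit conversion are
just my illustration of the idea, not how WA actually works
internally:

   (* curated data arrives as evaluable expressions, not prose. *)
   (* a physical property can be fetched as a number... *)
   mp = ElementData["Gold", "MeltingPoint"]
   (* -> 1337.33, in kelvin (newer versions return a Quantity) *)

   (* ...and because it is data, not text, it composes with the *)
   (* rest of the language, e.g. converting kelvin to celsius: *)
   mp - 273.15   (* -> 1064.18 *)

that, i think, is the whole point: the datum is an input to
further computation, not the end of the query.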

there is considerable confusion as to how WA compares to Google,
Wikipedia, and the Open Source world. i think Google is solving
a different problem with very different data, and Wikipedia (as
mentioned in the talk) is one of many input sources to WA. more
specifically,

* Google's input data set is un-curated (albeit cleverly ranked)
   links to web pages, plus _some_ data from the web. it (rightly)
   does not have "computable" data or the Mathematica
   computational engine, but it does have many of the natural
   language and automatic presentation features, as well as a
   search-engine-style query box interface (which i think is the
   cause of much of the incorrect comparison).

* Wikipedia is merely folk input to WA, complementary but
   missing _quality_ data (think CRC Press), computational
   algorithms, natural language processing, and automated
   presentation. the only basis for comparison i can see here is
   that both Wikipedia and WA contain a lot of useful information;
   however, what is done with that data, and how you interact
   with it, is clearly very different.

* WA is not in danger of being "open-sourced", because curating
   and converting quality scientific data into computable
   datasets is non-trivial, and so is the Mathematica
   computational engine. the comparisons here, i think, stem from
   the fact that it has a web interface, and much of the data is
   available from public sources. for many problem-solvers, i
   think it's natural to respond with, "hmmm, how would i have
   done this..."

ultimately, i think Wolfram Alpha will be an extremely valuable
tool for libraries, and could (hopefully) change the way
learners think about how to get information and solve problems.

it's exciting to imagine that WA could steer learners and
researchers away from looking to the web (unfortunately, almost
always Google by default) for quick answers, and back toward
thinking about how they can answer questions for themselves,
given quality information and powerful tools for problem
solving.




Notes:

[1] as mentioned near 0:39:00 in the video, Wolfram explains
that the natural language problem WA attempts to solve (like
search engines) is different from the traditional one. the
traditional NLP problem is taking a mass of data produced by
humans and trying to make sense of it, while the query box
problem is the inverse: taking short human utterances and trying
to formulate a problem that is computable from a mass of data.
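
a toy illustration of that inversion (my own sketch, not WA's
actual pipeline; the dispatch from utterance to expression below
is completely made up, though CountryData and ChemicalData are
real Mathematica curated-data functions):

   (* toy sketch: a short utterance gets mapped onto a *)
   (* computable expression over curated data. *)
   interpret["population of france"] :=
     CountryData["France", "Population"]
   interpret["molar mass of caffeine"] :=
     ChemicalData["Caffeine", "MolarMass"]

   interpret["population of france"]
   (* evaluates to France's population, as a number you can *)
   (* then compute with *)

the hard part, of course, is doing that mapping for arbitrary
short utterances, not a fixed table of them.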

[2] A New Kind of Science
http://www.wolframscience.com/nksonline/toc.html
i must confess, i haven't completely digested this material.

[3] as a long-time MATLAB user in a former life, this makes a
lot of sense. in MATLAB, everything is a computable matrix, and
solving problems in that environment is about taking (highly
non-linear) real-world problems and linearizing them to be
computable in the MATLAB environment. this approach has deep
mathematical roots, and is consistent in solving problems across
many scientific disciplines, so the range of problems that can
be solved with the help of MATLAB is broad and deep.
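
for the flavor of it, here is the canonical "linearize, then
solve" step, written in Mathematica terms to stay consistent
with the rest of this note (MATLAB's backslash operator does the
same job). the little 2x2 system is invented purely for
illustration:

   (* once a real-world problem has been linearized to A.x == b, *)
   (* solving it is a single primitive operation: *)
   A = {{2, 1}, {1, 3}};
   b = {5, 10};
   x = LinearSolve[A, b]   (* -> {1, 3} *)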

the Mathematica computational engine has a similar genetic
heritage, if you will, but also includes curated "computable
datasets", including Astronomical, Chemical, Geospatial,
Financial, Mathematical, Language, Biomedical, and Weather
data. until now, this data was available only within the
Mathematica environment. Wolfram Alpha makes it available
through an alternative interface, much like a search engine
query box.
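
those curated datasets correspond to ordinary functions inside
Mathematica; a few real examples (return values will vary with
Mathematica version and data updates, so the comments are
indicative only):

   AstronomicalData["Jupiter", "Mass"]     (* astronomical data *)
   ChemicalData["Water", "BoilingPoint"]   (* chemical data *)
   CountryData["Japan", "GDP"]             (* geospatial/economic *)
   FinancialData["GE", "Price"]            (* financial data *)
   WordData["library", "Definitions"]      (* language data *)
   WeatherData["Chicago", "Temperature"]   (* weather data *)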