Although tagging is the hot new term for it, I remember reading about
similar ideas way, way back in my undergraduate days.  (OK, that was only
around four years ago, but still.)  I'd have to dig to find some of the
research, but my overall impression is that there are a couple of
qualities that could make tagging very useful:

1) Community tagging of all records

2) Experts applying a hierarchical controlled vocabulary on top of the
tagging, assisted by various kinds of statistical analysis.  This kind of
structure tends to be done somewhat poorly by community tagging alone.

3) Multi-word tags (with normalization applied in both directions)

4) Thesauri to map between tagging, similar concepts, and the controlled
vocabulary.  This is really an extension of point 2.  I remember reading a
study that indicated people don't like to use thesauri when they have to
select one as an option, but they like the results when it's done for them.
(Think of Google's suggestion to search for term X instead of Y.)

5) Use some statistical crunching of user-submitted reviews and
information to suggest possible tag words.  This can be tricky.  In Spring
'02, right after Google opened up their API, I worked on a project with some
other people that used somewhat circular logic: it tried to find a
combination of words from a webpage that would bring that page up to the
top of the Google rankings.  The idea was that if the page came up first
for that combination of words, the combination would be a decent
description for finding similar pages.  The problem is that if you do it
too perfectly, the result is likely to be nonsensical.  (An old IR
problem; I forget the name.)

For example, for many large hobby sites it was fine since it offered
things like "model train", but for small sites with little content we got
things like "steel factory singleton".  In that case it was the home page
of a professor who offhandedly mentioned a trip to a steel factory.  The
professor rarely used the words that would be most useful, such as
"computer science".  But it's still useful to offer suggestions to human
beings, who can quickly throw out garbage like "steel factory" as a
suggestion for the page.
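
To make the "steel factory" problem concrete, here is a rough sketch of
one way to suggest tag words.  It is not what our project did (that went
through Google queries); it swaps in plain TF-IDF scoring, which rewards
words that are frequent on the page but rare elsewhere.  The function
names, the tiny stopword list, and the sample corpus are all made up for
illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "i", "my",
             "is", "was", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def suggest_tags(page, corpus, k=3):
    """Return the k words of `page` with the highest TF-IDF score.

    `corpus` is a list of other documents used to estimate how common
    each word is.  Ties are broken alphabetically so output is stable.
    """
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    term_counts = Counter(tokenize(page))
    n_docs = len(corpus)
    scores = {}
    for word, count in term_counts.items():
        doc_freq = sum(1 for tokens in doc_tokens if word in tokens)
        # Smoothed inverse document frequency: rare words score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = count * idf
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    corpus = [
        "model train layouts and model train scenery",
        "notes on teaching and research",
        "research papers and teaching schedule",
    ]
    page = "Teaching and research. Last summer I visited a steel factory."
    print(suggest_tags(page, corpus, k=5))
```

On the sample page, the offhand words ("steel", "factory", "summer")
outrank "teaching" and "research" precisely because they are rare in the
rest of the corpus.  That is the same failure we saw, and it is why the
output is better treated as a list of suggestions for a person to prune
than as finished tags.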

Of course, if you're asking for the nitty-gritty implementation details of
what I would do... I'd have to think a little longer ;).  But not too
much.  At its heart it would just be an index.  The statistical analysis
could be drawn from playing around with the large body of IR research
already out there.  Ah, and lots of promoting to get the critical mass of
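
As a very rough sketch of the "just an index" idea, combined with points
3 and 4 above: tags are normalized and mapped through a thesaurus both
when a record is tagged and when someone searches, so the user never has
to pick the thesaurus as an option.  Everything here (the class name, the
thesaurus entries, the record ids) is a made-up assumption for
illustration, not a worked-out design.

```python
import re
from collections import defaultdict

# Hypothetical thesaurus: normalized variant -> controlled-vocabulary term.
THESAURUS = {
    "model trains": "model railroading",
    "model train": "model railroading",
    "cs": "computer science",
}


def normalize(tag):
    """Lowercase, turn hyphens/underscores into spaces, collapse whitespace."""
    tag = re.sub(r"[-_]+", " ", tag.lower())
    return " ".join(tag.split())


def canonical(tag):
    """Normalize a tag, then map it to the controlled term if one is known."""
    norm = normalize(tag)
    return THESAURUS.get(norm, norm)


class TagIndex:
    """Inverted index from canonical tag to the set of tagged record ids."""

    def __init__(self):
        self._records = defaultdict(set)

    def add(self, record_id, tag):
        # Normalization is applied when the tag is stored...
        self._records[canonical(tag)].add(record_id)

    def search(self, tag):
        # ...and again when it is looked up, so both directions agree.
        return sorted(self._records.get(canonical(tag), set()))


if __name__ == "__main__":
    index = TagIndex()
    index.add("rec1", "Model-Trains")
    index.add("rec2", "model  train")
    index.add("rec3", "CS")
    print(index.search("model_trains"))
    print(index.search("Computer Science"))
```

Because the same `canonical` step runs on both sides, multi-word tags
typed with different spacing, case, or punctuation land in the same
bucket, and a search on either a community tag or the controlled term
finds the same records.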

I typed this response rather quickly, so sorry if it doesn't make a
whole lot of sense.  This stuff is part of the reason I got into library
science so I get carried away on occasion.

On Wed, 8 Mar 2006, Eric Lease Morgan wrote:

