On Mar 30, 2006, at 7:12 AM, Eric Lease Morgan wrote:
> How would you go about doing this sort of analysis? All I have to
> start with is my Apache "combined" access_log files?

I'm not sure about the 'morality' issue, but it might be interesting
to see whether the links are distributed according to a power law
[1]. This could go both ways: looking at the hosts that are linking
to you, and at the target URLs on your site. My guess is that they will
be, given the research that has gone into showing this happens at the
host name level [2] on the web at large.
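
As a rough sketch of how that check might look starting from nothing but
a "combined" access_log: the Python below counts referring hosts and
target paths, then fits a straight line to log(count) against log(rank).
A slope somewhere near -1 would hint at a power law, though a
least-squares fit on a log-log plot is only a quick sanity check, not a
rigorous test. The function names, the own_host parameter, and the
regular expression are my own assumptions for illustration; the regex
ignores entries whose request field contains escaped quotes.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

def parse_line(line):
    """Return (target_path, referrer_url), or None if the line does not match."""
    m = COMBINED.match(line)
    if not m:
        return None
    request, referrer = m.group(3), m.group(6)
    parts = request.split()
    path = parts[1] if len(parts) >= 2 else ""
    return path, referrer

def count_links(lines, own_host):
    """Count external referring hosts and the paths they point at.

    own_host (hypothetical parameter) is your site's host name, used to
    skip internal navigation.
    """
    hosts, targets = Counter(), Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        path, referrer = parsed
        ref_host = urlsplit(referrer).hostname  # None for "-" (no referrer)
        if not ref_host or ref_host == own_host:
            continue
        hosts[ref_host] += 1
        targets[path] += 1
    return hosts, targets

def loglog_slope(counter):
    """Least-squares slope of log(count) against log(rank), or None."""
    counts = sorted(counter.values(), reverse=True)
    n = len(counts)
    if n < 2:
        return None
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

You would feed it the open log file, e.g.
count_links(open("access_log"), "www.example.org"), and then call
loglog_slope() on each of the two counters it returns.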

Using Google's API you could look up pages that are linking to your
stuff, and compare and contrast them with what you are seeing in your
logs. It might be interesting to extract the Google search query from
the log, plug it back into Google, and see what page number your URL
comes up on. This could serve as a metric of how often people go past
the first page of search results in Google. Perhaps there is some other
interesting stuff you can do with the Google API.
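
Pulling the search terms out of a referrer is the easy half, so here is
a minimal Python sketch of just that step. It assumes Google referrers
carry the terms in a "q" parameter and the result offset in a "start"
parameter (0 or absent on the first page); both are assumptions about
Google's URL layout, not something the log format guarantees. The
function name and the host pattern are made up for illustration, and the
sketch does not call the Google API at all.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Loose match for google.com, www.google.com, www.google.co.uk and the like.
GOOGLE_HOST = re.compile(r'(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$')

def google_query(referrer):
    """Return (search_terms, result_offset) for a Google referrer, else None."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    if not GOOGLE_HOST.search(host):
        return None
    params = parse_qs(parts.query)
    terms = params.get("q")
    if not terms or not terms[0].strip():
        return None
    try:
        offset = int(params.get("start", ["0"])[0])
    except ValueError:
        offset = 0  # unparseable offset: treat as first page
    return terms[0].strip(), offset
```

If the "start" assumption holds for your logs, the offset alone may
already hint at which results page the visitor clicked from, before you
re-run any queries.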

Referrer logs are a really interesting artifact of the operating web.
In exploring 'backlinks', you are in the good company of Bush [3],
Garfield [4], Nelson [5] and that late-comer Page [6]. Referrer logs
won't tell you everyone who is linking to you, but only some of the
links that have been travelled. In some ways this usage data is even
more valuable than the complete index of backlinks that Google has,
since it records intention. Perhaps when Google acquires enough dark
fiber they'll be able to capture this as well--but for the moment
they don't know what people are clicking on once they leave google.com.

//Ed

[1] http://www.kottke.org/03/02/weblogs-and-power-laws
[2] http://www.nd.edu/~networks/Linked/index.html
[3] http://www.theatlantic.com/doc/194507/bush
[4] http://www.garfield.library.upenn.edu/
[5] http://www.readwriteweb.com/archives/ted_nelsons_two.php
[6] http://en.wikipedia.org/wiki/PageRank