On Mar 30, 2006, at 7:12 AM, Eric Lease Morgan wrote:

> How would you go about doing this sort of analysis? All I have to
> start with is my Apache "combined" access_log files?

I'm not sure about the 'morality' issue, but it might be interesting to see whether the links are distributed according to a power law [1]. This could go both ways: looking at the hosts that are linking to you, and at the target URLs on your site. My guess is that they will be, given the research that has gone into showing this happens at the host name level [2] on the web at large.

Using Google's API you could look up pages that are linking to your stuff, and compare and contrast that with what you are seeing in your logs. It might be interesting to extract the Google search query from the log, plug it back into Google, and see what page number your URL comes up on. This could serve as a metric of how often people go past the first page of search results in Google. Perhaps there is some other interesting stuff you can do with the Google API.

Referrer logs are a really interesting artifact of the operating web. In exploring 'backlinks', you are in the good company of Bush [3], Garfield [4], Nelson [5] and that late-comer Page [6]. Referrer logs won't tell you everyone who is linking to you, but only some of the links that have been travelled. In some ways this usage data is even more valuable than the complete index of backlinks that Google has, since it records intention. Perhaps when Google acquires enough dark fiber they'll be able to capture this as well, but for the moment they don't know what people are clicking on once they leave google.com.

//Ed

[1] http://www.kottke.org/03/02/weblogs-and-power-laws
[2] http://www.nd.edu/~networks/Linked/index.html
[3] http://www.theatlantic.com/doc/194507/bush
[4] http://www.garfield.library.upenn.edu/
[5] http://www.readwriteweb.com/archives/ted_nelsons_two.php
[6] http://en.wikipedia.org/wiki/PageRank
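
For what it's worth, the log side of this could be sketched roughly as follows. This is only a minimal illustration, not a tested tool: it assumes the standard Apache "combined" layout (referrer and user agent as the last two quoted fields), and the names parse_line, google_query, tally, loglog_slope and own_hosts are all made up for the example. The slope of a straight-line fit on a log-log rank/count plot is just a quick first look at whether the counts resemble a power law, not a rigorous test.

```python
import math
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed layout: Apache "combined" format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (target_path, referer) for one log line, or None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    target = parts[1] if len(parts) >= 2 else None
    referer = m.group("referer")
    if referer in ("", "-"):
        referer = None  # no referrer was sent with this request
    return target, referer


def google_query(referer):
    """Return the search terms if the referrer looks like a Google search, else None.

    Crude heuristic (an assumption): any host with a "google" label and a "q" parameter.
    """
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if "google" not in host.split("."):
        return None
    q = parse_qs(parts.query).get("q")
    return q[0] if q else None


def tally(lines, own_hosts=()):
    """Count referring hosts, linked-to targets, and Google queries.

    own_hosts is a hypothetical parameter: host names of your own site,
    so that internal navigation is not counted as a backlink.
    """
    hosts, targets, queries = Counter(), Counter(), Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        target, referer = parsed
        if referer is None:
            continue
        ref_host = (urlsplit(referer).hostname or "").lower()
        if not ref_host or ref_host in own_hosts:
            continue
        hosts[ref_host] += 1
        if target:
            targets[target] += 1
        query = google_query(referer)
        if query:
            queries[query] += 1
    return hosts, targets, queries


def loglog_slope(counter):
    """Least-squares slope of log(count) against log(rank); None if under two items."""
    counts = sorted(counter.values(), reverse=True)
    if len(counts) < 2:
        return None
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

You would feed tally() an open access_log file and then look at hosts.most_common() and targets.most_common(), or pass either counter to loglog_slope(). The queries counter gives you the search strings to plug back into Google; that lookup is left out here since it depends on whichever search API you have access to.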