Matt Amory wrote:
> Is anyone involved with, or does anyone know of any project to extract and
> aggregate bibliography data from individual works to produce some kind of
> "most-cited" authors list across a collection? Local/Network/Digital/OCLC
> or historic?
>
> Sorry to be vague, but I'm trying to get my head around whether this is a
> tired old idea or worth pursuing...

Sounds like you're describing CiteSeer - http://citeseerx.ist.psu.edu/ -
it's a combined bibliographic and citation index for computer science
literature. It includes a good degree of citation analysis. Incredibly
useful tool.

Funding from NSF, NASA, Microsoft Research. Initially developed at NEC
Research Institute, then moved to Penn State. Code is available at
SourceForge under an Apache license (find the link on the page cited above).

From the history page:

---
CiteSeer was the first digital library and search engine to provide
automated citation indexing and citation linking using the method of
autonomous citation indexing
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.1607>.
CiteSeer was developed in 1997 at the NEC Research Institute, Princeton,
New Jersey, by Steve Lawrence <http://labs.google.com/people/lawrence/>,
Lee Giles <http://clgiles.ist.psu.edu> and Kurt Bollacker
<http://en.wikipedia.org/wiki/Kurt_Bollacker>. The service transitioned to
the Pennsylvania State University's College of Information Sciences and
Technology in 2003. Since then, the project has been led by Lee Giles with
technical and administrative direction by Isaac Councill
<http://www.personal.psu.edu/%7Eigc2>.

After serving as a public search engine for nearly ten years, CiteSeer,
originally intended as a prototype only, began to scale beyond the
capabilities of its original architecture. Since its inception, the
original CiteSeer grew to index over 750,000 documents and served over
1.5 million requests daily, pushing the limits of the system's capabilities.
Based on an analysis of problems encountered by the original system and the
needs of the research community, a new architecture and data model was
developed for the "Next Generation CiteSeer," or CiteSeer^x, in order to
continue the CiteSeer legacy into the foreseeable future.
---

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.   .... Yogi Berra