Matt Amory wrote:
> Is anyone involved with, or does anyone know of any project to extract and
> aggregate bibliography data from individual works to produce some kind of
> "most-cited" authors list across a collection?  Local/Network/Digital/OCLC
> or historic?
>
> Sorry to be vague, but I'm trying to get my head around whether this is a
> tired old idea or worth pursuing...
>
>
Sounds like you're describing CiteSeer (http://citeseerx.ist.psu.edu/).
It's a combined bibliographic and citation index for computer science
literature, and it includes a good deal of citation analysis.  An
incredibly useful tool.

It has had funding from NSF, NASA, and Microsoft Research.  It was
initially developed at NEC Research Institute, then moved to Penn State.
The code is available on SourceForge under an Apache license (find the
link on the page cited above).
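
For a sense of what the aggregation step in the original question involves, here is a minimal sketch in Python. It is not CiteSeer's code or method. The input shape (a dict mapping each work to its list of references, each reference a list of author names), the function name `most_cited_authors`, and the sample data are all assumptions made up for illustration. Extracting references from the works themselves and matching variant author names are the hard parts and are not attempted here.

```python
from collections import Counter

def most_cited_authors(bibliographies, top_n=10):
    """Count citations per author across a collection of bibliographies.

    `bibliographies` maps a work's identifier to a list of cited references;
    each reference is a list of author-name strings (a hypothetical shape).
    """
    counts = Counter()
    for references in bibliographies.values():
        for authors in references:
            for name in authors:
                # Trivial normalization only: collapse whitespace, lowercase.
                # Real author disambiguation is far harder than this.
                counts[" ".join(name.split()).lower()] += 1
    return counts.most_common(top_n)

# Made-up sample data, purely illustrative.
sample = {
    "work-1": [["A. Author", "B. Writer"], ["C. Scholar"]],
    "work-2": [["a. author"], ["B.  Writer"]],
    "work-3": [["A. Author"]],
}
print(most_cited_authors(sample, top_n=2))
```

The counting is the easy part; most of the effort in a real system goes into parsing citations out of documents and deciding when two name strings refer to the same person.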

From the history page:

---
CiteSeer was the first digital library and search engine to provide 
automated citation indexing and citation linking using the method of 
autonomous citation indexing 
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.17.1607>.

CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, 
New Jersey, by Steve Lawrence <http://labs.google.com/people/lawrence/>, 
Lee Giles <http://clgiles.ist.psu.edu> and Kurt Bollacker 
<http://en.wikipedia.org/wiki/Kurt_Bollacker>. The service transitioned 
to the Pennsylvania State University's College of Information Sciences 
and Technology in 2003. Since then, the project has been led by Lee 
Giles with technical and administrative direction by Isaac Councill 
<http://www.personal.psu.edu/%7Eigc2>.

After serving as a public search engine for nearly ten years, CiteSeer, 
originally intended as a prototype only, began to scale beyond the 
capabilities of its original architecture. Since its inception, the 
original CiteSeer grew to index over 750,000 documents and served over 
1.5 million requests daily, pushing the limits of the system's 
capabilities. Based on an analysis of problems encountered by the 
original system and the needs of the research community, a new 
architecture and data model was developed for the "Next Generation 
CiteSeer," or CiteSeerX, in order to continue the CiteSeer legacy into 
the foreseeable future.
---

Miles Fidelman





-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra