If you have the raw text of the books or reference articles that you
are interested in, you might try looking at our tool, ParsCit (already
mentioned and known to a few people on the list), to extract and
format the reference strings in a bibliography.
Both CiteSeerX and Mendeley use this software as part of
their solutions. You can use it too, since it's open source
and still actively maintained by our group at NUS.
We use this software as part of an ongoing project to visualize
scholarly networks in humanities and (in a separate project) pure
Min-Yen KAN (Dr) :: Associate Professor :: National University of
Singapore :: NUS School of Computing, AS6 05-12, 13 Computing Drive
Singapore 117417 :: 65-6516 1885(DID) :: 65-6779 4580 (Fax) ::
[log in to unmask] (E) :: www.comp.nus.edu.sg/~kanmy (W)
On Fri, Nov 18, 2011 at 9:51 AM, Bill Dueber <[log in to unmask]> wrote:
> If I'm understanding you correctly, you're describing citation analysis
> (sometimes referred to as a part of bibliometrics). It is mostly applied to
> article data (e.g., the Web of Science / Web of Knowledge at ISI) but there
> are zillions of studies looking at co-citation and co-authorship networks,
> the long tail of cited works and authors, etc. You can hardly shake a stick
> at JASIS&T without hitting two or three of these studies.
> As you're probably already thinking, getting a hold of the citation
> information in a machine-readable format is the painful part. Things are
> made harder by your desire to work with books, since many citations are to
> individual chapters in edited works, and (of course) books just plain
> aren't generally available digitally.
> Article searches (in google scholar or your local academic library) for
> "bibliometrics" or "citation analysis" should get you started on past and
> future work.
> On Thu, Nov 17, 2011 at 12:47 PM, Joe Hourcle <[log in to unmask]> wrote:
>> On Nov 17, 2011, at 12:09 PM, Miles Fidelman wrote:
>> > Matt Amory wrote:
>> >> Is anyone involved with, or does anyone know of any project to extract
>> >> aggregate bibliography data from individual works to produce some kind
>> >> of "most-cited" authors list across a collection?
>> >> or historic?
>> >> Sorry to be vague, but I'm trying to get my head around whether this is
>> >> a tired old idea or worth pursuing...
>> > Sounds like you're describing citeseer - http://citeseerx.ist.psu.edu/ -
>> > it's a combination bibliographic and citation index for computer science
>> > literature. It includes a good degree of citation analysis. Incredibly
>> > useful tool.
>> Another recent project (that I haven't had a chance to play with yet) is
>> Total Impact :
>> It's from some of the folks in altmetrics, who are trying to find better
>> bibliometrics for measuring value:
>> I don't see a list of what they're scraping; I think they're using the
>> publisher's indexes, PubMed and other databases rather than parsing the
>> text themselves ... but the software's available, if you wanted to take a
>> look. Or you could just ask Heather or Jason; they're both approachable
>> and have always been eager to talk when I've run into them at meetings.
>> I also seem to remember someone at the DataCite meeting this summer who
>> was involved in a project to parse references in papers ... unfortunately,
>> I don't have that notebook here to check ... but I *think* it was John
>> Kunze. (and I don't think it was part of the person's presentation, but
>> something that I had picked up in the Q/A part)
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library