I have librarians who would kill for this. In fact, I was talking to
one about it the other day. She felt there must be a way to handle
active reading and make it portable. This would be great in
conjunction with RefWorks or Zotero or something along those lines.
On Tue, Sep 15, 2009 at 9:31 AM, Eric Lease Morgan wrote:
> I have been having fun recently indexing PDF files.
> For the past six months or so I have been keeping the articles I've read in
> a pile, and I was rather amazed at the size of the pile. It was about a foot
> tall. When I read these articles I "actively" read them -- meaning, I write,
> scribble, highlight, and annotate the text with my own special notation
> denoting names, keywords, definitions, citations, quotations, list items,
> examples, etc. This active reading process: 1) makes for better
> comprehension on my part, and 2) makes the articles easier to review and
> pick out the ideas I thought were salient. Being the librarian I am, I
> thought it might be cool ("kewl") to make the articles into a collection.
> Thus, the beginnings of Highlights & Annotations: A Value-Added Reading
> The techno-weenie process for creating and maintaining the content is
> something this community might find interesting:
> 1. Print article and read it actively.
> 2. Convert the printed article into a PDF
> file -- complete with embedded OCR --
> with my handy-dandy ScanSnap scanner. 
> 3. Use MyLibrary to create metadata (author,
> title, date published, date read, note,
> keywords, facet/term combinations, local
> and remote URLs, etc.) describing the
> article. 
> 4. Save the PDF to my file system.
> 5. Use pdftotext to extract the OCRed text
> from the PDF and index it along with
> the MyLibrary metadata using Solr. [3, 4]
> 6. Provide a searchable/browsable user
> interface to the collection through a
> mod_perl module. [5, 6]
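
For anyone who wants to try steps 5 and 6 at home, here is a rough Python sketch of the extract-and-index part. It is not Eric's Perl code, just one way it might look. Assumptions on my part: the field names (id, fulltext, title, and so on), the Solr URL, and the helper names are all made up for illustration, and the JSON update request assumes a reasonably recent Solr rather than the 2009-era XML interface. The only real command-line detail is that pdftotext writes to standard output when the output file is given as "-".

```python
import json
import subprocess
import urllib.request


def pdftotext_command(pdf_path):
    """Build the pdftotext invocation; a trailing "-" sends text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the PDF's embedded (OCRed) text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_solr_doc(doc_id, metadata, fulltext):
    """Combine catalog metadata and extracted text into one Solr document.

    The field names are hypothetical; they must match whatever the
    Solr schema actually defines.
    """
    doc = dict(metadata)
    doc["id"] = doc_id
    doc["fulltext"] = fulltext
    return doc


def solr_update_request(solr_update_url, docs):
    """Prepare (but do not send) a JSON update request for Solr."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_pdf(pdf_path, doc_id, metadata, solr_update_url):
    """Extract text from one PDF and post it to Solr with its metadata."""
    doc = build_solr_doc(doc_id, metadata, extract_text(pdf_path))
    with urllib.request.urlopen(solr_update_request(solr_update_url, [doc])) as resp:
        return resp.status
```

Calling something like index_pdf("article.pdf", "article-001", {"title": "Some title"}, "http://localhost:8983/solr/highlights/update") would then cover one article, given pdftotext on the PATH and a running Solr core with matching fields. The metadata dictionary stands in for whatever MyLibrary exports.
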
> Software is never done, and if it were then it would be called hardware.
> Accordingly, I know there are some things I need to do before I can truly
> deem the system version 1.0. At the same time my excitement is overflowing
> and I thought I'd share some geekdom with my fellow hackers. Fun with PDF
> files and open source software.
> [1] ScanSnap - http://tinyurl.com/oafgwe
> [2] MyLibrary screen dump - http://infomotions.com/tmp/mylibrary.png
> [3] pdftotext - http://www.foolabs.com/xpdf/
> [4] Solr - http://lucene.apache.org/solr/
> [5] module source code - http://infomotions.com/highlights/Highlights.pl
> [6] user interface - http://infomotions.com/highlights/highlights.cgi
> Eric Lease Morgan
> University of Notre Dame
> Eric Lease Morgan
> Head, Digital Access and Information Architecture Department
> Hesburgh Libraries, University of Notre Dame
> (574) 631-8604