Eric,

I have librarians who would kill for this.  In fact I was talking to
one about it the other day.  She felt there must be a way to handle
active reading and make it portable.  This would be great in
conjunction with RefWorks or Zotero or something along those lines.

Rosalyn



On Tue, Sep 15, 2009 at 9:31 AM, Eric Lease Morgan <[log in to unmask]> wrote:
> I have been having fun recently indexing PDF files.
>
> For the past six months or so I have been keeping the articles I've read in
> a pile, and I was rather amazed at the size of the pile. It was about a foot
> tall. When I read these articles I "actively" read them -- meaning, I write,
> scribble, highlight, and annotate the text with my own special notation
> denoting names, keywords, definitions, citations, quotations, list items,
> examples, etc. This active reading process: 1) makes for better
> comprehension on my part, and 2) makes the articles easier to review and
> pick out the ideas I thought were salient. Being the librarian I am, I
> thought it might be cool ("kewl") to make the articles into a collection.
> Thus, the beginnings of Highlights & Annotations: A Value-Added Reading
> List.
>
> The techno-weenie process for creating and maintaining the content is
> something this community might find interesting:
>
>  1. Print article and read it actively.
>
>  2. Convert the printed article into a PDF
>    file -- complete with embedded OCR --
>    with my handy-dandy ScanSnap scanner. [1]
>
>  3. Use MyLibrary to create metadata (author,
>    title, date published, date read, note,
>    keywords, facet/term combinations, local
>    and remote URLs, etc.) describing the
>    article. [2]
>
>  4. Save the PDF to my file system.
>
>  5. Use pdftotext to extract the OCRed text
>    from the PDF and index it along with
>    the MyLibrary metadata using Solr. [3, 4]
>
>  6. Provide a searchable/browsable user
>    interface to the collection through a
>    mod_perl module. [5, 6]
>
> Software is never done, and if it were then it would be called hardware.
> Accordingly, I know there are some things I need to do before I can truly
> deem the system version 1.0. At the same time my excitement is overflowing
> and I thought I'd share some geekdom with my fellow hackers. Fun with PDF
> files and open source software.
>
>
> [1] ScanSnap - http://tinyurl.com/oafgwe
> [2] MyLibrary screen dump - http://infomotions.com/tmp/mylibrary.png
> [3] pdftotext - http://www.foolabs.com/xpdf/
> [4] Solr - http://lucene.apache.org/solr/
> [5] module source code - http://infomotions.com/highlights/Highlights.pl
> [6] user interface - http://infomotions.com/highlights/highlights.cgi
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
>
>
>
> --
> Eric Lease Morgan
> Head, Digital Access and Information Architecture Department
> Hesburgh Libraries, University of Notre Dame
>
> (574) 631-8604
>
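
For anyone who wants to try step 5 of the quoted process (extract the
OCRed text with pdftotext, then index it with the metadata in Solr),
here is a minimal sketch in Python. It is not the code behind
Highlights.pl, which is a Perl module. It assumes pdftotext is on the
PATH and that the Solr instance accepts JSON documents at its update
handler. The field names ("fulltext", "author", "keywords") and the
function names are hypothetical, and the metadata is passed in as a
plain dictionary rather than read from MyLibrary.

```python
import json
import subprocess
import urllib.request


def pdftotext_command(pdf_path):
    """Build the pdftotext command line; the trailing '-' sends text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the PDF's embedded (OCRed) text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_solr_doc(doc_id, metadata, fulltext):
    """Merge metadata and extracted text into one flat Solr document.

    Field names here are hypothetical; they must match the Solr schema in use.
    """
    doc = dict(metadata)
    doc["id"] = doc_id
    # Collapse line breaks, form feeds, and runs of spaces left over from OCR.
    doc["fulltext"] = " ".join(fulltext.split())
    return doc


def index_docs(docs, update_url):
    """POST a list of documents to a Solr JSON update handler (assumed URL)."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        update_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call might look like index_docs([build_solr_doc("article-1",
{"author": "..."}, extract_text("article-1.pdf"))], update_url), where
update_url points at the update handler of whatever Solr core holds
the collection. Keeping the document-building step separate from the
pdftotext and Solr calls makes it easy to check the fields before
anything is sent over the wire.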