LISTSERV 16.5 - CODE4LIB Archives

Carl Wiedemann <[log in to unmask]> wrote:

>I should also remark that vector information and raster information may
>exist in the same PDF file. For example, a PDF of a magazine or newspaper
>will probably have vector text and column borders, while photographs will be
>raster at ~300 dpi.
>
>
>On Thu, Apr 28, 2011 at 2:58 PM, Carl Wiedemann <[log in to unmask]> wrote:
>
>> Generally PDFs are capable of displaying two types of information: Vector
>> and Raster.
>>
>> Vector information describes points, smooth lines, gradients, and curves.
>> It is lossless and has no native resolution, so it can be scaled
>> indefinitely. If we regard textual documents as images, text data counts
>> as vector information. Generally, composing a document in a word processor
>> and printing it to a PDF produces text as actual vector shapes -- you can
>> zoom in on the text as much as you'd like. PDF readers also understand this
>> information as native text: you can select it with a cursor, search it,
>> and copy/paste it. Other formats like SVG and EPS generally express vector
>> information.
>>
>> Raster information is composed of pixels. JPEG, PNG, GIF, BMP, and TIFF are
>> examples of raster formats. These have a definite resolution and, from
>> a computing perspective, are just a bunch of dots. When you scan an image
>> (or a document), it is digitally translated into a raster. Digital photographs
>> are raster. There are some techniques using Optical Character Recognition
>> (OCR) which can actually recognize characters in a raster image and
>> transform them into text data. There are also procedures to do a "bitmap
>> trace" to attempt to create vector information from a raster image.
>>
>> More info here
>> http://en.wikipedia.org/wiki/Vector_graphics
>> http://en.wikipedia.org/wiki/Raster_graphics
>>
>> On Thu, Apr 28, 2011 at 11:10 AM, Van Mil, James (vanmiljf) <
>> [log in to unmask]> wrote:
>>
>>> I often employ the word 'raster', along with some other foul language, for
>>> any PDFs that don't have manipulate-able text.
>>>
>>> -James
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Keith Jenkins
>>> Sent: Thursday, April 28, 2011 1:06 PM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] What's the descriptive technical terminology?...
>>> pdf image of a page. pdf format used with cut paste.
>>>
>>> I've also heard many people use the term "searchable PDF" for a text-based
>>> PDF.
>>>
>>> Keith
>>>
>>>
>>> On Thu, Apr 28, 2011 at 12:43 PM, Peter Murray <[log in to unmask]>
>>> wrote:
>>> > That is the same terminology I use as well -- image-based versus
>>> > text-based. I find that works most times because people can visually see
>>> > if something looks like a scanned image.
>>>
>>
>>
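
The vector/raster distinction described in the thread can be illustrated
with a small sketch. This is not taken from any of the messages above; the
function names (scale_vector, scale_raster) and the toy data are assumptions
made up for the example. It scales the same diagonal line two ways: as vector
endpoints, which stay exact, and as a 3x3 bitmap, which turns into blocks.

```python
def scale_vector(points, factor):
    """Scale vector geometry by multiplying every coordinate.

    The result is still just two endpoints, so no detail is lost.
    """
    return [(x * factor, y * factor) for x, y in points]


def scale_raster(pixels, factor):
    """Nearest-neighbour upscaling of a pixel grid.

    Each source pixel becomes a factor-by-factor block, which is why
    an enlarged scan looks blocky.
    """
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical toy data: a diagonal line stored both ways.
line = [(0, 0), (2, 2)]      # vector: two endpoints
bitmap = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]         # raster: 3x3 grid of dots

scaled_line = scale_vector(line, 4)
scaled_bitmap = scale_raster(bitmap, 4)

print(scaled_line)
for row in scaled_bitmap:
    print("".join("#" if value else "." for value in row))
```

The vector line is still described by two points after scaling, while the
bitmap grows to 12x12 pixels made of three solid 4x4 blocks: the staircase
effect you see when zooming in on a scanned page.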