Generally, PDFs are capable of displaying two types of information: vector
and raster.
Vector information is composed of data that describes points, smooth lines,
gradients, and curves. It is lossless and has no native resolution, so it
can be scaled infinitely. Text data can be understood as vector information
if we regard textual documents as images. Generally, composing a document
in a word processor and printing it to a PDF results in text stored as
actual vector shapes -- you can zoom in on the text as much as you'd like.
PDF readers understand this information as native text: you can select the
text with a cursor, search it, and copy/paste it. Other formats like SVG
and EPS generally express vector information.
Raster information is composed of pixels. JPEG, PNG, GIF, BMP, and TIFF are
examples of raster formats. These have a definite resolution and, from a
computing perspective, are just a bunch of dots. When you scan an image (or
a document), it is digitally translated into a raster. Digital photographs
are raster. There are techniques using Optical Character Recognition (OCR)
that can recognize characters in a raster image and transform them into
text data. There are also procedures to do a "bitmap trace" to attempt to
create vector information from a raster image.
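To make the difference concrete, here is a rough sketch in Python. It is only
an illustration, not how any PDF reader actually works: the function names
(scale_segment, upscale_raster) and the tiny sample data are made up for this
example. The vector shape is stored as coordinates and stays exact when
scaled, while the raster is a grid of dots that can only be enlarged by
repeating dots, which is why scanned pages look blocky when you zoom in.

```python
# Hypothetical helpers for illustration only; not part of any PDF library.

def scale_segment(segment, factor):
    """Scale a vector line segment. The result is still exact geometry."""
    (x0, y0), (x1, y1) = segment
    return ((x0 * factor, y0 * factor), (x1 * factor, y1 * factor))


def upscale_raster(pixels, factor):
    """Nearest-neighbour upscaling: each dot becomes a factor x factor block."""
    out = []
    for row in pixels:
        # Repeat every dot horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


# A diagonal line as vector data (two endpoints) and as raster data (dots).
segment = ((0, 0), (2, 2))
bitmap = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

print(scale_segment(segment, 4))
for row in upscale_raster(bitmap, 2):
    print("".join("#" if value else "." for value in row))
```

The scaled segment is still described by just two endpoints, so it can be
redrawn smoothly at any size. The upscaled bitmap has more dots, but no new
detail: the diagonal turns into a staircase of 2x2 blocks.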
More info here
On Thu, Apr 28, 2011 at 11:10 AM, Van Mil, James (vanmiljf) <
[log in to unmask]> wrote:
> I often employ the word 'raster', along with some other foul language, for
> any PDFs that don't have manipulate-able text.
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Keith Jenkins
> Sent: Thursday, April 28, 2011 1:06 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] What's the descriptive technical terminology?...
> pdf image of a page. pdf format used with cut paste.
> I've also heard many people use the term "searchable PDF" for a text-based
> PDF.
> On Thu, Apr 28, 2011 at 12:43 PM, Peter Murray <[log in to unmask]> wrote:
> > That is the same terminology I use as well -- image-based versus
> > text-based. I find that works most times because people can visually see
> > if something looks like a scanned image.