Neither vector nor raster information describes the actual embedded
_text_ we're talking about, though: the stuff that lets you
copy-and-paste _text_ (not images), or search text. PDFs can also have
that, and can even record which portions of a displayed raster image
correspond to which characters.
Text characters in a PDF aren't vector images; they're actually
character codes stored in some encoding, typically one tied to the font.
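
For what it's worth, here is a rough sketch of how one might check whether
a PDF carries that kind of embedded text at all. It is only a heuristic and
rests on some assumptions: it uses nothing but the Python standard library,
it only understands uncompressed or Flate-compressed streams, and it simply
looks for the PDF text operators (BT ... Tj/TJ ... ET) in the page content.
The function names are made up for illustration. Encrypted files and other
stream filters are not handled, and binary image data could in principle
trigger a false positive.

```python
import re
import zlib

# Body of a PDF stream object: "stream" EOL <data> EOL "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"\bBT\b.*?(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def iter_streams(pdf_bytes):
    """Yield each stream body, Flate-decompressed where that works."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a trailing byte lost to the EOL match.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (or some other filter): scan the bytes as-is.
            yield raw


def looks_text_based(pdf_bytes):
    """Heuristic: True if any stream appears to draw text characters."""
    return any(TEXT_OP_RE.search(body) for body in iter_streams(pdf_bytes))
```

You would call it as looks_text_based(open("scan.pdf", "rb").read()), where
"scan.pdf" is just a placeholder name. Note that a scanned page with a
hidden OCR text layer would also come back True, which fits the point
above: the page image is raster, but the text is stored alongside it.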
On 4/28/2011 5:00 PM, Carl Wiedemann wrote:
> I should also remark that vector information and raster information may
> exist in the same PDF file. For example, a PDF of a magazine or newspaper
> will probably have vector text and column borders, while photography will be
> raster at ~300dpi.
> On Thu, Apr 28, 2011 at 2:58 PM, Carl Wiedemann<[log in to unmask]>wrote:
>> Generally PDFs are capable of displaying two types of information: Vector
>> and Raster.
>> Vector information describes points, smooth lines, gradients, and
>> curves. It is lossless and has no native resolution, so it can be
>> scaled infinitely. Text data is understood as vector information if
>> we regard textual documents as images. Generally, composing a
>> document in a word processor and printing it to a PDF results in the
>> text as actual vector shapes -- you can zoom in on the text as much
>> as you'd like. PDF readers understand this information as native
>> text: you can select the text with a cursor, search the text, and
>> copy/paste. Other formats like SVG and EPS generally express vector
>> information.
>> Raster information is composed of pixels. JPEG, PNG, GIF, BMP, TIFF are
>> examples of raster information. These have a definite resolution, and, from
>> a computing perspective, are just a bunch of dots. When you scan an image
>> (or a document), it is digitally translated into a raster. Digital photographs
>> are raster. There are some techniques using Optical Character Recognition
>> (OCR) which can actually recognize characters in a raster image and
>> transform them into text data. There are also procedures to do a "bitmap
>> trace" to attempt to create vector information from a raster image.
>> More info here
>> On Thu, Apr 28, 2011 at 11:10 AM, Van Mil, James (vanmiljf)<
>> [log in to unmask]> wrote:
>>> I often employ the word 'raster', along with some other foul language, for
>>> any PDFs that don't have manipulate-able text.
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Keith Jenkins
>>> Sent: Thursday, April 28, 2011 1:06 PM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] What's the descriptive technical terminology?...
>>> pdf image of a page. pdf format used with cut paste.
>>> I've also heard many people use the term "searchable PDF" for a text-based
>>> On Thu, Apr 28, 2011 at 12:43 PM, Peter Murray <[log in to unmask]> wrote:
>>>> That is the same terminology I use as well -- image-based versus
>>>> text-based. I find that works most times because people can visually see if
>>>> something looks like a scanned image.