Thanks, Daron. We don't actually run OCR over the cover page PDF. We just save it in Word as an Adobe PDF, then append it to the existing document, which is already in PDF format.
We also sometimes create cover sheets via a batch process using MS Word; again, we just save in the default Adobe PDF format.
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Daron Dierkes
Sent: Wednesday, 26 November 2014 2:29 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Cover pages and Google
Perhaps it depends on how you are generating PDFs. If it is straight Acrobat, then it should be as easy as making a PDF of all but the cover, running OCR, then adding the cover in as another page. As long as you do not generate OCR again, the added pages should stay image only. I haven't tried it, but I'm pretty sure that's possible.
If it is a question specific to your repository architecture then it might be harder.
On Tuesday, November 25, 2014, Dan Scott <[log in to unmask]> wrote:
> Could you provide some examples of the resources that you're excluding
> and searches that return those results (maybe with screen shots in
> case Google serves up different results to different users)? I'm
> having a bit of trouble understanding your problem description.
> I'll admit that my schema.org hammer is itchy, but I don't want to
> jump to conclusions as the problem might not even be a construction
> issue, let alone a nail :)
> On 24 Nov 2014 22:57, "Bernadette Houghton" wrote:
> > We've discovered that cover pages we add to items in our research
> > repository have the unwelcome side effect of causing Google to
> > display the cover page citation in search results, rather than the
> > intro or preface.
> > The problem doesn't occur in Google Scholar, just the main Google
> > search engine.
> > One way to avoid this problem is to have the cover page formatted as
> > an image PDF rather than a text-readable PDF. Can anyone recommend a
> > tool that will convert a text-readable PDF to an image PDF?
> > TIA
> > Bernadette Houghton
> > Digitisation and Preservation Librarian Library
> > Deakin University
> > Locked Bag 20000, Geelong, VIC 3220
> > +61 3 52278230
> > www.deakin.edu.au
> > Deakin University CRICOS Provider Code 00113B
> > Important Notice: The contents of this email are intended solely for
> > the named addressee and are confidential; any unauthorised use,
> > reproduction or storage of the contents is expressly prohibited. If
> > you have received this email in error, please delete it and any
> > attachments immediately and advise the sender by return email or
> > telephone.
> > Deakin University does not warrant that this email and any
> > attachments are error or virus free.