Bernadette,

This issue was the topic of a paper from the University of Bath last year - see http://opus.bath.ac.uk/34033/.

One outcome of this was a conversation with EPrints to make sure that the metadata is not affected by a coversheet, so that the coversheet shouldn't affect Google crawling. So it seems to be something that can be addressed through how the underlying system treats the coversheet (for example, as you have noted, by making it an image instead).

Regards,

Chris

On 27 Nov 2014, at 04:00, CODE4LIB automatic digest system wrote:


Date:    Wed, 26 Nov 2014 05:06:13 +0000
From:    Bernadette Houghton <[log in to unmask]>
Subject: Re: Cover pages and Google

Dan, here's an example search result returned by Google Scholar:

https://dl.dropboxusercontent.com/u/29347274/google.jpg

The "This is the authors' final..." text comes from the PDF. Ideally, the text would be the article title.

Regards
Bern

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Dan Scott
Sent: Wednesday, 26 November 2014 11:54 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Cover pages and Google

Could you provide some examples of the resources that you're excluding and searches that return those results (maybe with screen shots in case Google serves up different results to different users)? I'm having a bit of trouble understanding your problem description.

I'll admit that my schema.org hammer is itchy, but I don't want to jump to conclusions as the problem might not even be a construction issue, let alone a nail :)

On 24 Nov 2014 22:57, "Bernadette Houghton" <[log in to unmask]> wrote:

We've discovered that cover pages we add to items in our research
repository have the unwelcome side effect of causing Google to display
the cover page citation in search results, rather than the intro or preface.
The problem doesn't occur in Google Scholar, just the main Google
search engine.

One way to avoid this problem is to have the cover page formatted as
an image PDF rather than a text-readable PDF. Can anyone recommend
software that will convert a text-readable PDF to an image PDF?

TIA

Bernadette Houghton
Digitisation and Preservation Librarian Library
Deakin University
Locked Bag 20000, Geelong, VIC 3220
+61 3 52278230
[log in to unmask]

www.deakin.edu.au<http://www.deakin.edu.au/>
Deakin University CRICOS Provider Code 00113B


Important Notice: The contents of this email are intended solely for
the named addressee and are confidential; any unauthorised use,
reproduction or storage of the contents is expressly prohibited. If
you have received this email in error, please delete it and any
attachments immediately and advise the sender by return email or telephone.

Deakin University does not warrant that this email and any attachments
are error or virus free.



------------------------------

Date:    Wed, 26 Nov 2014 05:22:34 +0000
From:    Bernadette Houghton <[log in to unmask]>
Subject: Re: Cover pages and Google

Thanks, Daron. We don't actually run OCR over the cover page PDF. We just save it in Word as an Adobe PDF, then append it to the existing document, which is already in PDF format.

We also sometimes create cover sheets via a batch process using MS Word; again, just saving in default Adobe PDF format.

Bern

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Daron Dierkes
Sent: Wednesday, 26 November 2014 2:29 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Cover pages and Google

Perhaps it depends on how you are generating PDFs. If it is straight Acrobat, then it should be as easy as making a PDF of all but the cover, running OCR, then adding the cover in as another page. As long as you do not run OCR again, the added page should stay image-only. I haven't tried it, but I'm pretty sure that's possible.

If it is a question specific to your repository architecture then it might be harder.
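
For anyone who would rather script this than do it by hand, here is a rough sketch of the idea discussed above: render the cover sheet as an image-only PDF so that it carries no text layer, then put it in front of the article. It assumes Ghostscript is installed and on the PATH as `gs`, and it relies on Ghostscript's `pdfimage24` and `pdfwrite` output devices. The function names and file names are hypothetical, and this has not been tested against any particular repository platform. Note that the merge step rewrites the article PDF through `pdfwrite`, so check that the result is still acceptable for preservation purposes.

```python
import shutil
import subprocess
from typing import List


def rasterise_cover_cmd(cover_pdf: str, image_pdf: str, dpi: int = 150) -> List[str]:
    """Build a Ghostscript command that re-renders a PDF as image-only pages."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=pdfimage24",        # each page is written as a 24-bit RGB image
        f"-r{dpi}",                   # rendering resolution in dots per inch
        f"-sOutputFile={image_pdf}",
        cover_pdf,
    ]


def prepend_cover_cmd(image_pdf: str, article_pdf: str, combined_pdf: str) -> List[str]:
    """Build a Ghostscript command that concatenates the cover and the article."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=pdfwrite",          # writes the inputs, in order, to one PDF
        f"-sOutputFile={combined_pdf}",
        image_pdf,
        article_pdf,
    ]


def add_image_cover(cover_pdf: str, article_pdf: str, combined_pdf: str,
                    image_pdf: str = "cover_image.pdf") -> None:
    """Rasterise the cover, then place it in front of the article."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on the PATH")
    subprocess.run(rasterise_cover_cmd(cover_pdf, image_pdf), check=True)
    subprocess.run(prepend_cover_cmd(image_pdf, article_pdf, combined_pdf), check=True)
```

Calling `add_image_cover("cover.pdf", "article.pdf", "combined.pdf")` would run both steps. Whether Google then picks a better snippet is a separate question that would need checking after the item is recrawled.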




