
Thanks.  I'll see if this helps. 

I'm sure IE was used to view the files 4.5 years ago. I don't think I looked at them, but we had super employees (recent grads from library school) who worked with the files, and I trust that they would have noticed problems.

Fortunately we only have 7 of these to try to fix. 

Wendy

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Jon Gorman
Sent: Monday, December 09, 2013 3:17 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] problem in old etd xml files

A lot of modern systems won't load entities (or will limit them somehow) because of the denial-of-service attack that loading them makes possible.  Search for "XML Entity Reference Denial of Service". I can't remember if PUBLIC declarations are treated any differently than SYSTEM ones. (I would have expected parsers to trust SYSTEM ones more, but they'd still be exploitable by the same bug.)
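
To illustrate the "entities are not defined" symptom, here is a minimal sketch using Python's standard-library parser, which as far as I know does not fetch an external DTD by default. The entity name `&univ;` and its value are made up for the example; they are not taken from the actual UIowa2K.dtd.

```python
import xml.etree.ElementTree as ET

# The entity is (hypothetically) defined only in the external DTD,
# which this parser never fetches.
EXTERNAL_ONLY = (
    '<!DOCTYPE thesis SYSTEM "UIowa2K.dtd">'
    "<thesis>&univ;</thesis>"
)

# The same entity declared in the internal subset, so no DTD file
# has to be loaded to resolve it.
INTERNAL = (
    '<!DOCTYPE thesis [<!ENTITY univ "University of Iowa">]>'
    "<thesis>&univ;</thesis>"
)


def resolved_text(doc):
    """Return the root element's text, or None if the parse fails."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        # Raised here when an entity reference cannot be resolved.
        return None


print(resolved_text(EXTERNAL_ONLY))
print(resolved_text(INTERNAL))
```

The first document fails to parse because the entity is never seen; the second parses because the definition travels with the file.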


(There are also a fair number of other errors, so I'm somewhat skeptical that the example worked in many browsers even then. It's possible IE was flexible enough that it would have worked.)

One thing you might want to do is take out the entities.

I can't remember why I had to do this, but xmllint seemed to do the trick.
(I found a snippet at
http://stackoverflow.com/questions/614067/how-to-resolve-all-entity-references-in-xml-and-create-a-new-xml-in-c,
but it's missing the necessary --loaddtd.)

xmllint --loaddtd --noent --dropdtd FRONT.xml > FRONT_nodtdent.xml
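
With only a handful of files to fix, the same call could be wrapped in a small script. This is a rough sketch in Python, assuming xmllint is installed and on the PATH; the helper names are my own, and the `_nodtdent` suffix just follows the file name used above.

```python
import subprocess
from pathlib import Path


def xmllint_command(src):
    """Build the xmllint call for one source file."""
    # --loaddtd reads the external DTD, --noent substitutes entity
    # references with their values, --dropdtd removes the DOCTYPE.
    return ["xmllint", "--loaddtd", "--noent", "--dropdtd", str(src)]


def output_path(src):
    """Map FRONT.xml to FRONT_nodtdent.xml in the same folder."""
    src = Path(src)
    return src.with_name(src.stem + "_nodtdent" + src.suffix)


def expand_entities(src):
    """Run xmllint on src, write the expanded copy, return its path."""
    dst = output_path(src)
    with open(dst, "wb") as out:
        # check=True raises CalledProcessError if xmllint fails; the
        # output file may then be empty or incomplete.
        subprocess.run(xmllint_command(src), stdout=out, check=True)
    return dst
```

Calling `expand_entities("FRONT.xml")` in the unzipped thesis folder should then produce FRONT_nodtdent.xml next to the original, leaving FRONT.xml untouched.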

I mean, you don't need the DTD for validation, particularly since, given the errors, I suspect the files may not validate anyhow.

It might make the files a little harder to read when reading the raw source, but I suspect that's not typically a problem.

Jon Gorman
University of Illinois



On Mon, Dec 9, 2013 at 2:10 PM, Robertson, Wendy C < [log in to unmask]> wrote:

> Back in 1999-2002 a handful of our theses were submitted  as a 
> collection of xml files.  We posted the files in our repository 
> several years ago (we posted a zipped folder with all the files).  At 
> that time, if you opened front.xml you would be able to access the 
> thesis. We have not touched the files in the close to 5 years since we 
> posted them, but the files no longer open correctly. One of the problem theses is http://ir.uiowa.edu/etd/189/.
>
> Front.xml begins
> <?xml version="1.0" encoding="UTF-8"?> <?xml:stylesheet 
> type="text/css" href="UIowa2K1.css" ?> <!DOCTYPE thesis SYSTEM 
> "UIowa2K.dtd">
>
> I have tried the following changes but they do not help
>
> 1)      Adding standalone="no" to the xml declaration  -- <?xml
> version="1.0" encoding="UTF-8" standalone="no"?>
>
> 2)      Changing the case of "UIowa2K1.css" and "UIowa2K.dtd" to match the
> files (which are in all caps)
>
> 3)      Changing xml:stylesheet to xml-stylesheet
>
> Chrome shows errors that entities are not defined, but they are 
> defined in the dtd.
>
> I would appreciate any assistance in making these documents available 
> again. Thanks!
>
> Wendy Robertson
> Digital Scholarship Librarian *  The University of Iowa Libraries
> 1015 Main Library  *  Iowa City, Iowa 52242 [log in to unmask] 
> * 319-335-5821
>