Right, hence my earlier suggestion of just replacing the entities ;). It's
not exactly the approach you describe, as yours would deal with common
cases that didn't get properly set up in the DTD, but it would also be a
bit more difficult to map for weird custom entities.

My email was a bit rambling, but the magic sauce I recommended was
something like

xmllint --loaddtd --noent --dropdtd FRONT.XML > FRONT_nodtdent.xml

(In reality you'd want to automate that a little more. xmllint uses the
libxml2 library, if I remember correctly, so there are likely bindings that
do the same thing.)
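
For what it's worth, here is a rough sketch of automating that from Python
by shelling out to xmllint rather than using bindings. The function names
and the "_nodtdent" output naming are just my assumptions, it needs xmllint
on the PATH, and I haven't run it against the real files.

```python
import subprocess
from pathlib import Path


def xmllint_command(src):
    """Build the xmllint argument list from the command above for one file."""
    # --loaddtd loads the DTD, --noent substitutes entities,
    # --dropdtd removes the DOCTYPE from the output.
    return ["xmllint", "--loaddtd", "--noent", "--dropdtd", str(src)]


def output_path(src):
    """FRONT.XML -> FRONT_nodtdent.xml next to the input (naming is assumed)."""
    src = Path(src)
    return src.with_name(src.stem + "_nodtdent.xml")


def resolve_entities(src):
    """Run xmllint on src and write its stdout to the derived output path."""
    dst = output_path(src)
    with open(dst, "wb") as out:
        # check=True raises CalledProcessError if xmllint exits non-zero.
        subprocess.run(xmllint_command(src), stdout=out, check=True)
    return dst
```

Calling resolve_entities("FRONT.XML") should then be equivalent to the
one-liner above, and you could loop it over a directory of files.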

What that seems to do is load the DTD (which xmllint no longer does unless
it needs to), replace every entity with its definition from the DTD, and
then drop the DTD. I didn't look closely, but it doesn't seem to just
swap in the numeric code (e.g. &#255;); it uses the actual Unicode
character.
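
If xmllint isn't an option, a plain text pass can get part of the way
there. The sketch below is not what xmllint does: it uses Python's built-in
HTML 4 entity table instead of the file's own DTD, so it only covers the
common named entities and leaves anything custom untouched. The function
name and the "numeric" switch are my own invention; numeric=True gives the
numeric-reference flavour Roy describes below.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches exact named references such as &aacute; or &nbsp;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text, numeric=False):
    """Replace known HTML 4 named entities with characters or numeric refs."""

    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown (custom) ones alone.
            return match.group(0)
        codepoint = name2codepoint[name]
        return f"&#{codepoint};" if numeric else chr(codepoint)

    return ENTITY_RE.sub(substitute, text)


print(replace_named_entities("caf&eacute; &amp; &custom;"))
print(replace_named_entities("&aacute;&nbsp;&copy;", numeric=True))
```

Because &amp; is skipped, text that has already been escaped to
&amp;aacute; stays as it is rather than being turned into an accented
character.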

(You still need to fix the several mistakes that have already been pointed
out by folks like Jason: the xml:stylesheet that needs to be
xml-stylesheet, and making sure the filenames are actually correct for
case-sensitive OSes.)

Jon G.


On Mon, Dec 9, 2013 at 7:48 PM, Roy Tennant <[log in to unmask]> wrote:

> For my money, the text transform should look only for exact matches (e.g.,
> "&aacute;", "&nbsp;", "&copy;") and replace them with their numeric
> counterparts.
> Roy
>
>
> On Mon, Dec 9, 2013 at 5:41 PM, jason bengtson <[log in to unmask]> wrote:
>
> > For testing purposes I just nixed them. As I noted, to rework the file a
> > person would probably want to use a more critical eye with find and
> > replace. Totally doable.
> >
> >
> > On Dec 9, 2013, at 7:37 PM, Jon Gorman <[log in to unmask]> wrote:
> >
> > > How did you fix the ampersands? I ask, because if you just did a simple
> > > text transform from & to &amp;, it would mask the problem of the entity
> > > escaping I think...
> > >
> > > Not at work, so I don't have a good example and the file is downloading
> > > very slowly here, so I'll try to do one from memory.
> > >
> > > There were several &aacute; in the XML which mapped to an accented
> > > character in the DTD via the entity.
> > >
> > > If you just substituted & with &amp;, you'd get &amp;aacute;, which
> > > would render inline as &aacute;. It would superficially solve the issue,
> > > since browsers would no longer give the errors about the DTD because
> > > they wouldn't be trying to load entities from the DTD. And depending on
> > > how you did it, you likely could also replace a correctly encoded one
> > > to make &amp;amp;, leading to some very odd stuff.
> > >
> > > I wouldn't be surprised to find some unescaped ampersands, but the
> > > solution I posted will essentially replace the entities with their
> > > text, hopefully causing most characters to appear correctly. You
> > > definitely still need to fix some of the other stuff. (I suspect it
> > > never worked for most browsers and XML systems, most likely only IE.)
> > >
> > > Jon Gorman
> > > University of Illinois
> >
> > Best regards,
> >
> > Jason Bengtson, MLIS, MA
> > Head of Library Computing and Information Systems
> > Assistant Professor, Graduate College
> > Department of Health Sciences Library and Information Management
> > University of Oklahoma Health Sciences Center
> > 405-271-2285, opt. 5
> > 405-271-3297 (fax)
> > [log in to unmask]
> > http://library.ouhsc.edu
> > www.jasonbengtson.com