Hi Andrew,
EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The
system ID passed to EntityResolver.resolveEntity() is the "expanded system
ID". Specifically the docs for resolveEntity() [1] say: "if the system
identifier is a URL, the SAX parser must resolve it fully before reporting
it to the application" and that's exactly what the parser does. The other
wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the
literal system ID along with a base URI, so yes, the two resolvers behave
differently. Xerces has a utility class called
org.apache.xerces.util.XMLCatalogResolver which uses the XML commons
catalog resolver. You may want to have a look at it.
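
In the meantime, the workaround you describe could be sketched roughly as
below. This is only an illustration, not anything shipped with Xerces: the
class name FileNameNormalizingResolver and its normalize() helper are
hypothetical, and the delegate is assumed to be whatever catalog-backed
EntityResolver you already use. It relies only on the standard
org.xml.sax.EntityResolver interface.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical wrapper (not part of Xerces or Cocoon): maps an expanded
 * system ID back to a known file name before handing it to a delegate
 * resolver, so that a catalog entry keyed on the bare name can match.
 */
final class FileNameNormalizingResolver implements EntityResolver {
    private final String fileName;          // e.g. "dcr4.5.dtd"
    private final EntityResolver delegate;  // e.g. the existing catalog resolver

    FileNameNormalizingResolver(String fileName, EntityResolver delegate) {
        this.fileName = fileName;
        this.delegate = delegate;
    }

    /** Returns fileName if systemId is that name or a path ending in "/" + name. */
    static String normalize(String systemId, String fileName) {
        if (systemId == null) {
            return null;
        }
        if (systemId.equals(fileName) || systemId.endsWith("/" + fileName)) {
            return fileName;
        }
        return systemId; // anything else is passed through unchanged
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // The parser reports the expanded system ID here; reduce it first.
        return delegate.resolveEntity(publicId, normalize(systemId, fileName));
    }
}
```

Matching on "/" plus the file name, rather than on the bare suffix, is a
deliberate choice in this sketch so that a name such as "xdcr4.5.dtd" is
left alone. On a plain SAX XMLReader such a resolver would be installed
with setEntityResolver(); how it is wired into Cocoon is a separate matter.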
Hope that helps.
[1] http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
"Andrew Stevens" <ats37@xxxxxxxxxxx> wrote on 02/28/2005 08:47:52 AM:
> In
> org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier
> resourceIdentifier), it has the line
> String sysId = resourceIdentifier.getExpandedSystemId();
> Is there some particular reason this uses the expanded system ID rather than
> using getLiteralSystemId()?
>
> I've got a problem with some XML files I'm processing with Cocoon. The
> files all contain a DOCTYPE that uses a relative path for the system ID,
> i.e. <!DOCTYPE record SYSTEM "dcr4.5.dtd">. The documents are created by
> another application, and I can't affect what it puts in there. Trying to
> read the files generates a parser error since the DTD isn't present in the
> directory containing the documents; no problem, I thought, just use a
> suitable entry in the catalog used by Cocoon's EntityResolver. So,
> following the other entries, I added
> SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
> and copied the DTD into WEB-INF\entities\interwoven, however, it still
> doesn't find the DTD. Turning up the logging (and this is where it becomes
> more relevant to Xerces than Cocoon, and why I'm asking here rather than
> cocoon-user) I discovered that the system ID being passed in to the catalog
> resolver already had the full path to the file, so it's not matching the
> above entry in the catalog. Since the path to the documents could be more
> or less anything, I can't use a (prefix-based) rewrite entry in the catalog;
> likewise it's impractical to include a system entry for every possible path,
> since I don't know in advance what they're going to be. Digging through the
> Cocoon & Xerces source code, I discovered the path being received by the
> catalog resolver has come from the EntityResolverWrapper, i.e. the
> resourceIdentifier.getExpandedSystemId() I mentioned above. Presumably, if
> that had used getLiteralSystemId() instead, the catalog resolver would have
> received just "dcr4.5.dtd" for the system ID rather than the full path, and
> would have matched it okay. But I'm wary of changing it myself, since I
> don't know what else might be affected (and I'd rather avoid using a
> custom-built Xerces in our Cocoon app, to minimise the risk of introducing
> other side-effects).
>
> I notice in the current CVS HEAD, there's an EntityResolver2Wrapper class;
> this one does use getLiteralSystemId(), in fact the latest CVS log message
> on that class says
> "Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be
> an absolute or relative URI. That is it should be the literal system
> identifier, not the expanded one which resolved from the base URI."
> However, I also found an old (> 2 years) mailing list message
> (http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-j-user@xxxxxxxxxxxxxx&msgId=568021)
> which says that
> "The reason Xerces now returns fully-expanded URI's to the Entity resolver
> is that SAX quite explicitly states that this is what XML processors are
> supposed to do."
> So now I'm twice as confused. Do the SAX2 Extensions 1.1 say that
> EntityResolver2 should behave differently from EntityResolver? Or have
> things changed since EntityResolverWrapper switched to using
> getExpandedSystemId(), and should it now be using getLiteralSystemId() after
> all?
>
> In the meantime I can work around my problem by plugging in a custom
> EntityResolver which replaces any system IDs ending with "dcr4.5.dtd" with
> just that string, before passing it on to the XML commons catalog resolver
> as before. But it'd be nice if it could be clarified how exactly Xerces'
> wrapper classes are supposed to work, so I know if I should be raising a
> bug :-)
>
>
> Andrew.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@xxxxxxxxxx