Re: output html as lower case?: msg#00035

java.enhydra.xmlc

Subject: Re: output html as lower case?


Another update.  I'm working with some of the Jaxen developers, who have come up with a DocumentNavigator for the HTML DOM that lets you choose the case in which element names are matched when the XPath is evaluated.  It does this by forcing node names to the case you choose.  For instance, the following will work on the HTML DOM...

XPath query = new HTMLXPath("/html/head/title");

...where the following will not...

XPath query = new DOMXPath("/html/head/title");

The reason is that HTMLXPath, by default, forces element names to lower case even though they are stored in upper case in the HTML DOM.  One can also override the default by using another constructor...

XPath query = new HTMLXPath("/HTML/HEAD/TITLE", false);

...which is entirely equivalent to....

XPath query = new DOMXPath("/HTML/HEAD/TITLE");

The boolean constructor of HTMLXPath allows you to specify "true" (the default) for lower case elements and "false" for upper case elements.  With HTMLXPath, you can now use the same XPath expression on both HTML and XHTML documents (or XML documents, for that matter).  XHTML documents (and XML documents in general) won't have the case conversion applied to them, to avoid breaking the XHTML spec, which requires all element names to be lower case.

This should soon be in the Jaxen CVS.  I plan to use a build of Jaxen supporting HTMLXPath for the XPath demo in XMLC-2.2.

Jake

At 12:18 AM 6/25/2003 -0500, you wrote:

Just a little update.  I asked on the Jaxen-interest list about treating the HTML DOM in a case-insensitive manner.  The one response I got made it sound like it isn't supported currently, but there is at least one (ugly) workaround.

The following XPath expression, which would be valid for XHTML....

"/html/head/title"

would have to be modified to the following for HTML

"/HTML/HEAD/TITLE"

That is, unless one uses a nifty trick (some might call it a hack)

"/*[name()='html' or name()='HTML']/*[name()='head' or name()='HEAD']/*[name()='title' or name()='TITLE']"


I actually tested that with Jaxen and it works.  However, JXPath returns nothing.  Jaxen is much better about following the spec as far as I can tell and if we are moving to DOM4J in XMLC-3.0, we will get Jaxen support out-of-the-box, since that is what DOM4J's XPath implementation is based upon.
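
For anyone who wants to try the trick without Jaxen on the classpath, here is a minimal sketch using the JDK's own javax.xml.xpath API instead (that substitution is my assumption; the test described above was done with Jaxen). The class name CaseTolerantXPath and the helper tolerantPath are hypothetical names made up for illustration, and the helper only handles simple absolute paths made of plain element names. The two documents are ordinary XML parsed with upper-case and lower-case tags, standing in for an HTML DOM and an XHTML DOM.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CaseTolerantXPath {

    // Hypothetical helper: rewrites a simple absolute path of element names,
    // e.g. "/html/head/title", into the name()-based form that matches
    // either the all-lower-case or the all-upper-case spelling of each step.
    static String tolerantPath(String simplePath) {
        StringBuilder sb = new StringBuilder();
        for (String step : simplePath.split("/")) {
            if (step.isEmpty()) {
                continue; // empty piece in front of the leading '/'
            }
            String lower = step.toLowerCase(Locale.ROOT);
            String upper = step.toUpperCase(Locale.ROOT);
            sb.append("/*[name()='").append(lower)
              .append("' or name()='").append(upper).append("']");
        }
        return sb.toString();
    }

    // Parses a small XML string into a W3C DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the tolerant form of /html/head/title and returns the
    // string value of the first matching node.
    static String titleOf(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(tolerantPath("/html/head/title"), doc);
    }

    public static void main(String[] args) throws Exception {
        Document upper = parse("<HTML><HEAD><TITLE>Upper</TITLE></HEAD></HTML>");
        Document lower = parse("<html><head><title>Lower</title></head></html>");
        System.out.println(titleOf(upper));
        System.out.println(titleOf(lower));
    }
}
```

The same expression is evaluated against both documents, which is the whole point of the hack: the query does not need to know which case the DOM stores.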

Anyway, I'll keep on the Jaxen guys to support the HTML DOM in a case-insensitive way.

Jake

At 11:24 AM 6/24/2003 -0500, you wrote:
At 08:33 AM 6/24/2003 -0700, you wrote:
Richard Kunze <kunze@xxxxxxxxx> writes:
> > but if the former, then this change would be very, very nice to have.
>
> Let's put it on the agenda for XMLC 3.0. A new DOM implementation is pretty
> much a given for XMLC 3.0 anyway, so we can just as well include case
> insensitive XPath support for HTML.

Guys, this will break tons of code and it will no longer be a W3C HTML
compatible DOM.  This is either a bug or a missing feature in
XPath, depending on your point of view...  Submit a patch to 'em; the logic
for checking for an HTML doc is easy:

    if (node.getOwnerDocument() instanceof org.w3c.dom.html.HTMLDocument) {
        .....
    }

Agreed.  I'll ask the guys at Jaxen (used in DOM4j and elsewhere) and JXPath what their take on this issue is.  The XPath engine is probably where the case-sensitivity issues should be addressed rather than storing a non-standard DOM.

Jake

_______________________________________________
XMLC mailing list
XMLC@xxxxxxxxxxx
http://www.enhydra.org/mailman/listinfo.cgi/xmlc