
RE: [xmlc] xmlc2.3 include-ignorable-whitespace feature: msg#00019

java.enhydra.xmlc

Subject: RE: [xmlc] xmlc2.3 include-ignorable-whitespace feature

Quoting 石鑫 <shixin129@xxxxxxx>:

> I have some html pages like this :
> ... <ul id="DemoList">
> <li></li>
> <li></li>
> </ul>
> ...and use "xmlcObject.getElementDemoList().getChildNodes()" to get all <li>
> elements,
> but in xmlc2.3 "getChildNodes()" returns a NodeList that also contains
> org.w3c.dom.Text objects.
> Now I use "xmlcObject.getElementDemoList().getElementsByTagName("li")"
> instead.
>

Your solution is more reliable than getChildNodes(). However, I want to explore
this a bit more. Read on...
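
To make the difference concrete, here is a minimal sketch of a helper that
returns only the direct Element children of a node, whether or not whitespace
Text nodes are present. It uses only the standard org.w3c.dom API. The class
name ElementChildren and the parseRoot() helper are made up for illustration,
and the JDK's own XML parser stands in for the XMLC-generated object, so treat
it as an assumption-laden example rather than XMLC code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildren {

    /** Returns only the direct Element children of parent, skipping Text, Comment, etc. */
    static List<Element> of(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Hypothetical stand-in for the XMLC object: parses markup without a DTD. */
    static Element parseRoot(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element ul = parseRoot("<ul id=\"DemoList\">\n<li></li>\n<li></li>\n</ul>");
        // Raw child list includes the whitespace Text nodes between the <li> elements.
        System.out.println("getChildNodes(): " + ul.getChildNodes().getLength());
        // Filtered list contains the <li> elements only.
        System.out.println("element children: " + of(ul).size());
    }
}
```

Note that getElementsByTagName("li") returns all descendant <li> elements,
including those of nested lists, whereas the helper above returns direct
children only.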

Are you using the HTML DOM or the XHTML DOM? The "include-ignorable-whitespace"
feature applies *only* to the latter. Since HTML isn't validated, there's no
way for the parser to know what whitespace is ignorable. As such, the parser
makes no attempt to remove whitespace, because without the DTD telling it what
to remove, any attempt may remove important whitespace.

I think there might be some confusion here. You originally expressed concern
that "include-ignorable-whitespace" was "false" and wanted to be able to
configure it, presumably, to "true". However, based on your example above, you
are concerned that you are getting extra whitespace nodes in places where they
arguably ought not to be. This is exactly what setting
"include-ignorable-whitespace" to "false" is for. It removes ignorable
whitespace. I think this is pretty much what one would want (and what you seem
to be trying to achieve), and I don't see any benefit in making the feature
configurable.

My guess is that you are using the HTML DOM, not the XHTML DOM. If you upgraded
from XMLC-2.2.xx and, all of a sudden, began seeing extra Text nodes where they
didn't get created before, such as between children of <ul>, <ol>, etc., this is
because Xerces-1.4.4 (which is what XMLC-2.2.xx uses) strips whitespace from
HTML where Xerces2 (or NekoHTML) does not. IMO, Xerces2/NekoHTML is doing the
right thing and Xerces-1.4.4 is doing the wrong thing. Without a DTD to
validate against, Xerces-1.4.4 has no business removing whitespace. For
instance, it might remove whitespace inside <pre> tags without a DTD to tell it
not to. The best way to avoid this problem is to use the XHTML DOM, which uses
the validating XML parser instead of the non-validating HTML parser.
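
The role of the DTD can be illustrated outside of XMLC with the JDK's own
JAXP parser. The sketch below is not XMLC code: the class name, the tiny
inline DTD, and the sample markup are assumptions made up for the example.
It asks the parser to drop element-content whitespace in both cases, but the
parser can only do so when a DTD tells it that <ul> has element-only content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableWhitespaceDemo {

    static final String BODY = "<ul>\n<li></li>\n<li></li>\n</ul>";
    // Illustrative inline DTD: <ul> may contain only <li> elements.
    static final String DTD =
            "<!DOCTYPE ul [<!ELEMENT ul (li*)><!ELEMENT li (#PCDATA)>]>\n";

    /** Parses markup and returns the number of child nodes of the root element. */
    static int childCount(String markup, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            // Request removal of whitespace in element-only content.
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail loudly on validation errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With a DTD, the whitespace between the <li> elements is known to be ignorable.
        System.out.println("with DTD:    " + childCount(DTD + BODY, true));
        // Without a DTD, the parser cannot tell, so the whitespace Text nodes stay.
        System.out.println("without DTD: " + childCount(BODY, false));
    }
}
```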

That said, it is possible to mimic include-ignorable-whitespace="false" in
the HTML parser if we are very careful about following the rules of the XHTML
1.0 Transitional DTD. If you would like to take a crack at it, take a look at
XercesHTMLDOMParser.java [1]. I even have a limited attempt that I commented
out. Look at the commented out characters() method. That method might
actually be correctly implemented as-is, but I wasn't 100% sure that it would
be correct, so I left it commented out. It could be uncommented in a future
release, but we'd have to be sure it isn't removing whitespace where it
shouldn't.
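
As a rough illustration of the idea, here is a sketch of a post-processing
pass over an already-built DOM. It is not the commented-out characters()
method in XercesHTMLDOMParser.java, which works at the parser level and is
not reproduced here. The class name and the ELEMENT_ONLY set are assumptions:
the set is an illustrative subset of elements that, as far as I can tell,
have element-only content in the XHTML 1.0 Transitional DTD, and it should be
checked against the DTD before being relied on.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementOnlyWhitespaceStripper {

    // Assumed subset of elements with element-only content; verify against the DTD.
    static final Set<String> ELEMENT_ONLY = new HashSet<>(Arrays.asList(
            "html", "head", "ul", "ol", "dl", "dir", "menu", "table", "thead",
            "tbody", "tfoot", "tr", "colgroup", "select", "optgroup"));

    /** Recursively removes whitespace-only Text children of element-only elements. */
    static void strip(Node node) {
        boolean elementOnly = node.getNodeType() == Node.ELEMENT_NODE
                && ELEMENT_ONLY.contains(node.getNodeName().toLowerCase(Locale.ROOT));
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            if (elementOnly && child.getNodeType() == Node.TEXT_NODE
                    && isWhitespace(child.getNodeValue())) {
                node.removeChild(child);
            } else {
                strip(child); // elements such as <pre> are left untouched
            }
            child = next;
        }
    }

    /** True if s consists only of space, tab, CR, and LF characters. */
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical helper: parses well-formed markup with the JDK parser. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><ul>\n<li>a</li>\n<li>b</li>\n</ul>"
                + "<pre>\n  keep\n</pre></body></html>");
        strip(doc);
        System.out.println("ul children: "
                + doc.getElementsByTagName("ul").item(0).getChildNodes().getLength());
        System.out.println("pre text: ["
                + doc.getElementsByTagName("pre").item(0).getTextContent() + "]");
    }
}
```

Because the pass only touches elements named in the set, whitespace inside
<pre>, <p>, <td>, and other mixed-content elements is preserved.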


[1]
http://cvs.forge.objectweb.org/cgi-bin/viewcvs.cgi/xmlc/xmlc/xmlc/modules/xmlc/src/org/enhydra/xml/xmlc/parsers/xerces/XercesHTMLDOMParser.java

>
> XMLC is compatible with OSGi. XMLCObject can easily be used in the OSGi HTTP
> Service or in Eclipse RCP, but JSP can't.
> I have used XMLC with OSGi for a long time, and it works very well.
>

I'm interested in this. Do you have any external references that can show me and
others how to integrate XMLC with OSGi? You're not obligated to, but if it
isn't too much trouble, it would be much appreciated.


> Sorry for my weak english.
>

Hey, no problem. You don't see me being able to speak Chinese, do you? You're
one big step ahead of me!

Jake

>
> Curry
>
>
>
> > Date: Thu, 24 May 2007 01:37:13 -0500
> > To: xmlc@xxxxxxxxxxxxx
> > From: hoju@xxxxxxxx
> > Subject: Re: [xmlc] xmlc2.3 include-ignorable-whitespace feature
> >
> > Well, right now it isn't configurable, though it could be added as an
> > option in the metadata in the future. Can you explain why you need
> > ignorable whitespace to be included? The DTD defines whitespace as
> > ignorable or not. Why would it lie? Can I assume you are using XHTML?
> > Please describe what is getting broken so I can better understand the
> > problem. And are you using XMLC's DOMFormatter to output your markup or
> > some other mechanism?
> >
> > BTW, I'm curious, how are you using OSGI with XMLC?
> >
> > Jake
> >
> > At 07:53 PM 5/23/2007, you wrote:
> > > Hi,
> > >
> > > I upgrade my application (xmlc + osgi) to xmlc 2.3 , then I found that
> > > "include-ignorable-whitespace" feature default is false. I can't find how
> > > to configure.
> > >
> > > Who have a good idea?
> > >
> > > Curry





--
You receive this message as a subscriber of the xmlc@xxxxxxxxxxxxx mailing list.
To unsubscribe: mailto:xmlc-unsubscribe@xxxxxxxxxxxxx
For general help: mailto:sympa@xxxxxxxxxxxxx?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws