
Re: [xmlc] Parser questions: msg#00001

java.enhydra.xmlc

Subject: Re: [xmlc] Parser questions

At 05:42 AM 8/11/2006, you wrote:
Jake,

I haven't yet had a chance to investigate Deferred Parsing (or defer-node-expansion) in any detail, although our LazyDOM extension seems to be a bit different from what you described. Our solution makes no attempt to re-parse a document modified on the file system; it is more about providing a basic in-memory document sub-classing system. The base document is the original document defined on the file system and loaded as a normal LazyDOM. It is used to generate locale/browser (or any other criteria) variations at run time on demand, which are then optimised as LazyDOMs and cached. The locale variants are then themselves used to generate dynamic data variations per document request. The general idea is to build up a cached hierarchy of commonly used document variations and to reuse the LazyDOM benefit of avoiding a full document deep clone when generating a document at any level of the hierarchy.

Hmm... Barracuda's localization support, which utilizes XMLC, sounds a bit like your locale specific templates. In Barracuda's case, you start with a single document and let an Ant task generate localized versions of that file based on a localization resource bundle. This creates a separate copy of the document (and a corresponding XMLC impl class) for each locale. Some argue that this is excessive and that a single document could be localized at runtime. Then again, the localizations are generated, not created manually, so it doesn't require any extra maintenance, and there is no runtime overhead for localization.

Deferred Parsing fits into the picture for performing caching and loading of the document for use by the XMLC wrapper class. The DOM structure is cloned for each request so as not to damage the original cached DOM structure. LazyDOM cloning should be a pretty fast operation, as far as I understand it. You sound like you have more firsthand knowledge of LazyDOM, so you can correct me if I'm wrong. Deferred Parsing also provides a mechanism where you can set up resource directories in which it monitors the source documents and verifies whether they have changed since the document was cached as a DOM structure. If it finds a change, it will reparse the document, cache it again, and provide a cloned copy again for each request. Of course, documents can also be pulled from the classpath, in which case the reparsing feature doesn't really come into play. There's also a feature where a document can be loaded at runtime without having compiled an XMLC class. An XMLC class is dynamically generated at runtime for you using bytecode generated from ASM or BCEL. This feature is called dynamic loading and it is not overly mature. It could be made better by using dynamic proxies to bind it to an interface, allowing normal XMLC class use and behavior.
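To make the cache, change check, and per-request clone described above concrete, here is a minimal sketch using only the standard JAXP and DOM APIs. It is not XMLC's DocumentLoaderImpl; the class name `ReloadingDocumentCache` and its methods are hypothetical, and it assumes the parser's Document supports `cloneNode(true)` (Xerces-based parsers do).

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical sketch of a reloading template cache; not XMLC code. */
final class ReloadingDocumentCache {

    /** A parsed template plus the file timestamp it was parsed at. */
    private static final class Entry {
        final Document template;
        final long lastModified;

        Entry(Document template, long lastModified) {
            this.template = template;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private int parseCount;

    /** Returns a private deep copy, reparsing only if the file timestamp changed. */
    synchronized Document load(File source) {
        long stamp = source.lastModified();
        Entry entry = cache.get(source.getPath());
        if (entry == null || entry.lastModified != stamp) {
            entry = new Entry(parse(source), stamp);
            cache.put(source.getPath(), entry);
        }
        // Callers mutate the clone, so the cached template stays pristine.
        return (Document) entry.template.cloneNode(true);
    }

    /** Number of real parses so far; exposed only to make the caching visible. */
    synchronized int parseCount() {
        return parseCount;
    }

    private Document parse(File source) {
        try {
            parseCount++;
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse " + source, e);
        }
    }
}
```

The timestamp comparison is the simplest possible change check; a real loader would also need to handle documents that come from the classpath, where no timestamp is available.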

The only thing that doesn't sound like what the deferred parsing feature provides is the fact that you are taking a delta of the base document. I'm not quite sure I understand what you are saying, though. If you are caching the document, you have to obtain a separate copy of the DOM for manipulation, otherwise you pollute the cached DOM structure. How exactly does the delta work? I could imagine your concept taking the place of the cloning that deferred parsing does, especially if it is applicable to more than just the LazyDOM. I'd need to understand it better to really evaluate this, though.

I encourage you to grab the CVS HEAD and check out the deferredparsing package. Look specifically at DocumentLoaderImpl.java and let me know if you see any ways it could be improved. Try the Tomcat example to see it all in action.


Jake


BaseDocument (loaded from file system and cached as LazyDOM)
|_BaseDocument[locale:es, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:es, browser:y] (dynamically generated and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:fr, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:fr, browser:y] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:en_GB, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:en_GB, browser:y] (dynamically generated, optimised and cached as LazyDOM)
  |_DynamicDocument (dynamically generated, delta only and not cached)
  |_DynamicDocument (dynamically generated, delta only and not cached)
  :

I hope this helps to explain the approach.
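The hierarchy above could be sketched roughly as follows. This is only an illustration of the caching levels: it uses a plain DOM deep clone where the real extension uses a LazyDOM delta, and the `VariantCache` class, its constructor arguments, and the `locale/browser` key format are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import org.w3c.dom.Document;

/** Hypothetical sketch of a two-level cache of document variants. */
final class VariantCache {
    private final Document base;
    private final BiConsumer<Document, String> customizer;
    private final Map<String, Document> variants = new HashMap<>();

    /**
     * @param base       the document loaded from the file system
     * @param customizer applies locale/browser changes to a copy of the base
     */
    VariantCache(Document base, BiConsumer<Document, String> customizer) {
        this.base = base;
        this.customizer = customizer;
    }

    /** Returns a per-request copy of the cached variant, building the variant on first use. */
    synchronized Document forRequest(String locale, String browser) {
        String key = locale + "/" + browser;
        Document variant = variants.computeIfAbsent(key, k -> {
            // Second level of the tree: generated once per locale/browser, then cached.
            Document copy = (Document) base.cloneNode(true);
            customizer.accept(copy, k);
            return copy;
        });
        // Third level: never cached, so request-specific edits cannot leak.
        return (Document) variant.cloneNode(true);
    }

    synchronized int cachedVariants() {
        return variants.size();
    }
}
```

The expensive locale/browser work runs once per key, while each request only pays for a copy of an already-customised variant.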

Jacob Kjome wrote:

Hi Chris,

At 03:54 AM 7/9/2006, you wrote:
To add to David's description, the LazyDOM stores the original document as a tree of nodes that have already been resolved to text using the desired encoding. When a modified LazyDOM instance is serialized, the original stored document is fully traversed, with modified nodes being picked up from the modified instance as David explained. The performance saving comes from the fact that the unmodified nodes from the original document tree only need to be concatenated together, not transformed to text from the internal node object type. As David said, if your page is extremely dynamic and most of the nodes have been touched, then the majority of the nodes will have to be transformed into text anyway, so the LazyDOM will provide little advantage, if any. However, if you're modifying a small percentage of a document and the document itself is quite large, then the performance gain could be significant.
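As a rough illustration of that serialization idea (not the LazyDOM implementation itself), the sketch below keeps one pre-resolved text fragment per node and only converts the nodes that an instance has modified. The class `PreSerializedTemplate` and the flat, index-based node model are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: serialize by concatenating pre-resolved text plus overrides. */
final class PreSerializedTemplate {
    // Text for every node, resolved once when the template is built.
    private final List<String> fragments;

    PreSerializedTemplate(List<String> fragments) {
        this.fragments = new ArrayList<>(fragments);
    }

    Instance newInstance() {
        return new Instance();
    }

    /** One per request; records only the nodes that were changed. */
    final class Instance {
        private final Map<Integer, String> modified = new HashMap<>();

        /** Replaces the text of one node; only this node pays the conversion cost. */
        void setText(int nodeIndex, String text) {
            if (nodeIndex < 0 || nodeIndex >= fragments.size()) {
                throw new IndexOutOfBoundsException("No node at index " + nodeIndex);
            }
            modified.put(nodeIndex, escape(text));
        }

        /** Walks the whole template, taking modified nodes from this instance. */
        String serialize() {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fragments.size(); i++) {
                String override = modified.get(i);
                out.append(override != null ? override : fragments.get(i));
            }
            return out.toString();
        }
    }

    /** Minimal character-data escaping for the replaced text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, with the fragments `"<p>Hello, "`, `"guest"`, and `"</p>"`, calling `setText(1, "A & B")` and then `serialize()` yields `<p>Hello, A &amp; B</p>`, and only the middle fragment was converted to text at request time.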

Thanks for the clear explanation!

We still use the LazyDOM for generating dynamic XHTML and VoiceXML documents and are very happy with the performance gain. We've also extended the LazyDOM architecture to support the notion of static and dynamic LazyDOMs. Static refers to the class currently generated by the XMLC compiler. Dynamic LazyDOMs can be created at runtime by producing another LazyDOM in memory from a modified static LazyDOM instance, which is then usually cached in memory. This provides the benefit of being able to generate performant LazyDOM structures at runtime, where all the nodes have been resolved to text. We use this technique for generating and caching locale- and browser-specific LazyDOMs on the fly, where a large percentage of nodes is modified for each document and we want to avoid repeating that processing for every user request. If this functionality is of interest to the core XMLC, I would be more than happy to submit it.

This sounds a bit like what Deferred Parsing already does; it caches documents for use in memory and re-parses them if they change at runtime. Otherwise, a deep clone is made of the document for each request. You get a minor hit on the first request for the document parsing, but then it is fast as can be after that. It was originally created because some XMLC classes were too large for older JVMs to deal with. The re-parsing feature was an added bonus. Do you use deferred parsing? Are you using current versions of XMLC? I'm interested in your modifications, but I suspect that you might be able to get the same (or similar) results using pre-existing XMLC features. More below...

Jacob, this Xerces2 defer-node-expansion feature sounds interesting. I wonder if, and how much, it does make the LazyDOM approach redundant.

I am considering removing LazyDOM when I make the move to Xerces2 and letting users simply specify defer-node-expansion. I'm also considering using deferred parsing exclusively and removing statically generated XMLC classes altogether. This would remove a significant amount of code and reduce the maintenance effort and size of XMLC. It also gets rid of a number of incompatibilities with Xerces2 related to DOMs that get generated by using normal DOM mechanisms to build documents. Instead, we'd just let Xerces2 do all the work of building the DOM by specifying the property "document-class-name". However, I just noticed the following note about using this property...

"When the document class name is set to a value other than the name of the default document factory, the deferred node expansion feature does not work."

I wonder if there is a workaround for this in the case where the custom DOMs extend "org.apache.xerces.dom.DocumentImpl", which most of the XMLC custom DOMs do?

See:
http://xerces.apache.org/xerces2-j/properties.html
http://xerces.apache.org/xerces2-j/features.html
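For reference, this is roughly how the defer-node-expansion feature is requested through JAXP. The feature URI is the one documented on the Xerces2 features page; the class and method names are hypothetical, and the sketch assumes a Xerces-based parser is on the classpath. Other parsers may reject the feature, so the code treats that as a soft failure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper showing how deferred node expansion is requested. */
final class DeferredExpansionExample {
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Returns true if the underlying parser accepted the feature. */
    static boolean tryEnableDeferral(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, true);
            return true;
        } catch (ParserConfigurationException e) {
            // Not a Xerces-derived parser, or the feature is unsupported.
            return false;
        }
    }

    /** Parses a string, asking for deferred expansion where available. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            tryEnableDeferral(factory);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }
}
```

Note that this sketch deliberately leaves "document-class-name" unset, since per the note quoted above a non-default document class disables deferral.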


Jake

Chris


David H. Young wrote:
Well, to paraphrase... "I know Mark Diekhans and I'm no Mark Diekhans..." ;)

But I do remember (especially having re-read p178 of my old xmlc book) that Mark designed the lazyDOM as a way of avoiding all the node expanding that happens when a DOM is loaded into memory.

Mark's lazyDOM implemented a parallel DOM structure (as an array of nodes that have been expanded) that captured the part of the DOM that was modified by your code. This approach avoided wasting memory for expanded nodes that are never touched. Since XMLC keys off of the id attributes that indicate targeted form elements that will be _directly_ manipulated by a Java application, XMLC could use that knowledge to support a lazyDOM strategy. In other words, since you don't have to do a bunch of traversing to get to that node, no need to expand parent nodes for that possible traversing.
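A toy version of that strategy might look like the sketch below: a shared read-only template, plus a per-instance array that holds an expanded node only for the ids the application actually asks for. All names here are hypothetical and the flat node model is a simplification, not how XMLC's LazyDOM classes are actually laid out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of expand-on-access keyed by id. */
final class LazyExpansionSketch {

    /** Immutable data shared by every instance of a page. */
    static final class Template {
        private final String[] text;
        private final Map<String, Integer> indexById = new HashMap<>();

        /** ids[i] is the id attribute of node i, or null if it has none. */
        Template(String[] ids, String[] text) {
            this.text = text.clone();
            for (int i = 0; i < ids.length; i++) {
                if (ids[i] != null) {
                    indexById.put(ids[i], i);
                }
            }
        }

        Instance newInstance() {
            return new Instance(this);
        }
    }

    /** A mutable node that exists only once the application touches it. */
    static final class ExpandedNode {
        String text;

        ExpandedNode(String text) {
            this.text = text;
        }
    }

    /** Per-request view: the parallel array of expanded nodes. */
    static final class Instance {
        private final Template template;
        private final ExpandedNode[] expanded;
        private int expandedCount;

        Instance(Template template) {
            this.template = template;
            this.expanded = new ExpandedNode[template.text.length];
        }

        /** Jumps straight to the node by id; no parent traversal, so no parent expansion. */
        ExpandedNode getElementById(String id) {
            Integer index = template.indexById.get(id);
            if (index == null) {
                return null;
            }
            if (expanded[index] == null) {
                expanded[index] = new ExpandedNode(template.text[index]);
                expandedCount++;
            }
            return expanded[index];
        }

        /** Reads from the expanded node if there is one, else from the shared template. */
        String textAt(int index) {
            return expanded[index] != null ? expanded[index].text : template.text[index];
        }

        int expandedCount() {
            return expandedCount;
        }
    }
}
```

Memory per request grows with the number of nodes touched rather than with document size, which is also why the benefit shrinks as a page becomes more dynamic.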

The only caveat about using the lazyDOM option is that there's probably no advantage to using it if your page is extremely dynamic (i.e., you've gotta touch a lot of nodes), since you encounter the law of diminishing returns (on the value of the lazyDOM strategy). I'd recommend just compiling your pages in both modes and seeing what kind of performance results (or memory utilization) you get.

Hope that helps.
David


Jacob Kjome <hoju@xxxxxxxx> wrote:
At 12:35 PM 7/8/2006, you wrote:
>I'm slightly confused regarding parsers in XMLC (using 2.2.9
>standalone for XHTML, XML and HTML rendering).
>
>- Why should one use deferred parsing (lazy DOM)? My understanding is
>that deferred parsing can potentially save memory and improve
>performance, but only if the DOM tree is not fully traversed. Isn't a
>DOM tree anyways fully traversed if you serialize it (e.g. using
>XMLSerializer)? If that's the case then lazy DOM doesn't seem to make
>much sense to me for XMLC.
>

Mark Diekhans would be the one to answer this, as he's the inventor
of lazydom. Hopefully he's still watching the list.

>- Why not use standard Xerces-J 2 (e.g. the one shipped with J2SE 5,
>or from xerces.apache.org)? I'm not sure that I understand why it
>can't and shouldn't be used.
>

XMLC is very hardwired to the XML parser. Because Xerces2 deviated
in a number of non-trivial ways from Xerces1, XMLC code that depends
on Xerces1 has to be migrated to Xerces2. The good news is that I
have a version of XMLC in my local sandbox that works directly with
Xerces2 (specifically, version 2.8.0+ as well as unreleased Apache
xmlcommons Resolver code, which will, hopefully, be released with
Xerces-2.9.0). It's always hard to find the time to get this fully
implemented, which is why it hasn't been made publicly
available. There are a few things to work out. I also have to look
into whether XMLC's lazydom still makes sense given that the
defer-node-expansion feature already exists in Xerces2.

I've already branched the XMLC-2.2 stuff off and HEAD development
will be for a 2.3 release which is compatible with Xerces2, getting
rid of all vestiges of Xerces1.

>Any comments or answers greatly appreciated.

I'd appreciate any of your thoughts on issues surrounding Xerces2,
such as whether I should continue to distribute Xerces wrapped in the
org.enhydra.apache namespace to avoid clashes with older versions of
Xerces2 or just use the org.apache.xerces directly, resulting in a
much smaller XMLC jar.

Jake

>
>Chris
>
>
>
>--
>You receive this message as a subscriber of the xmlc@xxxxxxxxxxxxx
>mailing list.
>To unsubscribe: mailto:xmlc-unsubscribe@xxxxxxxxxxxxx
>For general help: mailto:sympa@xxxxxxxxxxxxx?subject=help
>ObjectWeb mailing lists service home page: http://www.objectweb.org/wws







David H. Young
Albuquerque, New Mexico
http://www.kspar.net
david@xxxxxxxxx



