java.enhydra.xmlc
Subject: Re: [xmlc] Parser questions
At 05:42 AM 8/11/2006, you wrote:
Jake,
I haven't had a chance yet to investigate Deferred Parsing (or
defer-node-expansion) in any detail, although our LazyDOM
extension seems to be a bit different from what you described. Our
solution makes no attempt to re-parse a document modified on the
file system; it is more about providing a basic in-memory document
sub-classing system. The base document is the original document
defined on the file system and loaded as a normal LazyDOM. It is used
to generate locale/browser (or any other criteria) variations at run
time on demand, which are then optimised as LazyDOMs and cached. The
locale variants are in turn used to generate dynamic data
variations per document request. The general idea is to build up a
cached hierarchy of commonly used document variations and to reuse
the LazyDOM benefit of avoiding a full deep clone of the document when
generating a document at any level of the hierarchy.
Hmm... Barracuda's localization support, which utilizes XMLC, sounds
a bit like your locale-specific templates. In Barracuda's case, you
start with a single document and let an Ant task generate localized
versions of that file based on a localization resource bundle. This
creates a separate copy of the document (and a corresponding XMLC
impl class) for each locale. Some argue that this is excessive and
that a single document could be localized at runtime. Then again,
the localizations are generated, not created manually, so they don't
require any extra maintenance, and there is no runtime overhead for
localization.
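A rough Java sketch of that build-time approach: one master document in, one serialized copy per locale out. This is not Barracuda's Ant task; the rule that an element's `id` selects its translation, the class name `LocalizedCopies`, and the in-memory maps standing in for resource bundles are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical build-time step: one template in, one localized copy per locale out.
class LocalizedCopies {

    /**
     * @param template source of the single master document
     * @param bundles  locale -> (element id -> translated text); stands in for resource bundles
     * @return locale -> serialized localized copy
     */
    static Map<String, String> generate(String template, Map<String, Map<String, String>> bundles)
            throws Exception {
        Map<String, String> copies = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> bundle : bundles.entrySet()) {
            // Parse a fresh DOM per locale so translations never leak between copies.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(template)));
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                // getAttribute returns "" when there is no id, which no bundle maps.
                String translated = bundle.getValue().get(element.getAttribute("id"));
                if (translated != null) {
                    element.setTextContent(translated); // replaces the element's children with text
                }
            }
            copies.put(bundle.getKey(), serialize(doc));
        }
        return copies;
    }

    private static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

In a real build the maps would come from `java.util.ResourceBundle` or `.properties` files, and each copy would be written to disk and compiled by XMLC; the point here is only that the localization cost is paid once, at build time.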
Deferred Parsing fits into the picture by caching and
loading the document for use by the XMLC wrapper class. The DOM
structure is cloned for each request so as not to damage the original
cached DOM structure. LazyDOM cloning should be a pretty fast
operation, as far as I understand it. You sound like you have more
firsthand knowledge of LazyDOM, so you can correct me if I'm
wrong. Deferred Parsing also provides a mechanism where you can set
up resource directories in which it monitors the source documents and
checks whether they have changed since the document was cached as a
DOM structure. If it finds a change, it reparses the document,
caches it again, and again provides a cloned copy for each request. Of
course, documents can also be pulled from the classpath, in which
case the reparsing feature doesn't really come into play. There's
also a feature where a document can be loaded at runtime without
having compiled an XMLC class. An XMLC class is dynamically
generated for you at runtime, using bytecode generated with ASM or
BCEL. This feature is called dynamic loading and it is not overly
mature. It could be made better by using dynamic proxies to bind it
to an interface, allowing normal XMLC class use and behavior.
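A minimal sketch of the cache, change-check and clone-per-request cycle described above, using only the JDK's DOM API. It is not DocumentLoaderImpl: the class name `TemplateCache`, the use of `File.lastModified()` as the change test, and the locking strategy are assumptions.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: cache a parsed DOM, re-parse on change, hand out deep clones.
class TemplateCache {

    private static final class Entry {
        final long lastModified; // file timestamp at the time of parsing
        final Document document; // cached master copy; never handed out directly

        Entry(long lastModified, Document document) {
            this.lastModified = lastModified;
            this.document = document;
        }
    }

    private final Map<File, Entry> entries = new ConcurrentHashMap<>();

    /** Returns a private deep copy of the document, re-parsing it if the file changed. */
    Document get(File source) throws Exception {
        long stamp = source.lastModified();
        Entry entry = entries.get(source);
        if (entry == null || entry.lastModified != stamp) {
            // First request, or modified on disk: parse again and replace the cache entry.
            // Two threads may both re-parse here; the last one wins, which is harmless.
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            entry = new Entry(stamp, parsed);
            entries.put(source, entry);
        }
        // Xerces-style DOMs are not safe for concurrent reads (deferred nodes expand on
        // access), so clone under a lock. The caller can then modify its copy freely.
        synchronized (entry.document) {
            return (Document) entry.document.cloneNode(true);
        }
    }
}
```

Classpath resources would skip the timestamp check, since there is no file to watch.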
The only thing that doesn't sound like what the deferred parsing
feature provides is the fact that you are taking a delta of the base
document. I'm not quite sure I understand what you are saying,
though. If you are caching the document, you have to obtain a
separate copy of the DOM for manipulation, otherwise you pollute the
cached DOM structure. How exactly does the delta work? I could
imagine your concept taking the place of the cloning that deferred
parsing does, especially if it is applicable to more than just the
lazydom. I'd need to better understand it to really evaluate this, though.
I encourage you to grab the CVS HEAD and check out the deferredparsing
package. Look specifically at DocumentLoaderImpl.java and let me
know if you see any ways it could be improved. Try the Tomcat
example to see it all in action.
Jake
BaseDocument (loaded from file system and cached as LazyDOM)
|_BaseDocument[locale:es, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:es, browser:y] (dynamically generated and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:fr, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:fr, browser:y] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:en_GB, browser:x] (dynamically generated, optimised and cached as LazyDOM)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| |_DynamicDocument (dynamically generated, delta only and not cached)
| :
|_BaseDocument[locale:en_GB, browser:y] (dynamically generated, optimised and cached as LazyDOM)
  |_DynamicDocument (dynamically generated, delta only and not cached)
  |_DynamicDocument (dynamically generated, delta only and not cached)
  :
I hope this helps to explain the approach.
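A sketch of the hierarchy above under simplifying assumptions: plain DOM deep clones stand in for the LazyDOM deltas, and `VariantCache` and its `Customizer` hook are hypothetical names. The middle level is built once per (locale, browser) pair and cached; the bottom level is a fresh, uncached copy per request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.w3c.dom.Document;

// Hypothetical three-level hierarchy. Plain deep clones stand in for LazyDOM deltas.
class VariantCache {

    /** Hypothetical hook that applies locale/browser-specific changes to a copy. */
    interface Customizer {
        void apply(Document copy, String locale, String browser);
    }

    private final Document base;
    private final Customizer customizer;
    private final Map<String, Document> variants = new ConcurrentHashMap<>();

    VariantCache(Document base, Customizer customizer) {
        this.base = base;
        this.customizer = customizer;
    }

    /** Middle level: built once per (locale, browser) and cached. */
    Document variant(String locale, String browser) {
        return variants.computeIfAbsent(locale + "|" + browser, key -> {
            Document copy = cloneOf(base);
            customizer.apply(copy, locale, browser);
            return copy;
        });
    }

    /** Bottom level: a per-request copy for dynamic data; never cached. */
    Document forRequest(String locale, String browser) {
        return cloneOf(variant(locale, browser));
    }

    int cachedVariantCount() {
        return variants.size();
    }

    private static Document cloneOf(Document source) {
        synchronized (source) { // DOM reads are not guaranteed to be thread-safe
            return (Document) source.cloneNode(true);
        }
    }
}
```

The deep clones are exactly the cost the LazyDOM delta is meant to avoid, so this only shows the caching structure, not the memory savings.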
Jacob Kjome wrote:
Hi Chris,
At 03:54 AM 7/9/2006, you wrote:
To add to David's description, the LazyDOM stores the original
document as a tree of nodes that have already been resolved to
text using the desired encoding. When a modified LazyDOM instance
is serialized, the original stored document is fully traversed,
with modified nodes being picked up from the modified instance as
David explained. The performance saving comes from the fact that
the unmodified nodes from the original document tree only need to
be concatenated together, not transformed to text from the
internal node object type. As David said, if your page is extremely
dynamic and most of the nodes have been touched, then the
majority of the nodes will have to be transformed into text anyway,
so the LazyDOM will provide little or no advantage.
However, if you're modifying a small percentage of a document's nodes
and the document itself is quite large, then the performance gain could
be significant.
Thanks for the clear explanation!
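A toy Java model of the serialization scheme described above: the template is a list of segments already resolved to text, a per-request instance records only the segments it touched, and serialization concatenates untouched segments while converting only the touched ones. The class and method names are hypothetical; this is not the LazyDOM implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the idea only; names are hypothetical, not XMLC's LazyDOM classes.
class PrerenderedTemplate {
    private final List<String> segments;            // markup already resolved to text
    private final Map<String, Integer> idToSegment; // id -> segment holding that element's content

    PrerenderedTemplate(List<String> segments, Map<String, Integer> idToSegment) {
        this.segments = List.copyOf(segments);
        this.idToSegment = Map.copyOf(idToSegment);
    }

    Instance newInstance() {
        return new Instance();
    }

    /** A per-request view: stores only the segments the application touched. */
    final class Instance {
        private final Map<Integer, String> modified = new HashMap<>();
        int renderedCount; // segments that had to be converted to text during serialize()

        void setText(String id, String text) {
            Integer index = idToSegment.get(id);
            if (index == null) {
                throw new IllegalArgumentException("unknown id: " + id);
            }
            modified.put(index, text);
        }

        String serialize() {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < segments.size(); i++) {
                String text = modified.get(i);
                if (text == null) {
                    out.append(segments.get(i)); // untouched: plain concatenation
                } else {
                    renderedCount++;
                    out.append(escape(text)); // touched: must be turned into markup now
                }
            }
            return out.toString();
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

If nearly every segment ends up in `modified`, `serialize()` does about as much conversion work as an ordinary serializer, which is the diminishing-returns case mentioned in this thread.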
We still use the LazyDOM for generating dynamic XHTML and VoiceXML
documents and are very happy with the performance gain. We've also
extended the LazyDOM architecture to support the notion of static
and dynamic LazyDOMs. Static refers to the class currently
generated by the XMLC compiler. Dynamic LazyDOMs can be created at
runtime by producing another LazyDOM in memory from a modified
static LazyDOM instance, which is then usually cached in memory.
This provides the benefit of being able to generate performant
LazyDOM structures at runtime, where all the nodes have been
resolved to text. We use this technique to generate and cache
locale- and browser-specific LazyDOMs on the fly when a large
percentage of each document's nodes are modified, so that we avoid
repeating that processing on every user request. If this
functionality is of interest to core XMLC, I would be more than
happy to submit it.
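A self-contained toy sketch of deriving a "dynamic" template from a modified "static" one: the locale or browser edits are baked into a new list of pre-rendered segments, which can then be cached, so each request only pays for its own edits. `DerivableTemplate`, `derive` and `render` are hypothetical names, and escaping is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model; names are hypothetical. Escaping is omitted to keep it short.
final class DerivableTemplate {
    private final List<String> segments;    // pre-rendered text
    private final Map<String, Integer> ids; // id -> index of the segment it owns

    DerivableTemplate(List<String> segments, Map<String, Integer> ids) {
        this.segments = List.copyOf(segments);
        this.ids = Map.copyOf(ids);
    }

    /**
     * Bakes the edits into a new pre-rendered template. Each edit replaces one
     * segment in place, so the id index can be shared unchanged.
     */
    DerivableTemplate derive(Map<String, String> edits) {
        List<String> baked = new ArrayList<>(segments);
        edits.forEach((id, text) -> baked.set(indexOf(id), text));
        return new DerivableTemplate(baked, ids);
    }

    /** Per-request rendering: only this request's edits are applied; baked ones are already text. */
    String render(Map<String, String> edits) {
        return String.join("", derive(edits).segments);
    }

    private int indexOf(String id) {
        Integer index = ids.get(id);
        if (index == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        return index;
    }
}
```

In the scheme described above, the result of `derive` for each locale/browser pair would be cached and reused as the starting point for every matching request.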
This sounds a bit like what Deferred Parsing already does; it
caches documents for use in memory and re-parses them if they
change at runtime. Otherwise, a deep clone is made of the document
for each request. You take a minor hit on the first request while the
document is parsed, but after that it is as fast as can be. It was
originally created because some XMLC classes were too large for
older JVMs to deal with. The re-parsing feature was an added
bonus. Do you use deferred parsing? Are you using current
versions of XMLC? I'm interested in your modifications, but I
suspect that you might be able to get the same (or similar) results
using pre-existing XMLC features. More below...
Jacob, this Xerces2 defer-node-expansion feature sounds
interesting. I wonder whether, and to what extent, it makes the LazyDOM
approach redundant.
I am considering removing LazyDOM when I make the move to Xerces2
and letting users simply specify defer-node-expansion. I'm also
considering using deferred parsing exclusively and removing
statically generated XMLC classes altogether. This would remove a
significant amount of code and reduce the maintenance effort and
size of XMLC. It also gets rid of a number of incompatibilities
with Xerces2 related to DOMs that get generated by using normal DOM
mechanisms to build documents. Instead, we'd just let Xerces2 do
all the work of building the DOM by specifying the property
"document-class-name". However, I just noticed the following note
about this property:
"When the document class name is set to a value other than the name
of the default document factory, the deferred node expansion
feature does not work."
I wonder if there is a workaround for this in the case where the
custom DOMs extend "org.apache.xerces.dom.DocumentImpl", as most
of the XMLC custom DOMs do?
See:
http://xerces.apache.org/xerces2-j/properties.html
http://xerces.apache.org/xerces2-j/features.html
Jake
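A hedged sketch of switching Xerces2 deferred node expansion on or off through JAXP. It assumes the active `DocumentBuilderFactory` is Xerces-based (as the JDK's bundled parser is) and accepts Xerces feature URIs via `setAttribute`, which throws `IllegalArgumentException` for attributes it does not recognize. The class name `DeferredParsingConfig` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeferredParsingConfig {
    // Feature URI listed on the Xerces2-J features page.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Builds a DocumentBuilder with deferred node expansion switched on or off. */
    static DocumentBuilder newBuilder(boolean defer) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: a Xerces-based factory forwards Boolean-valued attributes to the
        // underlying parser as features; other implementations may reject the URI.
        factory.setAttribute(DEFER_NODE_EXPANSION, defer);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml, boolean defer) throws Exception {
        return newBuilder(defer).parse(new InputSource(new StringReader(xml)));
    }
}
```

With Apache Xerces2 on the classpath, the same URI can be passed to `org.apache.xerces.parsers.DOMParser.setFeature`, and a custom document class to `setProperty` under `http://apache.org/xml/properties/dom/document-class-name`; per the note quoted above, doing the latter with a non-default class disables deferred expansion.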
Chris
David H. Young wrote:
Well, to paraphrase... "I know Mark Diekhans and I'm no Mark
Diekhans..." ;)
But I do remember (especially having re-read p178 of my old xmlc
book) that Mark designed the lazyDOM as a way of avoiding all the
node expanding that happens when a DOM is loaded into memory.
Mark's lazyDOM implemented a parallel DOM structure (as an array
of nodes that have been expanded) that captured the part of the
DOM that was modified by your code. This approach avoided wasting
memory for expanded nodes that are never touched. Since XMLC
keys off of the id attributes that indicate targeted form
elements that will be _directly_ manipulated by a Java
application, XMLC could use that knowledge to support a lazyDOM
strategy. In other words, since you don't have to do a bunch of
traversing to get to that node, there's no need to expand its
parent nodes for that traversal.
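A toy model of the id-driven idea described above: node records are compact and shared, an id-to-position index is known up front, and asking for an element by id creates a mutable node for that position only, without expanding any of its ancestors. Names are hypothetical; this is not Mark's LazyDOM code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of id-driven lazy expansion; names are hypothetical, not the real LazyDOM API.
final class LazyNodes {

    /** Compact read-only record for one node, shared by every request. */
    static final class Record {
        final String tag;
        final int parent; // index of the parent record, -1 for the root
        final String text;

        Record(String tag, int parent, String text) {
            this.tag = tag;
            this.parent = parent;
            this.text = text;
        }
    }

    /** Mutable node object, created only when application code asks for it. */
    static final class Node {
        final Record source;
        String text;

        Node(Record source) {
            this.source = source;
            this.text = source.text;
        }
    }

    private final Record[] records;
    private final Map<String, Integer> idToIndex;                // known up front, from the id attributes
    private final Map<Integer, Node> expanded = new HashMap<>(); // sparse parallel structure of touched nodes

    LazyNodes(Record[] records, Map<String, Integer> idToIndex) {
        this.records = records.clone();
        this.idToIndex = Map.copyOf(idToIndex);
    }

    /** Jumps straight to the node: no traversal, so no ancestors are expanded. */
    Node getElementById(String id) {
        Integer index = idToIndex.get(id);
        if (index == null) {
            return null;
        }
        return expanded.computeIfAbsent(index, i -> new Node(records[i]));
    }

    int expandedCount() {
        return expanded.size();
    }
}
```

A complete implementation would also consult `expanded` during serialization, so that touched nodes are written from their node objects and everything else from the shared records.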
The only caveat about using the lazyDOM option is that there's
probably no advantage to using it if your page is extremely
dynamic (i.e., you've got to touch a lot of nodes), since you
run into the law of diminishing returns (on the value of the
lazyDOM strategy). I'd recommend just compiling your pages in both
modes and seeing what kind of performance (or memory
utilization) you get.
Hope that helps.
David
Jacob Kjome <hoju@xxxxxxxx> wrote:
At 12:35 PM 7/8/2006, you wrote:
>I'm slightly confused regarding parsers in XMLC (using 2.2.9
>standalone for XHTML, XML and HTML rendering).
>
>- Why should one use deferred parsing (lazy DOM)? My understanding is
>that deferred parsing can potentially save memory and improve
>performance, but only if the DOM tree is not fully traversed. Isn't a
>DOM tree fully traversed anyway if you serialize it (e.g. using
>XMLSerializer)? If that's the case then lazy DOM doesn't seem to make
>much sense to me for XMLC.
>
Mark Diekhans would be the one to answer this, as he's the inventor
of lazydom. Hopefully he's still watching the list.
>- Why not use standard Xerces-J 2 (e.g. the one shipped with J2SE 5,
>or from xerces.apache.org)? I'm not sure that I understand why it
>can't and shouldn't be used.
>
XMLC is very hardwired to the XML parser. Because Xerces2 deviated
in a number of non-trivial ways from Xerces1, XMLC code that depends
on Xerces1 has to be migrated to Xerces2. The good news is that I
have a version of XMLC in my local sandbox that works directly with
Xerces2 (specifically, version 2.8.0+ as well as unreleased Apache
xmlcommons Resolver code, which will, hopefully, be released with
Xerces-2.9.0). It's always hard to find the time to get this fully
implemented, which is why it hasn't been made publicly
available. There are a few things to work out. I also have to look
into whether XMLC's lazydom makes sense given the
defer-node-expansion feature that already exists in Xerces2.
I've already branched the XMLC-2.2 stuff off and HEAD development
will be for a 2.3 release which is compatible with Xerces2, getting
rid of all vestiges of Xerces1.
>Any comments or answers greatly appreciated.
I'd appreciate any of your thoughts on issues surrounding Xerces2,
such as whether I should continue to distribute Xerces wrapped in the
org.enhydra.apache namespace to avoid clashes with older versions of
Xerces2 or just use org.apache.xerces directly, resulting in a
much smaller XMLC jar.
Jake
>
>Chris
>
>--
>You receive this message as a subscriber of the xmlc@xxxxxxxxxxxxx
>mailing list.
>To unsubscribe: mailto:xmlc-unsubscribe@xxxxxxxxxxxxx
>For general help: mailto:sympa@xxxxxxxxxxxxx?subject=help
>ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
David H. Young
Albuquerque, New Mexico
http://www.kspar.net
david@xxxxxxxxx