
Re: Thought on future of XMLC: msg#00123

java.enhydra.xmlc

Subject: Re: Thought on future of XMLC

On Friday 29 November 2002 11:38, Arno Schatz wrote:
> David,
>
> you would need to modify the DOM implementation such that it will be aware
> if there was any change which the application does to the tree. Also you
> would need more information from the parsing process: When the DOM is
> created you would need to store the beginning offset and ending offset
> (from the original HTML string) within the DOM node. The original HTML
> string must be stored in memory of course (little overhead). In the output
> process, for each DOM node we would need to check whether it was changed in
> some way by the application. If yes, produce the html from the node output
> as it is done now. If no, take the substring from the original html from
> the beginning offset to the ending offset and return that as result.

That's basically what the LazyDOM does, with one small but important
difference: The LazyDOM doesn't store the *original* HTML, but rather caches
the HTML that is constructed by the "standard" output process. The reason for
this is simply that it is much (orders of magnitude :-) easier to create HTML
text from a DOM than to create a DOM-like structure from broken HTML-like
text.

The other difference is that the LazyDOM caches preformatted texts per DOM
node - so, you still have to walk the tree and output each node. But the
treewalk really isn't that much of a performance hit - the big hit is the
text conversion (especially detecting characters that need to be converted to
HTML entities). That said, changing the text cache so that a complete,
unchanged subtree can be output in a single operation is something I've
wanted to do for a while now, and I'll probably implement it along the way
when XMLC is changed to no longer depend on a specific version of Xerces - so
expect this for XMLC 3.something :-)
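
As a rough illustration of the per-node text cache described above, here is a
minimal sketch. It is not the actual LazyDOM code: the class, field, and method
names (CachedNodeSketch, TextNode, escape, render) are invented for this
example, and only four characters are escaped. It shows the tree walk still
visiting every node while the costly entity conversion runs only for nodes
whose text has changed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-node output caching; not the LazyDOM API. */
class CachedNodeSketch {
    /** Counts how often the expensive conversion actually runs. */
    static int escapeCalls = 0;

    /** Character-by-character entity conversion, the costly step. */
    static String escape(String text) {
        escapeCalls++;
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** A text node that remembers its formatted output until it is changed. */
    static class TextNode {
        private String text;
        private String cached; // null means "must be formatted again"

        TextNode(String text) { this.text = text; }

        void setText(String newText) {
            this.text = newText;
            this.cached = null; // any change invalidates the cache
        }

        void write(StringBuilder out) {
            if (cached == null) {
                cached = escape(text);
            }
            out.append(cached);
        }
    }

    /** The walk still visits every node, but mostly appends cached strings. */
    static String render(List<TextNode> nodes) {
        StringBuilder out = new StringBuilder();
        for (TextNode node : nodes) {
            node.write(out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<TextNode> nodes = new ArrayList<>();
        nodes.add(new TextNode("a < b"));
        nodes.add(new TextNode(" & c"));
        System.out.println(render(nodes));
        nodes.get(0).setText("changed");
        System.out.println(render(nodes)); // only the first node is re-escaped
    }
}
```

Caching one string for a whole unchanged subtree, as mentioned above, would
replace the loop in render with a single append for that subtree.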

> If you look at the changes we really make, (even if we consider URL
> mapping) mostly we are changing some leaves. (copying template rows is not
> really a changing operation on the node, as you still can use the original
> html for outputting, because the original html is of course immutable)
>
> So we would have
> 1) the size of the html as memory overhead.
> 2) need to change the parsing process to keep track of the offsets
> 3) need to have a DOM implementation which has a modified flag and a
> beginning and ending offset (probably integer)

LazyDOM already does that, minus the offset stuff.
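
For comparison, the offset scheme from the quoted proposal could be sketched
roughly as follows. This is a hypothetical illustration only: OffsetNodeSketch
and Node are invented names, the offsets are written by hand rather than
recorded by a parser, and entity escaping is left out. Unmodified nodes are
copied verbatim from the original source; a modified node and its ancestors
are serialized again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the offset-based output idea; not XMLC code. */
class OffsetNodeSketch {
    static class Node {
        final String tag;
        final int begin; // offset of the node's first character in the source
        final int end;   // offset just past the node's last character
        final List<Node> children = new ArrayList<>();
        Node parent;
        String text;      // text content for leaf nodes, else null
        boolean modified; // set when this node or a descendant was changed

        Node(String tag, int begin, int end, String text) {
            this.tag = tag;
            this.begin = begin;
            this.end = end;
            this.text = text;
        }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        void setText(String newText) {
            text = newText;
            // Ancestors can no longer be copied verbatim either.
            for (Node n = this; n != null; n = n.parent) {
                n.modified = true;
            }
        }

        void write(String source, StringBuilder out) {
            if (!modified) {
                out.append(source, begin, end); // verbatim copy of the original
                return;
            }
            out.append('<').append(tag).append('>');
            if (text != null) {
                out.append(text); // escaping omitted in this sketch
            }
            for (Node child : children) {
                child.write(source, out);
            }
            out.append("</").append(tag).append('>');
        }
    }

    static String demo() {
        String source = "<TR><TD>a</TD><TD>b</TD></TR>";
        Node tr = new Node("tr", 0, 29, null);
        Node td1 = new Node("td", 4, 14, "a");
        Node td2 = new Node("td", 14, 24, "b");
        tr.add(td1).add(td2);
        td2.setText("new");
        StringBuilder out = new StringBuilder();
        tr.write(source, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // <tr><TD>a</TD><td>new</td></tr>
    }
}
```

The sketch assumes the offsets are already known; obtaining them means parsing
the "HTML-like" source text itself.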

> And we get
> 1) quite some speed in spitting out html (over the current process)

A bit, but not too much faster than the LazyDOM is my guess.

> 2) large parts of the output html will be exactly what the input (the
> original html) was.

But you spend a huge amount of time on parsing "HTML-like" stuff and forcing
it into something that resembles a DOM - that's a can of worms I definitely
don't want to open.


--
Richard Kunze

[ t]ivano Software, Bahnhofstr. 18, 63263 Neu-Isenburg
Tel.: +49 6102 80 99 07 - 0, Fax.: +49 6102 80 99 07 - 1
http://www.tivano.de, kunze@xxxxxxxxx

