
Re: Thought on future of XMLC: msg#00122

java.enhydra.xmlc

Subject: Re: Thought on future of XMLC

David,

you would need to modify the DOM implementation so that it is aware of any change the application makes to the tree. You would also need more information from the parsing process: when the DOM is created, each DOM node would need to store its beginning and ending offset within the original HTML string. The original HTML string must of course be kept in memory (little overhead). In the output process, for each DOM node we would need to check whether it was changed in some way by the application. If yes, produce the HTML from the node as is done now. If no, take the substring of the original HTML from the beginning offset to the ending offset and return that as the result.
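A minimal Java sketch of this output rule, under stated assumptions: the `Node` class, its `begin`/`end` offsets, the `modified` flag and the `write` method are all hypothetical names for illustration and are not part of XMLC or of any DOM implementation. Attributes are left out to keep it short. A change marks the node and all its ancestors as modified, because each ancestor's original substring contains the old text.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetDomSketch {
    /** Hypothetical node: elements have a tag, text nodes have tag == null. */
    static final class Node {
        final String tag;
        String text;
        final int begin, end;   // offsets into the original HTML string
        boolean modified;       // set when the application changes this subtree
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text, int begin, int end) {
            this.tag = tag;
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        /** Used while building the tree from the parser; not an application change. */
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Application change: mark this node and every ancestor as modified. */
        void setText(String newText) {
            text = newText;
            for (Node n = this; n != null; n = n.parent) {
                n.modified = true;
            }
        }
    }

    /** Unchanged subtrees are copied from the source; changed ones are regenerated. */
    static void write(Node n, String source, StringBuilder out) {
        if (!n.modified) {
            out.append(source, n.begin, n.end);
            return;
        }
        if (n.tag == null) {
            out.append(n.text);
            return;
        }
        out.append('<').append(n.tag).append('>');
        for (Node c : n.children) {
            write(c, source, out);
        }
        out.append("</").append(n.tag).append('>');
    }

    static String demo(boolean modify) {
        // The odd "<LI >" spelling makes verbatim copying visible in the output.
        String source = "<ul><LI >one</LI><li>two</li></ul>";
        int one = source.indexOf("one");
        int two = source.indexOf("two");
        Node t1 = new Node(null, "one", one, one + 3);
        Node t2 = new Node(null, "two", two, two + 3);
        Node li1 = new Node("li", null, source.indexOf("<LI >"), source.indexOf("</LI>") + 5).add(t1);
        Node li2 = new Node("li", null, source.indexOf("<li>"), source.indexOf("</li>") + 5).add(t2);
        Node ul = new Node("ul", null, 0, source.length()).add(li1).add(li2);
        if (modify) {
            t2.setText("TWO");
        }
        StringBuilder out = new StringBuilder();
        write(ul, source, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

With no change the output is the original string. After the second text leaf is changed, the first `<LI >one</LI>` row is still copied byte for byte from the source, and only the path from the root to the changed leaf is regenerated.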

If you look at the changes we really make (even if we consider URL mapping), mostly we are changing some leaves. (Copying template rows is not really a changing operation on the node, as you can still use the original HTML for output, because the original HTML is of course immutable.)

So we would have
1) the size of the html as memory overhead.
2) need to change the parsing process to keep track of the offsets
3) need to have a DOM implementation which has a modified flag and a beginning and ending offset (probably integer)

And we get
1) quite some speed in spitting out html (over the current process)
2) large parts of the output html will be exactly what the input (the original html) was.
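To illustrate the second cost item (keeping track of the offsets during parsing), here is a rough Java sketch. It is not the XMLC parser: `elementSpans` is a hypothetical helper that assumes well-formed markup with a closing tag for every element, no comments, and no `>` inside attribute values. A real HTML parser would have to handle all of those cases.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OffsetScanSketch {
    /** Returns one "tag:begin-end" entry per element, in closing order. */
    static List<String> elementSpans(String src) {
        List<String> spans = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // begin offsets of open elements
        int i = 0;
        while ((i = src.indexOf('<', i)) >= 0) {
            int close = src.indexOf('>', i);
            if (close < 0) {
                break; // unterminated tag: stop scanning
            }
            if (src.charAt(i + 1) == '/') {
                // Closing tag: the element spans from its opening '<' to this '>'.
                int begin = open.pop();
                String tag = src.substring(i + 2, close);
                spans.add(tag + ":" + begin + "-" + (close + 1));
            } else {
                // Opening tag: remember where the element starts.
                open.push(i);
            }
            i = close + 1;
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(elementSpans("<ul><li>one</li><li>two</li></ul>"));
    }
}
```

Each recorded pair is the beginning and ending offset that a node would store, so the extra bookkeeping is two integers per node plus the retained source string.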

-Arno


David Li wrote:
Arno,

The problem with the approach you are proposing here is that it's impossible to predict which parts of the HTML pages will be modified and which won't. It may be possible for small projects that only use simple get/set methods. A lot of XMLC programming is done with the DOM API, which can potentially traverse the entire page.

An alternative is possible with LazyDOM. The DOM is a tree structure. At each node, we can keep a serialized string showing how the node and its subtree would look after being serialized. Since LazyDOM keeps track of which nodes are modified, we can treat a modified node's copy of the serialized string as invalid and traverse its subtree to generate the new serialized string. However, this would cause a large increase in memory usage, approximately O(filesize * height of the DOM tree * 2). For a 50 K page with 10 levels of depth, it comes out to be 2M of memory (ASCII becomes Unicode in Java). Some smart pruning of the tree is necessary to reduce the memory footprint and make this a feasible solution.
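A rough Java sketch of the per-node cache described above. This is not the LazyDOM API: the `Node` class, its `cached` field and the `rebuilds` counter are hypothetical and exist only to show the invalidation pattern.

```java
import java.util.ArrayList;
import java.util.List;

class CachedSerializationSketch {
    static int rebuilds;            // how many nodes had to be re-serialized
    static int secondPassRebuilds;  // rebuilds needed after a single leaf change

    /** Hypothetical node: elements have a tag, text nodes have tag == null. */
    static final class Node {
        final String tag;
        String text;
        Node parent;
        String cached;              // serialized form of this node and its subtree
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** A change invalidates the cache of this node and of every ancestor. */
        void setText(String newText) {
            text = newText;
            for (Node n = this; n != null; n = n.parent) {
                n.cached = null;
            }
        }

        String serialize() {
            if (cached != null) {
                return cached;      // subtree untouched since the last output
            }
            rebuilds++;
            if (tag == null) {
                cached = text;
                return cached;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('<').append(tag).append('>');
            for (Node c : children) {
                sb.append(c.serialize());
            }
            sb.append("</").append(tag).append('>');
            cached = sb.toString();
            return cached;
        }
    }

    static String demo() {
        rebuilds = 0;
        Node t2 = new Node(null, "two");
        Node ul = new Node("ul", null)
                .add(new Node("li", null).add(new Node(null, "one")))
                .add(new Node("li", null).add(t2));
        ul.serialize();             // first pass fills every cache
        int before = rebuilds;
        t2.setText("TWO");
        String html = ul.serialize();
        secondPassRebuilds = rebuilds - before;
        return html;
    }

    public static void main(String[] args) {
        System.out.println(demo() + " (nodes rebuilt on second pass: " + secondPassRebuilds + ")");
    }
}
```

Only the changed leaf and its ancestors are rebuilt on the second pass. The cost is visible in the `cached` field: every node holds the full text of its subtree, so each character of the page is stored once per level of the tree above it, which is where the filesize * height factor in the estimate comes from.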

David Li
---
"It spells Mac OS X but pronounces NeXTSTEP"

On Friday, Nov 29, 2002, at 05:10 Asia/Shanghai, Arno Schatz wrote:

Hi Jake,

sorry for not explaining properly. I guess some others understood me only because I had mentioned this somewhere else before.

When the DOM tree is created, there are a lot of nodes which will not be changed by the program. (Mostly an application program only changes nodes which have an id attribute.) So there are whole subtrees of the created DOM tree which will never be changed by the application. Such a subtree is created from an HTML string (a substring of the original HTML page). So if you want to output such an unchanged subtree, you could output the original string from the HTML file. To generate the output from the DOM, XMLC runs through the whole tree, even through these unchanged nodes, and generates the HTML. If it had a reference to the original HTML, it could output the part of the original HTML the subtree was created from.

The current time consumption for outputting HTML is quite high, as you might know. But there are other ways to speed it up as well. So the question between me and Mark is whether it is better to use the original HTML string or the HTML produced by the DOM.

Is that understandable?

Arno


_______________________________________________
XMLC mailing list
XMLC@xxxxxxxxxxx
http://www.enhydra.org/mailman/listinfo.cgi/xmlc


