
RE: best approach to whole document cloning in Xerces2?: msg#00016

Subject: RE: best approach to whole document cloning in Xerces2?
I ran extensive tests to see whether cloning would be faster (I assumed it
would, at first).  I found that reparsing the original file (assuming it
hadn't changed) was significantly faster than cloning.  If you have to
serialize first, you might lose that advantage, but I seem to recall that
the difference was significant.
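The test results above are not reproduced here. For anyone who wants to measure this on their own templates, here is a rough Java sketch that times `cloneNode(true)` against reparsing the same string. It assumes the JDK's default JAXP parser (derived from Xerces2; a standalone Xerces2 jar on the classpath would be picked up instead), and the generated template is made up purely for illustration. It is not a rigorous benchmark, and numbers will vary with document size and JVM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Rough timing comparison of Document.cloneNode(true) versus reparsing.
// Not a rigorous benchmark: single measured pass, no statistics.
final class CloneVsReparse {

    // Written by the timing loops so their work is observable.
    static volatile long blackhole;

    // Hypothetical template: a DOCTYPE with an internal entity plus `rows` elements.
    static String buildTemplate(int rows) {
        StringBuilder sb = new StringBuilder(
                "<!DOCTYPE page [ <!ENTITY site \"example\"> ]><page>");
        for (int i = 0; i < rows; i++) {
            sb.append("<row id=\"").append(i).append("\">&site; ").append(i).append("</row>");
        }
        return sb.append("</page>").toString();
    }

    static Document parse(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reading all text forces every node to be materialized, so a lazily
    // built DOM (Xerces' deferred node expansion) is not under-counted.
    static long touch(Document doc) {
        return doc.getDocumentElement().getTextContent().length();
    }

    // Elapsed nanoseconds for `iterations` deep clones of `master`.
    static long timeClone(Document master, int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += touch((Document) master.cloneNode(true));
        }
        blackhole = sink;
        return System.nanoTime() - start;
    }

    // Elapsed nanoseconds for `iterations` fresh parses of `xml`.
    static long timeReparse(DocumentBuilder builder, String xml, int iterations)
            throws Exception {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += touch(parse(builder, xml));
        }
        blackhole = sink;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String xml = buildTemplate(500);
        Document master = parse(builder, xml);
        int iterations = 200;
        // Warm-up pass so both code paths are JIT-compiled before timing.
        timeClone(master, iterations);
        timeReparse(builder, xml, iterations);
        System.out.printf("cloneNode(true): %d ms%n", timeClone(master, iterations) / 1_000_000);
        System.out.printf("reparse:         %d ms%n", timeReparse(builder, xml, iterations) / 1_000_000);
    }
}
```

Both loops read the full text of every result, so neither side is credited for work a lazily built DOM would otherwise postpone.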

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@xxxxxxxxxx] 
Sent: Monday, April 17, 2006 7:40 AM
To: general@xxxxxxxxxxxxxx
Cc: j-users@xxxxxxxxxxxxxxxxx
Subject: Re: best approach to whole document cloning in Xerces2?

Jacob Kjome <hoju@xxxxxxxx> wrote on 04/17/2006 09:17:53 AM:

> At 11:17 AM 4/16/2006, you wrote:
>  >Hi Jake,
>  >
>  >The behaviour of Document.cloneNode(true) [1] is implementation dependent.
>  >In Xerces it will create a new Document and then import the children from
>  >the original document.
> 
> Which would leave out the DTD, I suppose.

I believe it does copy DocumentType nodes, though there's no guarantee 
that other DOM implementations will do that.
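Whether a whole-document clone keeps the DocumentType is easy to check empirically. The Java sketch below (class name hypothetical) assumes the JDK's default JAXP parser, which is derived from Xerces2; the sample document is made up. It only reports what the implementation found at runtime does, since, as noted above, other DOM implementations need not behave the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Reports whether Document.cloneNode(true) carried the DocumentType across.
// The outcome is implementation dependent; this only observes it.
final class DoctypeCloneCheck {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document original = parse(
                "<!DOCTYPE root [ <!ENTITY greet \"hello\"> ]><root>&greet;</root>");
        Document copy = (Document) original.cloneNode(true);
        DocumentType dt = copy.getDoctype();
        System.out.println("implementation:  " + copy.getClass().getName());
        System.out.println("doctype copied:  " + (dt != null));
        if (dt != null) {
            System.out.println("doctype name:    " + dt.getName());
            System.out.println("entities:        " + dt.getEntities().getLength());
            System.out.println("internal subset: " + dt.getInternalSubset());
        }
        System.out.println("root text:       " + copy.getDocumentElement().getTextContent());
    }
}
```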

> So, it would make more 
> sense to create my own document and do something like this, right?...
> 
>              DOMImplementation domImpl = document.getImplementation();
>              String documentElement = document.getDoctype().getName();
>              DocumentType docType = domImpl.createDocumentType(documentElement,
>                      document.getDoctype().getPublicId(),
>                      document.getDoctype().getSystemId());
>              Document doc = domImpl.createDocument("", documentElement, docType);
>              Node node = doc.importNode(document.getDocumentElement(), true);
>              doc.replaceChild(node, doc.getDocumentElement());
> 
> This is what I do currently to get a copy of the template DOM at 
> runtime, but I just want to make sure I'm doing it in the most correct 
> and efficient way possible.
> 
> Of course, this leaves out any internal subset and entity nodes, 
> no?

Right.
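For comparison, here is Jake's import-based copy from the quoted message wrapped as a self-contained Java method. The `ImportCopy` class and `copyViaImport` name are hypothetical, it assumes the JDK's default JAXP DOM, and it passes `null` (the DOM's value for "no namespace") instead of `""`, plus a guard for a missing DOCTYPE. Running it illustrates the answer above: the new DocumentType keeps the name and public/system IDs, but has no internal subset and no entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ImportCopy {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Builds a new document with an equivalent DOCTYPE (name, public ID and
    // system ID only) and a deep-imported copy of the document element.
    static Document copyViaImport(Document document) {
        DocumentType srcType = document.getDoctype();
        if (srcType == null) {
            throw new IllegalArgumentException("document has no DOCTYPE");
        }
        DOMImplementation domImpl = document.getImplementation();
        String rootName = srcType.getName();
        DocumentType docType = domImpl.createDocumentType(
                rootName, srcType.getPublicId(), srcType.getSystemId());
        // null namespace URI: the placeholder root element is in no namespace.
        Document doc = domImpl.createDocument(null, rootName, docType);
        Node node = doc.importNode(document.getDocumentElement(), true);
        doc.replaceChild(node, doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document src = parse(
                "<!DOCTYPE root [ <!ENTITY greet \"hello\"> ]><root>&greet;</root>");
        Document copy = copyViaImport(src);
        DocumentType dt = copy.getDoctype();
        System.out.println("doctype name:    " + dt.getName());
        System.out.println("internal subset: " + dt.getInternalSubset());
        System.out.println("entities:        " + dt.getEntities().getLength());
        System.out.println("root text:       " + copy.getDocumentElement().getTextContent());
    }
}
```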

> How would I clone it all?

The implementation of Xerces' Document.cloneNode() should be able to do 
that.

> Is it possible via the DOM interfaces?

You cannot import DocumentType nodes using the DOM API
(http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Core-Document-importNode).
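That restriction is easy to observe. The Java sketch below (class and method names hypothetical; it assumes the JDK's default JAXP DOM) tries to import a DocumentType into a fresh document and reports the resulting DOMException code. The DOM Level 3 Core text for importNode states that DocumentType nodes cannot be imported and lists NOT_SUPPORTED_ERR for unsupported node types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ImportDoctypeDemo {

    // Attempts Document.importNode on a DocumentType and returns the
    // DOMException code, or -1 if the implementation allowed the import.
    static int importDoctypeErrorCode(Document source, Document target) {
        try {
            target.importNode(source.getDoctype(), true);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(
                "<!DOCTYPE root [ <!ENTITY greet \"hello\"> ]><root>&greet;</root>")));
        Document target = builder.newDocument();
        int code = importDoctypeErrorCode(source, target);
        System.out.println(code == DOMException.NOT_SUPPORTED_ERR
                ? "importNode(DocumentType) rejected with NOT_SUPPORTED_ERR"
                : "unexpected result: " + code);
    }
}
```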

>  > I would be really surprised if reparsing the
>  >document performed better than an in-memory copy (unless you had a
>  >UserDataHandler [2] registered which does some heavy operation in response
>  >to the cloning/importing).
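To illustrate the UserDataHandler caveat: the handler runs synchronously as part of the copy, so whatever it does is added to the cost of every clone. This is a minimal Java sketch assuming the JDK's default JAXP DOM; the key and value are hypothetical. With that implementation a handler registered on the Document node is expected to be notified with NODE_CLONED, but treat that as something to verify rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.UserDataHandler;

final class CloneUserDataDemo {

    // Registers a handler on the Document node, deep-clones the document,
    // and returns every operation code the handler was notified of.
    static List<Short> cloneAndRecord(Document doc) {
        List<Short> ops = new ArrayList<>();
        UserDataHandler handler = (operation, key, data, src, dst) -> {
            // Any expensive work placed here is paid on every clone.
            ops.add(operation);
        };
        doc.setUserData("template-id", "hypothetical-value", handler);
        doc.cloneNode(true);
        return ops;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root"));
        List<Short> ops = cloneAndRecord(doc);
        System.out.println("NODE_CLONED seen: " + ops.contains(UserDataHandler.NODE_CLONED));
    }
}
```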
>  >
> 
> I kind of figured this, but I just wanted to make sure that the 
> caching of template DOMs that I'm doing makes sense.
> 
> Jake
> 
>  >[1]
>  >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4
>  >[2]
>  >http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#UserDataHandler
>  >
>  >Michael Glavassevich
>  >XML Parser Development
>  >IBM Toronto Lab
>  >E-mail: mrglavas@xxxxxxxxxx
>  >
>  >Jacob Kjome <hoju@xxxxxxxx> wrote on 04/16/2006 02:17:10 AM:
>  >
>  >>
>  >> I'm wondering what's the best approach to cloning an entire
>  >> document?  Would it be better to keep a master copy in memory and
>  >> then create a new document and import the master copy into it, or
>  >> would it be better to simply reparse the document every time (where
>  >> the document is used over and over again as a template, a copy is
>  >> created and manipulated on each HTTP request, then serialized to the
>  >> browser)?  If I keep the document in memory and know I am dealing
>  >> with the Xerces2 implementation, can I just call cloneNode(true) and
>  >> get an identical copy of the whole document, including doctype,
>  >> entities, entity references, etc.?  Again, would this be more
>  >> efficient than reparsing the document each time with, say, the
>  >> Xerces2 DOMParser?  Is there a clear-cut answer to this, or does it
>  >> depend on document size or other aspects of the document or
>  >> environment?
>  >>
>  >> thanks,
>  >>
>  >> Jake
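The template workflow described here (parse a master once, hand out a copy per request) might look like the Java sketch below. `TemplateCache`, `register` and `newInstance` are hypothetical names, and it assumes the JDK's default JAXP DOM. Because Xerces does not guarantee that its DOM is thread-safe, even for reads, each master is cloned under a lock; the copies handed out are independent and can be modified without locking.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Keeps one parsed master Document per template name and hands out deep
// copies made with Document.cloneNode(true).
final class TemplateCache {

    private final Map<String, Document> masters = new ConcurrentHashMap<>();

    // Parses `xml` once and stores it as the master copy for `name`.
    void register(String name, String xml) throws Exception {
        Document master = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        masters.put(name, master);
    }

    // Returns a fresh copy the caller may modify freely.
    Document newInstance(String name) {
        Document master = masters.get(name);
        if (master == null) {
            throw new IllegalArgumentException("unknown template: " + name);
        }
        // The DOM is not guaranteed thread-safe, even for reads, so
        // clones of the same master are serialized.
        synchronized (master) {
            return (Document) master.cloneNode(true);
        }
    }

    public static void main(String[] args) throws Exception {
        TemplateCache cache = new TemplateCache();
        cache.register("page",
                "<!DOCTYPE page [ <!ENTITY title \"Hello\"> ]><page><h1>&title;</h1></page>");
        Document a = cache.newInstance("page");
        Document b = cache.newInstance("page");
        a.getDocumentElement().setAttribute("lang", "en"); // per-request change
        System.out.println("a lang=" + a.getDocumentElement().getAttribute("lang"));
        System.out.println("b lang=" + b.getDocumentElement().getAttribute("lang"));
    }
}
```

Swapping `cloneNode(true)` for a reparse of the stored string is a one-line change inside `newInstance`, which makes it straightforward to compare the two strategies on real templates.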
>  >>
>  >>
>  >> ---------------------------------------------------------------------
>  >> To unsubscribe, e-mail: general-unsubscribe@xxxxxxxxxxxxxx
>  >> For additional commands, e-mail: general-help@xxxxxxxxxxxxxx

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@xxxxxxxxxx

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: j-users-help@xxxxxxxxxxxxxxxxx

