
[jira] Commented: (XERCESJ-970) Large comments are extremely slow to parse: msg#00041

Subject: [jira] Commented: (XERCESJ-970) Large comments are extremely slow to parse
The following comment has been added to this issue:

     Author: Sean Griffin
    Created: Tue, 15 Jun 2004 2:48 PM
       Body:
You're right, the second time parsing was much faster.  I ran the parsing 
through a profiler and noticed that the problem is localized to the 
XMLEntityScanner.scanData(String, XMLStringBuffer) method.
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESJ-970?page=comments#action_36188

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESJ-970

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESJ-970
    Summary: Large comments are extremely slow to parse
       Type: Bug

     Status: Unassigned
   Priority: Minor

    Project: Xerces2-J
 Components: 
             XNI
   Versions:
             2.2.0
             2.2.1
             2.3.0
             2.4.0
             2.5.0
             2.6.0
             2.6.1
             2.6.2

   Assignee: 
   Reporter: Sean Griffin

    Created: Fri, 28 May 2004 9:48 AM
    Updated: Tue, 15 Jun 2004 2:48 PM
Environment: Windows XP running Java 1.4.2

Description:
Very large comments drastically increase the parsing time for both the SAX and 
DOM implementations.  Running the sax.Counter and dom.Counter samples on a 
410KB file with nothing commented out gives parse times in the 100ms to 300ms 
range.  However, if I comment out 95% of the file and run the same samples, 
the parse times jump to between 40 and 50 seconds.  I ran the same samples 
using the Aelfred parser shipped with Saxon 7.9 and, while the file with the 
large comment was slower than the one without it, the parse time increased by 
only 100ms or so.
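
The comparison above can be approximated with a small timing harness. This is 
only a sketch under stated assumptions: it uses whatever SAX parser JAXP 
returns by default (which may not be one of the Xerces2-J versions listed in 
this issue), it builds a synthetic document instead of the original 410KB 
file, and the class and method names (LargeCommentTiming, buildDocument, 
countElements) are hypothetical.  Absolute timings will differ from the ones 
reported here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness; not part of the Xerces samples.
class LargeCommentTiming {

    // Builds <root> with `items` child elements; the first `commentedItems`
    // of them are wrapped in a single XML comment.
    static String buildDocument(int items, int commentedItems) {
        StringBuilder sb = new StringBuilder("<root>\n");
        if (commentedItems > 0) sb.append("<!--\n");
        for (int i = 0; i < items; i++) {
            // Close the comment just before the first uncommented item.
            if (i == commentedItems && commentedItems > 0) sb.append("-->\n");
            sb.append("<item>").append(i).append("</item>\n");
        }
        // If every item was commented out, the comment is still open here.
        if (commentedItems >= items && commentedItems > 0) sb.append("-->\n");
        sb.append("</root>\n");
        return sb.toString();
    }

    // Parses the document with the default JAXP SAX parser and returns the
    // number of start-element events seen (commented-out items do not count).
    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Returns the wall-clock time of one parse, in milliseconds.
    static long parseMillis(String xml) throws Exception {
        long start = System.nanoTime();
        countElements(xml);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int items = 20_000;                      // assumed size, a few hundred KB
        String plain = buildDocument(items, 0);
        String commented = buildDocument(items, items * 95 / 100);

        // Parse each document twice: the first run includes parser warm-up.
        for (int run = 1; run <= 2; run++) {
            System.out.println("run " + run + ": uncommented "
                + parseMillis(plain) + " ms, 95% commented "
                + parseMillis(commented) + " ms");
        }
    }
}
```

To exercise a specific Xerces2-J release, that release's jars would need to be 
on the classpath so that JAXP selects it; how the parser is chosen depends on 
the JDK and classpath in use.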

I briefly compared the code of the two parsers, and they don't look 
significantly different when it comes to handling comments.  The only notable 
difference I noticed was around low/high byte character checks.  I suspect an 
inefficiency in the XMLStringBuffer class, but I haven't found anything 
specific.


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

