[jira] Assigned: (XERCESJ-1264) Reduce performance penalty for using an EOF: msg#00029

Subject: [jira] Assigned: (XERCESJ-1264) Reduce performance penalty for using an EOFException to signal the end of the document.
     [ 
https://issues.apache.org/jira/browse/XERCESJ-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Glavassevich reassigned XERCESJ-1264:
---------------------------------------------

    Assignee: Michael Glavassevich

> Reduce performance penalty for using an EOFException to signal the end of the 
> document.
> ---------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1264
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1264
>             Project: Xerces2-J
>          Issue Type: Improvement
>          Components: JAXP (javax.xml.parsers)
>    Affects Versions: 2.9.0
>            Reporter: Michael Glavassevich
>            Assignee: Michael Glavassevich
>
> As part of its normal control flow the XMLEntityScanner will throw an 
> EOFException when it reaches the end of the document.  For small documents, 
> this can take up as much as 20-25% of the total execution time in the parser. 
>  Without messing with the current programming model, most of this time can be 
> recovered by caching the exception (which eliminates the very expensive 
> fillInStackTrace() on creation).
> Wolfgang Hoschek's post [1] to the j-dev list on this subject in 2004:
> =====================================================
> I have a server app that parses millions of smallish documents.
> Performance has been improved a lot by reusing XMLReaders. It's pretty good 
> but could perhaps get better when studying the (perhaps dubious?) hints given 
> by the java -server -Xprof snippet below (JDK 1.5 RC, xerces CVS head, not 
> using the JDK internal xerces which appears to be twice as slow in this case, 
> for whatever reason).
> Accordingly, the theory is that throwing an (artificial) EOFException in 
> XMLEntityScanner.load() at the end of each document consumes some 25% of the 
> total execution time, probably due to the heavy nature of exceptions and in 
> particular Throwable.fillInStackTrace(). Would it perhaps be possible (and 
> correct) to avoid raising artificial exceptions for what appears to be normal 
> program control flow (the documents and streams are fine)?
> Here is the trace snippet:
>           Stub + native   Method
>   28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
>   28.6%     0  +   487    Total stub
>    Thread-local ticks:
>    0.1%     1             Blocked (of total)
>    0.1%     2             Class loader
>    0.1%     2             Compilation
>    0.2%     3             Unknown: thread_state
> Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM
>    Thread-local ticks:
> 100.0%     1             Blocked (of total)
> Global summary of 35.44 seconds:
> 100.0%  1718             Received ticks
>    0.7%    12             Received GC ticks
>    9.7%   167             Compilation
>    0.1%     2             Class loader
>    0.2%     3             Unknown code
> real    0m35.715s
> user    0m34.170s
> sys     0m0.190s
> TRACE 300347:
>          java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
>          java.lang.Throwable.<init>(Throwable.java:181)
>          java.lang.Exception.<init>(Exception.java:29)
>          java.io.IOException.<init>(IOException.java:28)
>          java.io.EOFException.<init>(EOFException.java:32)
>          org.apache.xerces.impl.XMLEntityScanner.load(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(<Unknown Source>:Unknown line)
>          org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.XMLParser.parse(<Unknown Source>:Unknown line)
>          org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown Source>:Unknown line)
>          nu.xom.Builder.build(Builder.java:786)
>          nu.xom.Builder.build(Builder.java:569)
>          gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
> I guess the relevant block is
> XMLEntityScanner.load(...):
>              ...
>              if (changeEntity) {
>                  fEntityManager.endEntity();
>                  if (fCurrentEntity == null) {
>                      throw new EOFException();
>                  }
>                  // handle the trailing edges
>                  if (fCurrentEntity.position == fCurrentEntity.count) {
>                      load(0, true);
>                  }
>              }
> [1] 
> http://mail-archives.apache.org/mod_mbox/xerces-j-dev/200409.mbox/%3c25BEC610-FD4A-11D8-AA38-000A95BD16CE@xxxxxxx%3e
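
The caching idea described in the issue can be illustrated with a minimal, self-contained sketch. This is not the Xerces patch: the class names (CachedEofSketch, EndOfInput) and the toy next()/scan() methods are hypothetical. It also goes one step beyond plain caching by overriding fillInStackTrace(), which is an assumption of this sketch rather than something the issue prescribes.

```java
import java.io.EOFException;

public class CachedEofSketch {

    /** Hypothetical stand-in for the scanner's end-of-input signal; not Xerces code. */
    static final class EndOfInput extends EOFException {
        private static final long serialVersionUID = 1L;

        /** Created once and rethrown for every document. */
        static final EndOfInput INSTANCE = new EndOfInput();

        private EndOfInput() {}

        /**
         * Assumption of this sketch: skip the stack walk entirely, so the
         * shared instance never carries a stale or misleading trace.
         */
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this;
        }
    }

    /** Toy reader that signals end of input with the shared exception. */
    static char next(String doc, int pos) throws EOFException {
        if (pos >= doc.length()) {
            // No allocation and no stack walk on this path.
            throw EndOfInput.INSTANCE;
        }
        return doc.charAt(pos);
    }

    /** Consumes a document to the end and returns the number of characters read. */
    static int scan(String doc) {
        int pos = 0;
        try {
            while (true) {
                next(doc, pos);
                pos++;
            }
        } catch (EOFException end) {
            // End of input is treated as normal control flow here.
            return pos;
        }
    }

    public static void main(String[] args) {
        System.out.println("characters read: " + scan("<a/>"));
        System.out.println("stack frames kept: "
                + EndOfInput.INSTANCE.getStackTrace().length);
    }
}
```

The trade-off is that a shared exception has no useful stack trace, so this pattern only suits exceptions used purely as a control-flow signal, as in the end-of-document case above.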

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

