
Re: problem with SAX - repetitive lines returned by characters(): msg#00114

Subject: Re: problem with SAX - repetitive lines returned by characters()
Hi,
I agree with the points you mentioned, but the problem is exactly that
Xerces doesn't behave the way you describe (and the way I expected).

The characters() method I use is the standard SAX characters() method (the
same as in your Fibonacci example):

-------
public void characters(char[] text, int start, int length) {
  if (processing) {
    buffer.append(text,start,length);
  }
}

public void startElement(...) {
  if (localName.equals(codeName)) {
    buffer = new StringBuffer();
    processing=true;
  }
}

public void endElement(...) {
  if (localName.equals(codeName)) {
    processing=false;
    System.out.println(buffer.toString());
    buffer.setLength(0);
  }
}
-------

This is code I'd expect to work. The problem is that
- every char array passed to characters() has start=0 and a length covering
the full text of the array (so the array never contains data outside the
valid range)
- it passes again what was already passed before, with all chars of the
array marked as valid
So what I get for "1\n2\n3\n4\n" is the following (the real lines are longer,
not just one digit):

"1\n" : start =0, length =2
"1\n" : start =0, length =2
"2\n" : start =0, length =2
"1\n" : start =0, length =2
"2\n" : start =0, length =2
"3\n" : start =0, length =2
"1\n" : start =0, length =2
"2\n" : start =0, length =2
"3\n" : start =0, length =2
"4\n" : start =0, length =2

It seems to me that there may be a version mismatch in InputSource or
other classes that pass the document to SAX itself.
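
One way to check this outside the application is a small trace that records
start, length and the copied slice for every characters() call. This is only
a sketch: the class name CharactersTrace, the Call holder and the <code>
element name are made up for illustration, and it uses whatever JAXP parser
SAXParserFactory finds on the classpath, which may or may not be the Xerces
build in question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersTrace {
    /** One characters() invocation (hypothetical holder, not part of SAX). */
    static class Call {
        final int start;
        final int length;
        final String text;

        Call(int start, int length, String text) {
            this.start = start;
            this.length = length;
            this.text = text;
        }
    }

    /** Parses the given XML and returns one entry per characters() call. */
    static List<Call> trace(String xml) {
        final List<Call> calls = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] text, int start, int length) {
                // Copy only the valid slice; the array itself belongs to the parser.
                calls.add(new Call(start, length, new String(text, start, length)));
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Kept simple for a diagnostic sketch.
            throw new IllegalStateException("parse failed", e);
        }
        return calls;
    }

    public static void main(String[] args) {
        String xml = "<code><![CDATA[1\n2\n3\n4\n]]></code>";
        for (Call c : trace(xml)) {
            System.out.println("start=" + c.start + " length=" + c.length
                    + " text=\"" + c.text.replace("\n", "\\n") + "\"");
        }
    }
}
```

How many calls are printed depends on the parser, but the slices joined
together should give back "1\n2\n3\n4\n" exactly once. If they don't, that
would point at the parser or the jars on the classpath rather than at the
handler.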

Michal


> At 3:48 PM +0000 1/16/04, Michal Sankot wrote:
> >I have a problem with the SAX part of Xerces. I use SAX to get the lines
> >of an element with a specified tag and print them out.
> >It ran fine with the older version of Xerces I was using. When I replaced
> >the old xerces.jar with the new xercesImpl.jar, SAX started to behave weirdly.
> >
> >CDATA element content which is "1\n2\n3\n4\n" is returned by characters()
> >method as
> >"1\n"
> >"1\n"
> >"2\n"
> >"1\n"
> >"2\n"
> >"3\n"
> >"1\n"
> >"2\n"
> >"3\n"
> >"4\n"
> >strange (and frustrating), isn't it ?
> >
>
> I don't think so. Remember SAX parsers are not required to report all
> character data in a single call to characters. They can and do split
> nodes across multiple calls. You need to buffer and accumulate the
> data until you're ready to use it.
>
> You also have one or both of two other problems. The char array
> passed to characters() is not minimal. It normally contains other
> data not related to the current invocation. You need to use the start
> and length arguments to extract the sub-array relevant to the current
> call.
>
> Finally, the array passed to characters may be reused by the parser.
> You should not store it. Any data you need should be copied into some
> other object. See
>
> http://www.cafeconleche.org/books/xmljava/chapters/ch06s07.html
>
> for more discussion of these points.
> -- 
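
To make the start/length point above concrete, here is a minimal sketch
that calls a handler by hand instead of running a real parser. The shared
array is a hypothetical stand-in for a parser's internal buffer, and the
names SliceDemo and Collector are invented for the example. It contrasts
appending only the valid slice with appending the whole array.

```java
import org.xml.sax.helpers.DefaultHandler;

public class SliceDemo {
    /** Collects text two ways so the difference is visible. */
    static class Collector extends DefaultHandler {
        final StringBuilder good = new StringBuilder(); // honours start/length
        final StringBuilder bad = new StringBuilder();  // ignores start/length

        @Override
        public void characters(char[] text, int start, int length) {
            good.append(text, start, length); // copies only the valid slice
            bad.append(text);                 // wrong: copies the whole array
        }
    }

    /** Simulates four characters() calls over one reused buffer. */
    static Collector run() {
        // Assumption: stands in for a parser buffer holding more than one chunk.
        char[] shared = "1\n2\n3\n4\n".toCharArray();
        Collector collector = new Collector();
        for (int start = 0; start < shared.length; start += 2) {
            collector.characters(shared, start, 2);
        }
        return collector;
    }

    public static void main(String[] args) {
        Collector c = run();
        System.out.println(c.good.length()); // prints 8
        System.out.println(c.bad.length());  // prints 32
    }
}
```

The "good" buffer ends up with the eight characters of "1\n2\n3\n4\n",
while the "bad" one holds the whole array four times over. Appending into a
StringBuffer or StringBuilder also takes care of the last point, since the
characters are copied and no reference to the parser's array is kept.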

