osdir.com
mailing list archive

Subject: Re: SAX and DOM - msg#00034

List: text.xml.xerces-c.user

You summarized my goal exactly.
Maybe I'll try writing that part of the file to a temporary file and
running the DOM parser on that file. It may be a poor solution, but it is
the only one I can think of right now.

Thanks,
Patrick

On 8/03/2007 17:11, Jesse Pelton wrote:
> I don't think there's a way to do that. The Xerces parser APIs allow
> you to specify InputSources, not other parsers. Perhaps there's a way
> to construct an InputSource from a SAX parser, but if so, it's not
> obvious to me. Even if it's possible, it would be inefficient; the
> InputSource would have to convert SAX events into a stream that
> would then be handed to another SAX parser that would then recreate the
> events.
>
> I gather the goal here is to get a DOM representation of an element in a
> document (including its children, of course) without the overhead of
> representing the whole document. It's an interesting problem, and one
> that others have presumably faced. Maybe someone on the list has solved
> it differently.
>
> -----Original Message-----
> From: news [mailto:news@xxxxxxxxxxxxx] On Behalf Of Patrick Rotsaert
> Sent: Thursday, March 08, 2007 10:50 AM
> To: c-users@xxxxxxxxxxxxxxxxx
> Subject: Re: SAX and DOM
>
> Hi Jesse,
>
> I also understood that the DOM parser uses SAX internally. But is it
> possible to 'override' this in some way? What I mean is, once I find the
> element I am interested in (using SAX), can I create a DOM parser that
> uses that very same SAX parser instance (instead of creating its own)
> and starts parsing at the 'current' SAX element and stops parsing at the
> end of the element?
>
> Thanks,
> Patrick
>
> On 8/03/2007 16:29, Jesse Pelton wrote:
>> I should think so. The DOM parser creates the entire tree this way; I'd
>> think you could wait for SAX to present the element you're looking for,
>> then use standard DOM create...() functions to build your tree from
>> there. Your parser would need to keep track of its present state (am I
>> parsing something that needs to go into the DOM, and if so, where in the
>> tree am I?).
>>
>> Of course, DOM is not particularly space-efficient, so a native
>> representation of the data would be better if that's an option.
>>
>> -----Original Message-----
>> From: news [mailto:news@xxxxxxxxxxxxx] On Behalf Of Patrick Rotsaert
>> Sent: Thursday, March 08, 2007 10:20 AM
>> To: c-users@xxxxxxxxxxxxxxxxx
>> Subject: SAX and DOM
>>
>> Hi all,
>>
>> Is it possible to parse an XML file using SAX and create a DOM tree of a
>> part of the file?
>>
>> Thanks,
>> Patrick
>>
>>
>
>
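
The state tracking Jesse describes in his first reply (am I inside the element of interest, and where in the tree am I?) can be sketched roughly as below. This is only an illustration and does not use Xerces-C: `Node` and `SubtreeBuilder` are hypothetical names, and the callbacks take plain `std::string` arguments instead of the real SAX2 signatures. In a real Xerces-C program the three callbacks would presumably live in a `DefaultHandler` subclass, and the nodes would be created with `DOMDocument::createElement` / `createTextNode` and attached with `appendChild`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node (not a Xerces-C type).
struct Node {
    std::string name;   // element name; empty for text nodes
    std::string text;   // character data; empty for element nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical handler: receives SAX-style events and builds a tree
// only for the first element whose name equals `target`.
class SubtreeBuilder {
public:
    explicit SubtreeBuilder(std::string target) : target_(std::move(target)) {}

    void startElement(const std::string& name) {
        if (done_) return;                    // subtree already captured
        if (stack_.empty()) {
            if (name != target_) return;      // still outside the subtree
            assert(!root_);                   // the subtree is started only once
            root_ = std::make_unique<Node>();
            root_->name = name;
            stack_.push_back(root_.get());
            return;
        }
        // Inside the subtree: attach a child to the current element.
        auto child = std::make_unique<Node>();
        child->name = name;
        Node* raw = child.get();
        stack_.back()->children.push_back(std::move(child));
        stack_.push_back(raw);
    }

    void characters(const std::string& chars) {
        if (stack_.empty()) return;           // text outside the subtree
        auto textNode = std::make_unique<Node>();
        textNode->text = chars;
        stack_.back()->children.push_back(std::move(textNode));
    }

    void endElement(const std::string& /*name*/) {
        if (stack_.empty()) return;           // end tag outside the subtree
        stack_.pop_back();
        if (stack_.empty()) done_ = true;     // the target element just closed
    }

    bool done() const { return done_; }
    const Node* root() const { return root_.get(); }

private:
    std::string target_;
    std::unique_ptr<Node> root_;
    std::vector<Node*> stack_;   // path from the subtree root to the current element
    bool done_ = false;
};
```

The stack answers "where in the tree am I?", and an empty stack means "not inside the element of interest". Attributes, namespaces and error handling are left out. Once `done()` returns true the remaining events are ignored; whether the underlying parse can also be stopped early at that point depends on the parser API and is not shown here.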



