Subject: Re: SAX and DOM - msg#00034
List: text.xml.xerces-c.user
You summarized my goal exactly.
Maybe I'll try writing that part of the file to a temporary file and running
the DOM parser on it. It may be a poor solution, but it is the only one I
can think of right now.
Thanks,
Patrick
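A minimal sketch of the temporary-file idea, under stated assumptions: the handler below re-serializes the SAX events for the first element with a chosen name, together with its descendants, into a standalone XML string that could be written to a temporary file and parsed with the DOM parser. `SubtreeRecorder`, `escapeXml` and the `std::string`-based callbacks are hypothetical simplifications; the real Xerces-C++ SAX2 callbacks deliver `XMLCh` strings that would need transcoding first. Namespace declarations inherited from ancestor elements, comments and processing instructions are not carried over.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Escapes characters that must not appear literally in XML text or in
// double-quoted attribute values.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Hypothetical attribute list: (name, value) pairs in document order.
using Attributes = std::vector<std::pair<std::string, std::string>>;

// Hypothetical recorder: turns SAX-style events for the first element named
// `target` (and everything inside it) back into an XML string.
class SubtreeRecorder {
public:
    explicit SubtreeRecorder(std::string target) : target_(std::move(target)) {}

    void startElement(const std::string& name, const Attributes& attrs) {
        // Outside the subtree: only the first matching element starts capture.
        if (depth_ == 0 && (finished_ || name != target_)) return;
        ++depth_;
        xml_ += '<' + name;
        for (const auto& a : attrs)
            xml_ += ' ' + a.first + "=\"" + escapeXml(a.second) + '"';
        xml_ += '>';
    }

    void endElement(const std::string& name) {
        if (depth_ == 0) return;              // not capturing
        xml_ += "</" + name + '>';
        if (--depth_ == 0) finished_ = true;  // target element closed
    }

    void characters(const std::string& data) {
        if (depth_ > 0) xml_ += escapeXml(data);
    }

    bool finished() const { return finished_; }
    const std::string& xml() const { return xml_; }

private:
    std::string target_;
    std::string xml_;
    int depth_ = 0;          // nesting depth inside the captured subtree
    bool finished_ = false;  // true once the target element has been closed
};
```

Instead of a temporary file, the resulting string could probably also be parsed from memory; Xerces-C++ provides `MemBufInputSource` for that, although this sketch does not exercise it.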
On 8/03/2007 17:11, Jesse Pelton wrote:
> I don't think there's a way to do that. The Xerces parser APIs allow
> you to specify InputSources, not other parsers. Perhaps there's a way
> to construct an InputSource from a SAX parser, but if so, it's not
> obvious to me. Even if it's possible, it would be inefficient: the
> InputSource would have to convert SAX events into a stream that
> would then be handed to another SAX parser, which would then recreate
> the events.
>
> I gather the goal here is to get a DOM representation of an element in a
> document (including its children, of course) without the overhead of
> representing the whole document. It's an interesting problem, and one
> that others have presumably faced. Maybe someone on the list has solved
> it differently.
>
> -----Original Message-----
> From: news [mailto:news@xxxxxxxxxxxxx] On Behalf Of Patrick Rotsaert
> Sent: Thursday, March 08, 2007 10:50 AM
> To: c-users@xxxxxxxxxxxxxxxxx
> Subject: Re: SAX and DOM
>
> Hi Jesse,
>
> I also understood that the DOM parser uses SAX internally. But is it
> possible to 'override' this in some way? What I mean is, once I find the
> element I am interested in (using SAX), can I create a DOM parser that
> uses that very same SAX parser instance (instead of creating its own),
> starts parsing at the 'current' SAX element and stops parsing at the
> end of the element?
>
> Thanks,
> Patrick
>
> On 8/03/2007 16:29, Jesse Pelton wrote:
> > I should think so. The DOM parser creates the entire tree this way; I'd
> > think you could wait for SAX to present the element you're looking for,
> > then use standard DOM create...() functions to build your tree from
> > there. Your parser would need to keep track of its present state (am I
> > parsing something that needs to go into the DOM, and if so, where in the
> > tree am I?).
> >
> > Of course, DOM is not particularly space-efficient, so a native
> > representation of the data would be better if that's an option.
> >
> > -----Original Message-----
> > From: news [mailto:news@xxxxxxxxxxxxx] On Behalf Of Patrick Rotsaert
> > Sent: Thursday, March 08, 2007 10:20 AM
> > To: c-users@xxxxxxxxxxxxxxxxx
> > Subject: SAX and DOM
> >
> > Hi all,
> >
> > Is it possible to parse an XML file using SAX and create a DOM tree of
> > part of the file?
> >
> > Thanks,
> > Patrick
> >
> >
>
>
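A sketch of the state-tracking approach Jesse describes, assuming hypothetical `std::string`-based event callbacks and a minimal `Node` type in place of the Xerces-C++ `DefaultHandler` callbacks and DOM classes. Nothing is stored until the target element starts; from then on a stack answers "where in the tree am I?", and each event appends a child to the node on top of it. With real Xerces-C++ the appends would become `DOMDocument::createElement`/`createTextNode` plus `DOMNode::appendChild`. Attributes and namespaces are omitted for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node: an element (name set) or a text node.
struct Node {
    std::string name;  // element name; empty for text nodes
    std::string text;  // character data for text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical handler: builds a tree only for the first element named
// `target` and its descendants; all other events are ignored.
class SubtreeBuilder {
public:
    explicit SubtreeBuilder(std::string target) : target_(std::move(target)) {}

    void startElement(const std::string& name) {
        if (stack_.empty()) {
            // Not inside the subtree: start capturing only at the first target.
            if (root_ || name != target_) return;
            root_ = std::make_unique<Node>();
            root_->name = name;
            stack_.push_back(root_.get());
            return;
        }
        // Inside the subtree: append a child element to the current node.
        auto child = std::make_unique<Node>();
        child->name = name;
        Node* raw = child.get();
        stack_.back()->children.push_back(std::move(child));
        stack_.push_back(raw);
    }

    void endElement(const std::string& /*name*/) {
        // Popping the root ends the capture; events outside are no-ops.
        if (!stack_.empty()) stack_.pop_back();
    }

    void characters(const std::string& data) {
        if (stack_.empty()) return;  // text outside the subtree is dropped
        auto text = std::make_unique<Node>();
        text->text = data;
        stack_.back()->children.push_back(std::move(text));
    }

    const Node* result() const { return root_.get(); }
    bool done() const { return root_ && stack_.empty(); }

private:
    std::string target_;
    std::unique_ptr<Node> root_;
    std::vector<Node*> stack_;  // path from the subtree root to the current node
};
```

Only the captured subtree is held in memory; the rest of the document streams past without being stored, which is the point of combining SAX with a partial tree.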
Previous Message by Date:
Re: Xerces Benchmarks
Hi,
> Benchmarks are a very tricky thing. Are those Java parsers all
> fully conformant? How do the machines on which those parsers were tested
> differ from yours? I would not trust a benchmark that is not carefully
> designed and controlled.
OK, thanks for this statement. I was talking about these benchmarks (the JDK's
built-in SAX parser).
http://www.ximpleware.com/benchmark1.html
(the SAX values, Java SDK
> Xerces-C can be very sensitive to several factors, including the compiler
> used to build the binaries, and the OS memory allocation functions. Since
> you don't mention your OS or compiler, it's hard to say if there's anything
> you can do to get better results.
I'm running Ubuntu Linux 6.06 on a 2.0GHz Dual Core with 1GB RAM available. I
compiled the Xerces libraries from source, using gcc/g++ and optimization level
-O3. I was just wondering about this performance difference, as my processor is
even better than the one used in the experiments above and they used a Java
system instead of C++. If you have any hints on how to speed up my system, I
would be really interested in...
Michael
Next Message by Date:
Re: Xerces Benchmarks
Hi Michael,
Michael Schmidt <m.schmidt00@xxxxxx> writes:
> I was talking about these benchmarks (JDKs builtin SAX parser).
>
> http://www.ximpleware.com/benchmark1.html
The speed of the SAX benchmark looks suspicious. I have a small
Expat-based benchmark that gives 35MByte/s throughput on a 1.8GHz
Opteron. Their benchmark claims about 20MByte/s on a 1.7GHz
Pentium M. This seems a bit too fast, especially if you consider
that the Java SAX API converts UTF-8 to UTF-16 while Expat does not.
I did not study their benchmark code in detail, but one thing
I noticed is that they do not set any event handlers. This is
not very realistic and can be exploited by the parser (for
example, the parser may see that there is no characters handler
and not transcode the text to UTF-16).
As David said, to get any meaningful results you need to make
sure you are comparing comparable things.
> I'm running Ubuntu Linux 6.06 on a 2.0GHz Dual Core with 1GB RAM
> available. I compiled the Xerces libraries from source, using
> gcc/g++ and optimization level -O3. I was just wondering about
> this performance difference, as my processor is even better than
> the one used in the experiments above and they used a Java
> system instead of C++. If you have any hints on how to speed
> up my system, I would be really interested in...
In our benchmark[1] we get about 12MByte/s *validating* SAX
throughput with Xerces-C++ on a 1.8GHz Opteron. One thing you
may want to check is that you have validation disabled,
since all the Ximpleware benchmarks are non-validating.
[1] http://www.codesynthesis.com/projects/xsdbench/
hth,
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
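A rough sketch of a throughput measurement that follows Boris's point about event handlers: the handler reads every byte of character data, so the work of delivering the text cannot be skipped. `CountingHandler`, `toyParse` and `throughputMBps` are hypothetical; `toyParse` is not an XML parser and merely stands in for the real SAX parse call, and MByte is taken here as 2^20 bytes. When timing Xerces-C++ itself, validation would also need to be off to compare against the non-validating Ximpleware numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a SAX characters() handler. It reads every byte
// it receives, so the text must actually be produced and delivered.
struct CountingHandler {
    std::size_t bytes = 0;
    unsigned long checksum = 0;
    void characters(const char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i)
            checksum += static_cast<unsigned char>(data[i]);
        bytes += len;
    }
};

// Toy "parser" (NOT an XML parser): reports each run of text between '>' and
// '<' to the handler. A real benchmark would invoke the SAX parser here.
void toyParse(const std::string& doc, CountingHandler& h) {
    std::size_t start = 0;
    bool inText = true;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        if (doc[i] == '<') {
            if (inText && i > start) h.characters(doc.data() + start, i - start);
            inText = false;
        } else if (doc[i] == '>') {
            inText = true;
            start = i + 1;
        }
    }
    if (inText && start < doc.size())
        h.characters(doc.data() + start, doc.size() - start);
}

// Parses `doc` `iterations` times; returns throughput in MByte/s (2^20 bytes).
double throughputMBps(const std::string& doc, int iterations, CountingHandler& h) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) toyParse(doc, h);
    std::chrono::duration<double> secs = std::chrono::steady_clock::now() - t0;
    double mbytes = static_cast<double>(doc.size()) * iterations / (1024.0 * 1024.0);
    return secs.count() > 0.0 ? mbytes / secs.count() : 0.0;
}
```

The handler's byte count also gives a cheap sanity check that the parser really reported all of the text it was timed on.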
Next Message by Thread:
delete subscription
Hello,
I would like to unsubscribe from this mailing list.
Sorry for spamming all of you with this administrative request, but I do
not know how to do it otherwise.
Best regards,
Simon P.