
Re: Release of the GCX XQuery EngineQ: msg#00004

Subject: Re: Release of the GCX XQuery EngineQ
Hi Frans,

thanks for your interest in our XQuery engine, and thanks for your feedback.

What made you think that comparing your implementation to the others was
reasonable?

Actually, it is not so easy to get a hold of reference implementations
and we had to make do with what is publicly available.  GCX has two
main characteristics: it is an in-memory XQuery engine, and it is geared
towards streaming XQuery evaluation.

The FluXQuery engine is the most natural choice for a reference,
because it is also a main-memory XQuery engine geared towards XML
stream processing, and it implements a very similar XQuery fragment.
There are other streaming research prototypes (e.g. XSM), but they
typically have not been released by their makers yet.

The other in-memory engines (QizX, Galax, and Saxon) implement more
XQuery features (or all of them), but they are not geared towards
stream processing. But at least the principal architecture is
comparable.

Finally, we chose MonetDB out of pure interest in how we would perform
in comparison. As ours is a streaming engine, comparing it against a
secondary-storage implementation that can make use of index structures
etc. in a different way is unfair to us.

Unfortunately, no other streaming XQuery implementations are available
to compare against. However, if you know of any suitable
implementations, I'd appreciate it very much if you could point us to
them.


I'm asking because I see it as comparing apples against oranges. Some of the
other products run the queries as is and they implement XQuery, which is
quite different from not implementing XQuery and rewriting queries to one's
liking.

I hope there is no misunderstanding - of course, we first rewrote the
queries as shown on the website, and then ran the same queries on the
same data on each engine.

Measuring memory usage with top is, as far as I know, generally advised
against. See:

http://ktown.kde.org/~seli/memory/analysis.html

Thanks for the link - however, when GCX needs only a little more than
1 MB of main memory for the same query where others require over a
hundred MB, then I think a point has been made.

So the short story of why you get such a low memory footprint is that you
don't load more of the document than is needed, as told by static
analysis ("roles")?


There are two key approaches: First, document projection, where we try
to load only what may be needed for query evaluation. This, of course,
has been done before (e.g. the Galax people experimented with it,
too). Second, and this is new, the garbage collector removes the
loaded data once it is not needed anymore. This is done continually,
and for many queries it works out very nicely such that only a small
subset of the input is kept in main memory at any moment in time
during query evaluation.
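
To make the two ideas a bit more concrete, here is a toy sketch in
Python. It is only an illustration and not how GCX is implemented: the
event format, the function name titles_with_purging, and the "book" /
"title" tags are all made up for this example. It projects the stream
down to the text the query needs, and it throws buffered data away as
soon as the query is finished with the enclosing element.

```python
# Toy illustration only: NOT GCX's actual algorithm or data structures.
# All names and the event format below are hypothetical.
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # ("start", tag) | ("text", data) | ("end", tag)


def titles_with_purging(events: Iterable[Event],
                        keep: str = "title",
                        unit: str = "book") -> Tuple[List[str], int]:
    """Return (results, peak), where peak is the largest number of
    text items that were buffered at any one time."""
    buffer: List[str] = []   # the only input data held in memory
    results: List[str] = []
    stack: List[str] = []    # currently open tags
    peak = 0
    for kind, value in events:
        if kind == "start":
            stack.append(value)
        elif kind == "text":
            # Projection: buffer text only if it sits directly under a
            # tag the query needs; everything else is skipped.
            if stack and stack[-1] == keep:
                buffer.append(value)
                peak = max(peak, len(buffer))
        elif kind == "end":
            stack.pop()
            if value == unit:
                # The query is done with this unit: emit its results,
                # then discard the buffered data right away.
                results.extend(buffer)
                buffer.clear()
    return results, peak


def sample_events() -> List[Event]:
    """A tiny hand-written stream with two books."""
    events: List[Event] = [("start", "bib")]
    for title, price in (("A", "10"), ("B", "20")):
        events += [("start", "book"),
                   ("start", "title"), ("text", title), ("end", "title"),
                   ("start", "price"), ("text", price), ("end", "price"),
                   ("end", "book")]
    events.append(("end", "bib"))
    return events
```

Calling titles_with_purging(sample_events()) yields both titles, while
the buffer never holds more than one item: the prices are never loaded,
and each title is dropped once its book has been closed.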

If you are interested in the internals, maybe you'll want to check out
the paper on GCX:
http://www.infosys.uni-sb.de/publications/INFOSYS-TR-2006-13.pdf

Ciao,
Steffi

