eXist performance with distinct-values and nested queries: msg#00250

text.xml.exist

Subject: eXist performance with distinct-values and nested queries

I have a question about eXist's performance when implementing the
XQuery equivalent of SQL's SELECT ... GROUP BY.

I loaded 20601 MARCXML records (37 MB raw data) into the eXist
30 Sept 04 snapshot on a 1.8GHz Pentium with 512MB RAM, Windows 2000,
Java 1.4.2_03, and tried these XQuery scripts:

1) Simple search for records containing "television" anywhere

declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
return $foundSet

- 21 records returned instantly (0.0x sec)

2) ditto, but get the distinct institution codes these records
came from (the NUC code, which is in MARC tag 040, subfield a):

declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
return
<nuc>
<code>{$nucs}</code>
</nuc>

- finds 12 in about 3 secs (hmm... why? It only has to
search 21 records!)

3) ditto, but find out how many records there are for each institution,
similar to an SQL count(*) ... GROUP BY:

declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>

- finds 12 in 14.6 secs with this being logged:


30 Sep 2004 01:18:14,939 [SocketListener-6] DEBUG (EvalFunction.java
[eval]:100) - eval: declare namespace
m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values($foundSet//m:datafield[@tag
="040"]/m:subfield[@code="a"])
let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>
30 Sep 2004 01:18:15,939 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:17,220 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:18,783 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:19,908 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:20,548 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
30 Sep 2004 01:18:20,611 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:20,861 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:21,658 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:22,158 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 7
30 Sep 2004 01:18:22,251 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:22,501 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:23,017 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
30 Sep 2004 01:18:23,033 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:23,283 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:24,080 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:24,595 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
30 Sep 2004 01:18:24,658 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:24,923 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:25,439 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
30 Sep 2004 01:18:25,455 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:25,705 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:26,220 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
30 Sep 2004 01:18:26,251 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:26,517 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 6
30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:27,298 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:28,064 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:28,611 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
30 Sep 2004 01:18:28,673 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
30 Sep 2004 01:18:28,939 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
30 Sep 2004 01:18:29,455 [SocketListener-6] DEBUG
(GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 11
30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java [eval]:123)
- Found 12 for declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values($foundSet//m:datafield[@tag
="040"]/m:subfield[@code="a"])
let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>
30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java
[eval]:124) - Query took 14578
30 Sep 2004 01:18:29,548 [SocketListener-6] DEBUG
(LocalXPathQueryService.java [execute]:213) - query took 14609 ms.


Reformulating the query to be more like the books/authors example:

declare namespace m="http://www.loc.gov/MARC21/slim";

for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
let $x := //m:record[* &= 'Television'][ m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>

This takes over 140 secs, presumably because $x is recalculated for every
NUC in the collection, many of which have no matching "Television" records.

Reformulating slightly to take the "Television" search out of the loop:

declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>

This takes almost as long.


A close variant of the initial query, which looks for the NUC _anywhere_
(not just in the "a" subfield of the "040" tag), is much quicker at just
3.5 secs, but potentially inaccurate:

declare namespace m="http://www.loc.gov/MARC21/slim";
let $foundSet := //m:record[* &= 'Television']
for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
let $x := $foundSet//*[. &= $nucs]
return
<nuc>
<code>{$nucs}</code>
<count> {fn:count($x)} </count>
</nuc>
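
A further variant that might be worth timing is sketched below. It has not been run or measured here, so it is only an assumption that it helps. The idea is to bind the 040/a subfield nodes of the found set to a variable once (the name $codes is introduced purely for illustration), then count within that small in-memory sequence instead of re-walking $foundSet for every NUC:

```xquery
declare namespace m="http://www.loc.gov/MARC21/slim";

(: the 21-record hit set, evaluated once :)
let $foundSet := //m:record[* &= 'Television']

(: $codes is a hypothetical helper: the 040/a subfields of the hits, bound once :)
let $codes := $foundSet//m:datafield[@tag = "040"]/m:subfield[@code = "a"]

for $nuc in distinct-values($codes)
return
<nuc>
<code>{$nuc}</code>
<count>{count($codes[. = $nuc])}</count>
</nuc>
```

Note that this compares with "=" (exact string equality, which is what distinct-values groups on) rather than the "&=" keyword match, so the counts could differ from the queries above if a subfield holds more than a single code token.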

Has anyone else had problems/success in optimizing nested queries?

Regards,

Kent Fitch

