|
eXist performance with distinct-values and nested queries
I've a query about eXist performance when implementing the XQuery equivalent of SQL's SELECT..group by.

I loaded 20601 MARCXML records (37 MB raw data) into the eXist 30 Sep 2004 snapshot on a 1.8GHz Pentium/512MB, Windows 2000, Java 1.4.2_03 and tried these XQuery scripts:

1) Simple search for records containing "television" anywhere

    declare namespace m="http://www.loc.gov/MARC21/slim";
    let $foundSet := //m:record[* &= 'Television']
    return $foundSet

- 21 records returned instantly (0.0x sec)

2) ditto, but get the different (distinct) institution codes these records came from (the NUC code, which is in MARC tag 040, subfield a):

    declare namespace m="http://www.loc.gov/MARC21/slim";
    let $foundSet := //m:record[* &= 'Television']
    for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
    return
      <nuc>
        <code>{$nucs}</code>
      </nuc>

- finds 12 in about 3 secs (hmm.. Why? It only has to search 21 records!!)

3) ditto, but find out how many for each institution - similar to an SQL count(*)..group by:

    declare namespace m="http://www.loc.gov/MARC21/slim";
    let $foundSet := //m:record[* &= 'Television']
    for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
    let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
    return
      <nuc>
        <code>{$nucs}</code>
        <count> {fn:count($x)} </count>
      </nuc>

- finds 12 in 14.6 secs with this being logged:

    30 Sep 2004 01:18:14,939 [SocketListener-6] DEBUG (EvalFunction.java [eval]:100) - eval: declare namespace m="http://www.loc.gov/MARC21/slim"; let $foundSet := //m:record[* &= 'Television'] for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"]) let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs] return <nuc> <code>{$nucs}</code> <count> {fn:count($x)} </count> </nuc>
    30 Sep 2004 01:18:15,939 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:17,220 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:18,783 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:19,908 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:20,548 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
    30 Sep 2004 01:18:20,611 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:20,861 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
    30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:21,658 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:22,158 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 7
    30 Sep 2004 01:18:22,251 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:22,501 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:23,017 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
    30 Sep 2004 01:18:23,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:23,283 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:24,080 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:24,595 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
    30 Sep 2004 01:18:24,658 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:24,923 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:25,439 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
    30 Sep 2004 01:18:25,455 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:25,705 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:26,220 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
    30 Sep 2004 01:18:26,251 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:26,517 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 6
    30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:27,298 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
    30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:28,064 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:28,611 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
    30 Sep 2004 01:18:28,673 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
    30 Sep 2004 01:18:28,939 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
    30 Sep 2004 01:18:29,455 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 11
    30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java [eval]:123) - Found 12 for declare namespace m="http://www.loc.gov/MARC21/slim"; let $foundSet := //m:record[* &= 'Television'] for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"]) let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs] return <nuc> <code>{$nucs}</code> <count> {fn:count($x)} </count> </nuc>
    30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java [eval]:124) - Query took 14578
    30 Sep 2004 01:18:29,548 [SocketListener-6] DEBUG (LocalXPathQueryService.java [execute]:213) - query took 14609 ms.

Reformulating the query more like the books/authors example

    declare namespace m="http://www.loc.gov/MARC21/slim";
    for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
    let $x := //m:record[* &= 'Television'][ m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]]
    return
      <nuc>
        <code>{$nucs}</code>
        <count> {fn:count($x)} </count>
      </nuc>

takes over 140 secs, presumably because $x is calculated for each NUC, many of which don't have matching "Television" records.

Reformulating slightly to take the "Television" search out of the loop:

    let $foundSet := //m:record[* &= 'Television']
    for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
    let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
    return
      <nuc>
        <code>{$nucs}</code>
        <count> {fn:count($x)} </count>
      </nuc>

takes almost as long.

A close variant on the initial query which looks for the NUC _anywhere_ (not just the "a" subfield of the "040" tag) is much quicker at just 3.5 secs, but potentially inaccurate:

    declare namespace m="http://www.loc.gov/MARC21/slim";
    let $foundSet := //m:record[* &= 'Television']
    for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
    let $x := $foundSet//*[. &= $nucs]
    return
      <nuc>
        <code>{$nucs}</code>
        <count> {fn:count($x)} </count>
      </nuc>

Has anyone else had problems/success in optimizing nested queries?

Regards,
Kent Fitch |
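
One restructuring of script 3 that may be worth trying is to evaluate the 040/a path once, bind the resulting node set to a variable, and reuse it for both distinct-values() and the per-code count. The following is only a sketch in standard XQuery 1.0, not a measured fix: the variable names $codes and $nuc are introduced here for illustration, the child step replaces the descendant step on the assumption that m:datafield is always a direct child of m:record (as in MARCXML), and whether it actually runs faster on the 30 Sep 2004 snapshot has not been tested.

```xquery
declare namespace m="http://www.loc.gov/MARC21/slim";

(: same full-text search as in the scripts above :)
let $foundSet := //m:record[* &= 'Television']

(: assumption: evaluate the 040/a subfields once and reuse the node set :)
let $codes := $foundSet/m:datafield[@tag = "040"]/m:subfield[@code = "a"]

for $nuc in distinct-values($codes)
return
  <nuc>
    <code>{$nuc}</code>
    <!-- exact match (=) rather than the keyword match (&=) -->
    <count>{count($codes[. = $nuc])}</count>
  </nuc>
```

Note that this counts 040/a subfields whose value equals the code exactly, which can differ from the keyword-based &= comparison if a code contains several tokens.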