Re: eXist performance with distinct-values and nested queries: msg#00254 text.xml.exist
Hi,

thanks for the example. I have been trying similar queries and found that one optimization I added some time ago is actually not an optimization, but makes things slow instead. I have to think a bit about how to solve this, so please stand by.

BTW: I tried to convert some MARC records to MARCXML (using marc4j), but the conversion failed repeatedly, so I gave up. Would it be possible to get your data for testing, or have you downloaded it from somewhere else?

Wolfgang

On Thursday 30 September 2004 04:25 am, Kent Fitch wrote:
> I've a query about eXist performance implementing the
> XQuery equivalent of SQL's SELECT..group by.
>
> I loaded 20601 MARCXML records (37 MB raw data) into the eXist
> 30 Sept 04 snapshot on a 1.8GHz Pentium/512MB, Windows 2000,
> Java 1.4.2_03 and tried these XQuery scripts:
>
> 1) Simple search for records containing "television" anywhere
>
> declare namespace m="http://www.loc.gov/MARC21/slim";
> let $foundSet := //m:record[* &= 'Television']
> return $foundSet
>
> - 21 records returned instantly (0.0x sec)
>
> 2) ditto, but get the different (distinct) institution codes
> these records came from (the NUC code, which is in MARC
> tag 040, subfield a):
>
> declare namespace m="http://www.loc.gov/MARC21/slim";
> let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> return
>   <nuc>
>     <code>{$nucs}</code>
>   </nuc>
>
> - finds 12 in about 3 secs (hmm... why? It only has to
> search 21 records!!)
>
> 3) ditto, but find out how many for each institution, similar to an
> SQL count(*)..group by:
>
> declare namespace m="http://www.loc.gov/MARC21/slim";
> let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
> return
>   <nuc>
>     <code>{$nucs}</code>
>     <count> {fn:count($x)} </count>
>   </nuc>
>
> - finds 12 in 14.6 secs with this being logged:
>
> 30 Sep 2004 01:18:14,939 [SocketListener-6] DEBUG (EvalFunction.java [eval]:100) - eval: declare namespace m="http://www.loc.gov/MARC21/slim";
> let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs] return
> <nuc>
> <code>{$nucs}</code>
> <count> {fn:count($x)} </count>
> </nuc>
> 30 Sep 2004 01:18:15,939 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:17,220 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:18,783 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:19,908 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:20,548 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
> 30 Sep 2004 01:18:20,611 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:20,861 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
> 30 Sep 2004 01:18:21,376 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:21,658 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:22,158 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 7
> 30 Sep 2004 01:18:22,251 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:22,501 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:23,017 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
> 30 Sep 2004 01:18:23,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:23,283 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:23,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:24,080 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:24,595 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
> 30 Sep 2004 01:18:24,658 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:24,923 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:25,439 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
> 30 Sep 2004 01:18:25,455 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:25,705 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:26,220 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
> 30 Sep 2004 01:18:26,251 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:26,517 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 6
> 30 Sep 2004 01:18:27,033 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:27,298 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 4
> 30 Sep 2004 01:18:27,814 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:28,064 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:28,611 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 9
> 30 Sep 2004 01:18:28,673 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 3
> 30 Sep 2004 01:18:28,939 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 1
> 30 Sep 2004 01:18:29,455 [SocketListener-6] DEBUG (GeneralComparison.java [quickNodeSetCompare]:245) - quick compare: 11
> 30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java [eval]:123) - Found 12 for declare namespace m="http://www.loc.gov/MARC21/slim"; let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. = $nucs] return
> <nuc>
> <code>{$nucs}</code>
> <count> {fn:count($x)} </count>
> </nuc>
> 30 Sep 2004 01:18:29,533 [SocketListener-6] DEBUG (EvalFunction.java [eval]:124) - Query took 14578
> 30 Sep 2004 01:18:29,548 [SocketListener-6] DEBUG (LocalXPathQueryService.java [execute]:213) - query took 14609 ms.
>
> Reformulating the query more like the books/authors example:
>
> declare namespace m="http://www.loc.gov/MARC21/slim";
>
> for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := //m:record[* &= 'Television'][m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]]
> return
>   <nuc>
>     <code>{$nucs}</code>
>     <count> {fn:count($x)} </count>
>   </nuc>
>
> takes over 140 secs, presumably because $x is calculated for each NUC,
> many of which don't have matching "Television" records.
>
> Reformulating slightly to take the "Television" search out of the loop:
>
> let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values(//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"][. &= $nucs]
> return
>   <nuc>
>     <code>{$nucs}</code>
>     <count> {fn:count($x)} </count>
>   </nuc>
>
> takes almost as long.
>
> A close variant of the initial query, which looks for the NUC
> _anywhere_ (not just in the "a" subfield of the "040" tag), is much
> quicker at just 3.5 secs, but potentially inaccurate:
>
> declare namespace m="http://www.loc.gov/MARC21/slim";
> let $foundSet := //m:record[* &= 'Television']
> for $nucs in distinct-values($foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"])
> let $x := $foundSet//*[. &= $nucs]
> return
>   <nuc>
>     <code>{$nucs}</code>
>     <count> {fn:count($x)} </count>
>   </nuc>
>
> Has anyone else had problems/success in optimizing nested queries?
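For comparison, here is a minimal sketch of the group-by from query 3 rewritten so that the 040 $a values are pulled out of the matching records only once, as plain strings, and then counted per distinct code against that in-memory sequence instead of re-evaluating a path expression over the database on every iteration. It keeps the MARCXML namespace and eXist's &= full-text operator exactly as used in the queries above. It has not been run against the 30 Sept 04 snapshot, so whether it sidesteps the slow optimization Wolfgang mentions is an assumption to test, not a known result.

```xquery
declare namespace m="http://www.loc.gov/MARC21/slim";

(: Run the full-text search once and keep the matching records. :)
let $foundSet := //m:record[* &= 'Television']

(: Extract every 040 $a value from those records once, as strings,
   so the loop below only works on this small atomic sequence. :)
let $codes := (
    for $s in $foundSet//m:datafield[@tag ="040"]/m:subfield[@code="a"]
    return string($s)
)

(: Group by code: one <nuc> per distinct value, counting how many
   of the extracted strings are equal to it. :)
for $nuc in distinct-values($codes)
return
  <nuc>
    <code>{$nuc}</code>
    <count>{count($codes[. = $nuc])}</count>
  </nuc>
```

Like the logged run of query 3, this uses exact string equality (=) rather than &=, and it counts 040 $a occurrences, so a record carrying the same code twice is counted twice. To count records instead, filter the records themselves, e.g. count($foundSet[m:datafield[@tag ="040"]/m:subfield[@code="a"] = $nuc]).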