osdir.com
mailing list archive

Subject: Index perfomance - msg#00135

List: derby-user-db-apache

Date: Prev Next Index Thread: Prev Next Index
I found an interesting preformace problem that I'd welcome some help in inderstanding.

I have a table of 4,000,000 rows that has a DOUBLE column containing floating point numbers.
I run a query like this:

select pk_column from the_table where the_double_column > 300 and the_double_column < 320

271136 rows are returned.
I then go through the ResultSet and extract all the id_column values.
All of this is done using standard JDBC.

When I do this it takes 23 seconds, which I though was not unreasonable as a full table scan was involved and the table was pretty big.

But of course I thought an index would help, so I added an index to the the_double_column and repeated the process. It took 720 seconds, 31x slower! I thought this was strange, but thought it might be because I was using standard settings and the 4MB page cache was much too small to hold the index.

So I increased the page cache size (derby.storage.pageCacheSize property) to 10x the size (10,000) and repeated the process. There was only a very minor improvement in speed.

In all cases the memory usage, as reported by:
Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
really no differnt, and the used memory was much less that the maximum available specified by the -Xmx setting.


Any ideas what to do?

Tim

Was this page helpful?
Yes No
Thread at a glance:

Previous Message by Date: click to view message preview

Re: ClientDataSource in OSGi

After further investigation it appears the the derbyclient.jar is not a complete bundle. The manifest is terse and does not export anything. I grabbed the spring source version of the derby bundles and their derbyclient manifest is quite detailed including the correct export package. I am able to use it without issue. JJ Kristian Waagan wrote: JJ Snyder wrote: Hi, I am running Equinox standalone and have installed and started the derby bundle (derby.jar). I am able to access the EmbeddedDataSource from another bundle successfully. I then install and start the derbyclient.jar as a bundle but when I try to access the ClientDataSource I get a ClassNotFoundException. It works fine in eclipse' PDE environment but not standalone. Any ideas? Hello, I don't have much to offer now, but can you please post the stack trace? Is it the ClientDataSource class that isn't found, or a different class? Regards,

Next Message by Date: click to view message preview

Re: ClientDataSource in OSGi

JJ Snyder wrote: After further investigation it appears the the derbyclient.jar is not a complete bundle. The manifest is terse and does not export anything. I grabbed the spring source version of the derby bundles and their derbyclient manifest is quite detailed including the correct export package. I am able to use it without issue. Good to hear that you have been able to work around the issue. I don't know OSGi very well. Would you say that the Derby bundle is broken, and needs to be fixed? Or does it require some extra configuration to work? If you think this is a Derby bug, feel free to log a Jira issue at https://issues.apache.org/jira/browse/DERBY Thank you for your feedback! Regards, -- Kristian JJ Kristian Waagan wrote: JJ Snyder wrote: Hi, I am running Equinox standalone and have installed and started the derby bundle (derby.jar). I am able to access the EmbeddedDataSource from another bundle successfully. I then install and start the derbyclient.jar as a bundle but when I try to access the ClientDataSource I get a ClassNotFoundException. It works fine in eclipse' PDE environment but not standalone. Any ideas? Hello, I don't have much to offer now, but can you please post the stack trace? Is it the ClientDataSource class that isn't found, or a different class? Regards,

Previous Message by Thread: click to view message preview

Re: Most efficient way to get max row id?

Thank you Rick! That was really helpful! I need to use option 2, since i cannot retrieve automatically generated identity numbers on a multi-columns insert (https://issues.apache.org/jira/browse/DERBY-3609?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel). that select you posted is exactly what I need in order to maintain the identities myself for the time being, starting with the last one (+ inc size) every time the program restarts (or the connection was lost). Rick Hillegas-2 wrote: > > Hi Amir, > > 1) If what you need to know is the max value on disk right now, then you > will need to issue a "select max(...)" statement. > > 2) Maybe, however, all you need to know is the max value that was ever > on disk. Of course, this can be a different number than (1) because you > may have deleted rows at the upper end. > > If all you need to know is (2), then you can get the answer from the > system catalogs. Here's a little script which creates a table with an > identity column, inserts and deletes some rows, and then selects the max > value that was ever in the identity column. The last query gives you the > answer to (2): > > drop table s; > create table s > ( > a int generated always as identity (start with 3, increment by 3), > b int > ); > > insert into s( b ) values ( 1 ), ( 2 ), ( 3 ), ( 4 ); > delete from s where b = 4; > insert into s( b ) values ( 11 ), ( 12 ), ( 13 ), ( 14 ); > > select * from s order by b; > > -- now find the max value that was ever put in the identity column > select c.autoincrementvalue - c.autoincrementinc > from sys.syscolumns c, sys.systables t > where t.tablename = 'S' > and t.tableid = c.referenceid > and c.columnname = 'A'; > > For more information, please see the Reference Guide section describing > the SYSCOLUMNS system catalog. > > Hope this helps, > -Rick > > Amir Michail wrote: >> Hi, >> >> I was wondering what is the most efficient way to get the max >> automatically generated row id. >> >> select max(...) is slow. >> >> Amir >> >> > > > -- View this message in context: http://www.nabble.com/Most-efficient-way-to-get-max-row-id--tp20060117p22700688.html Sent from the Apache Derby Users mailing list archive at Nabble.com.

Next Message by Thread: click to view message preview

Re: Index perfomance

Tim Dudgeon wrote: I found an interesting preformace problem that I'd welcome some help in inderstanding. I have a table of 4,000,000 rows that has a DOUBLE column containing floating point numbers. I run a query like this: select pk_column from the_table where the_double_column > 300 and the_double_column < 320 271136 rows are returned. I then go through the ResultSet and extract all the id_column values. All of this is done using standard JDBC. You are selecting about 7% of the rows. If the double column values are independent of the pk_column and a reasonable number of rows fit on a page then you are likely to have at least one desired row on most pages in the table. Without an index, the scan will proceed in the most convenient order through all the pages. With an index, it will still hit most of the pages, but in the most inconvenient order (plus of course having to read/decode the index pages). Mark Thornton
Sign up for updates to this mailing list. email:
Loading Comments...
Home | News | Patents | Sitemap | FAQ | advertise

Advertising by