|
|
Subject: Index perfomance - msg#00135
List: derby-user-db-apache
I found an interesting preformace problem that I'd welcome some help in
inderstanding.
I have a table of 4,000,000 rows that has a DOUBLE column containing
floating point numbers.
I run a query like this:
select pk_column from the_table where the_double_column > 300 and
the_double_column < 320
271136 rows are returned.
I then go through the ResultSet and extract all the id_column values.
All of this is done using standard JDBC.
When I do this it takes 23 seconds, which I though was not unreasonable
as a full table scan was involved and the table was pretty big.
But of course I thought an index would help, so I added an index to the
the_double_column and repeated the process. It took 720 seconds, 31x
slower! I thought this was strange, but thought it might be because I
was using standard settings and the 4MB page cache was much too small to
hold the index.
So I increased the page cache size (derby.storage.pageCacheSize
property) to 10x the size (10,000) and repeated the process. There was
only a very minor improvement in speed.
In all cases the memory usage, as reported by:
Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
really no differnt, and the used memory was much less that the maximum
available specified by the -Xmx setting.
Any ideas what to do?
Tim
Was this page helpful?
Thread at a glance:
Previous Message by Date:
click to view message preview
Re: ClientDataSource in OSGi
After further investigation it appears the the derbyclient.jar is not a
complete bundle. The manifest is terse and does not export anything. I
grabbed the spring source version of the derby bundles and their
derbyclient manifest is quite detailed including the correct export
package. I am able to use it without issue.
JJ
Kristian Waagan wrote:
JJ Snyder wrote:
Hi,
I am running Equinox standalone and have installed and started the
derby bundle (derby.jar). I am able to access the EmbeddedDataSource
from another bundle successfully. I then install and start the
derbyclient.jar as a bundle but when I try to access the
ClientDataSource I get a ClassNotFoundException. It works fine in
eclipse' PDE environment but not standalone.
Any ideas?
Hello,
I don't have much to offer now, but can you please post the stack trace?
Is it the ClientDataSource class that isn't found, or a different class?
Regards,
Next Message by Date:
click to view message preview
Re: ClientDataSource in OSGi
JJ Snyder wrote:
After further investigation it appears the the derbyclient.jar is not
a complete bundle. The manifest is terse and does not export
anything. I grabbed the spring source version of the derby bundles
and their derbyclient manifest is quite detailed including the correct
export package. I am able to use it without issue.
Good to hear that you have been able to work around the issue.
I don't know OSGi very well. Would you say that the Derby bundle is
broken, and needs to be fixed? Or does it require some extra
configuration to work?
If you think this is a Derby bug, feel free to log a Jira issue at
https://issues.apache.org/jira/browse/DERBY
Thank you for your feedback!
Regards,
--
Kristian
JJ
Kristian Waagan wrote:
JJ Snyder wrote:
Hi,
I am running Equinox standalone and have installed and started the
derby bundle (derby.jar). I am able to access the EmbeddedDataSource
from another bundle successfully. I then install and start the
derbyclient.jar as a bundle but when I try to access the
ClientDataSource I get a ClassNotFoundException. It works fine in
eclipse' PDE environment but not standalone.
Any ideas?
Hello,
I don't have much to offer now, but can you please post the stack trace?
Is it the ClientDataSource class that isn't found, or a different class?
Regards,
Previous Message by Thread:
click to view message preview
Re: Most efficient way to get max row id?
Thank you Rick!
That was really helpful!
I need to use option 2, since i cannot retrieve automatically generated
identity numbers on a multi-columns insert
(https://issues.apache.org/jira/browse/DERBY-3609?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel).
that select you posted is exactly what I need in order to maintain the
identities myself for the time being, starting with the last one (+ inc
size) every time the program restarts (or the connection was lost).
Rick Hillegas-2 wrote:
>
> Hi Amir,
>
> 1) If what you need to know is the max value on disk right now, then you
> will need to issue a "select max(...)" statement.
>
> 2) Maybe, however, all you need to know is the max value that was ever
> on disk. Of course, this can be a different number than (1) because you
> may have deleted rows at the upper end.
>
> If all you need to know is (2), then you can get the answer from the
> system catalogs. Here's a little script which creates a table with an
> identity column, inserts and deletes some rows, and then selects the max
> value that was ever in the identity column. The last query gives you the
> answer to (2):
>
> drop table s;
> create table s
> (
> a int generated always as identity (start with 3, increment by 3),
> b int
> );
>
> insert into s( b ) values ( 1 ), ( 2 ), ( 3 ), ( 4 );
> delete from s where b = 4;
> insert into s( b ) values ( 11 ), ( 12 ), ( 13 ), ( 14 );
>
> select * from s order by b;
>
> -- now find the max value that was ever put in the identity column
> select c.autoincrementvalue - c.autoincrementinc
> from sys.syscolumns c, sys.systables t
> where t.tablename = 'S'
> and t.tableid = c.referenceid
> and c.columnname = 'A';
>
> For more information, please see the Reference Guide section describing
> the SYSCOLUMNS system catalog.
>
> Hope this helps,
> -Rick
>
> Amir Michail wrote:
>> Hi,
>>
>> I was wondering what is the most efficient way to get the max
>> automatically generated row id.
>>
>> select max(...) is slow.
>>
>> Amir
>>
>>
>
>
>
--
View this message in context:
http://www.nabble.com/Most-efficient-way-to-get-max-row-id--tp20060117p22700688.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.
Next Message by Thread:
click to view message preview
Re: Index perfomance
Tim Dudgeon wrote:
I found an interesting preformace problem that I'd welcome some help
in inderstanding.
I have a table of 4,000,000 rows that has a DOUBLE column containing
floating point numbers.
I run a query like this:
select pk_column from the_table where the_double_column > 300 and
the_double_column < 320
271136 rows are returned.
I then go through the ResultSet and extract all the id_column values.
All of this is done using standard JDBC.
You are selecting about 7% of the rows. If the double column values are
independent of the pk_column and a reasonable number of rows fit on a
page then you are likely to have at least one desired row on most pages
in the table. Without an index, the scan will proceed in the most
convenient order through all the pages. With an index, it will still hit
most of the pages, but in the most inconvenient order (plus of course
having to read/decode the index pages).
Mark Thornton
|
|