|
|
Choosing A Webhost: |
Tuning queries on large database: msg#00044db.postgresql.performance
Hi, I have some problem of performance on a PG database, and I don't know how to improve. I Have two questions : one about the storage of data, one about tuning queries. If possible ! My job is to compare Oracle and Postgres. All our operational databases have been running under Oracle for about fifteen years. Now I try to replace Oracle by Postgres. I have a test platform under linux (Dell server, 4 Gb RAM, bi-processor, Linux Red Hat 9 (2.4.20-31.9)) with 2 databases, 1 with Oracle (V8i or V9i it's quite the same), 1 with PG (7.4.2). Both databases have the same structure, same content, about 100 Gb each. I developped some benches, representative of our use of databases. My problem is that I have tables (relations) with more than 100 millions rows, and each row has about 160 fields and an average size 256 bytes. For Oracle I have a SGA size of 500 Mb. For PG I have a postgresql.conf as : max_connections = 1500 shared_buffers = 30000 sort_mem = 50000 effective_cache_size = 200000 and default value for other parameters. I have a table named "data" which looks like this : bench=> \d data Table "public.data" Column | Type | Modifiers ------------+-----------------------------+----------- num_poste | numeric(9,0) | not null dat | timestamp without time zone | not null datrecu | timestamp without time zone | not null rr1 | numeric(5,1) | qrr1 | numeric(2,0) | ... ... all numeric fields ... Indexes: "pk_data" primary key, btree (num_poste, dat) "i_data_dat" btree (dat) It contains 1000 different values of "num_poste" and for each one 125000 different values of "dat" (1 row per hour, 15 years). I run a vacuum analyze of the table. bench=> select * from tailledb ; schema | relfilenode | table | index | reltuples | size --------+-------------+------------------+------------+-------------+---------- public | 125615917 | data | | 1.25113e+08 | 72312040 public | 251139049 | data | i_data_dat | 1.25113e+08 | 2744400 public | 250870177 | data | pk_data | 1.25113e+08 | 4395480 My first remark is that the table takes a lot of place on disk, about 70 Gb, instead of 35 Gb with oracle. 125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea not so bad for oracle. What about for PG ? How data is stored ? The different queries of the bench are "simple" queries (no join, sub-query, ...) and are using indexes (I "explained" each one to be sure) : Q1 select_court : access to about 700 rows : 1 "num_poste" and 1 month (using PK : num_poste=p1 and dat between p2 and p3) Q2 select_moy : access to about 7000 rows : 10 "num_poste" and 1 month (using PK : num_poste between p1 and p1+10 and dat between p2 and p3) Q3 select_long : about 250 000 rows : 2 "num_poste" (using PK : num_poste in (p1,p1+2)) Q4 select_tres_long : about 3 millions rows : 25 "num_poste" (using PK : num_poste between p1 and p1 + 25) The result is that for "short queries" (Q1 and Q2) it runs in a few seconds on both Oracle and PG. The difference becomes important with Q3 : 8 seconds with oracle 80 sec with PG and too much with Q4 : 28s with oracle 17m20s with PG ! Of course when I run 100 or 1000 parallel queries such as Q3 or Q4, it becomes a disaster ! I can't understand these results. The way to execute queries is the same I think. I've read recommended articles on the PG site. I tried with a table containing 30 millions rows, results are similar. What can I do ? Thanks for your help ! ******************************************************************** * Les points de vue exprimes sont strictement personnels et * * n'engagent pas la responsabilite de METEO-FRANCE. * ******************************************************************** * Valerie SCHNEIDER Tel : +33 (0)5 61 07 81 91 * * METEO-FRANCE / DSI/DEV Fax : +33 (0)5 61 07 81 09 * * 42, avenue G. Coriolis Email : Valerie.Schneider@xxxxxxxx * * 31057 TOULOUSE Cedex - FRANCE http://www.meteo.fr * ******************************************************************** ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | The black art of postgresql.conf tweaking, Paul Serby |
|---|---|
| Next by Date: | Re: The black art of postgresql.conf tweaking, Jeff |
| Previous by Thread: | The black art of postgresql.conf tweaking, Paul Serby |
| Next by Thread: | Re: Tuning queries on large database, Christopher Kings-Lynne |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
Free MagazinesCisco NewsReceive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business. subscribe Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field. subscribe The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business. subscribe Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company. subscribe Total Telecom Total Telecom is "The Economist of the communications industry". subscribe |