[oslo][nova] Nova causes MySQL timeouts


On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
> I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the
> nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
> 
> root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
> > wait_timeout                                           | 6000         
> 
> But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host. 
> 
> Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db:
> 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets)
> Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db:
> 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
> Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db:
> 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets)
> Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db:
> 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
> 
> The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even
> with wait_timeout > connection_recycle_time we still see mysql timeout errors.
> 
> Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause
> the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original
> assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now
> it appears that something else is happening.
> 
> Has anyone successfully stopped these MySQL error messages?
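
Before changing more settings, it may help to tally the warnings by database, user and client host, to see which services produce them and how often. Here is a minimal Python sketch, assuming each warning sits on a single line of the log in the format pasted above. The function name `summarize_aborted` and the regular expression are illustrative only, not part of any MySQL or OpenStack tool.

```python
import re
from collections import Counter

# Matches the "Aborted connection" warnings in the format shown in the
# log excerpt above (assumed to be one warning per log line).
ABORTED_RE = re.compile(
    r"Aborted connection (?P<conn_id>\d+) to db:\s+'(?P<db>[^']*)'\s+"
    r"user:\s+'(?P<user>[^']*)'\s+host:\s+'(?P<host>[^']*)'\s+"
    r"\((?P<reason>[^)]*)\)"
)


def summarize_aborted(lines):
    """Count aborted connections per (db, user, host, reason)."""
    counts = Counter()
    for line in lines:
        match = ABORTED_RE.search(line)
        if match:
            key = (match["db"], match["user"], match["host"], match["reason"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_aborted.py < /path/to/mysql/error.log
    for key, count in summarize_aborted(sys.stdin).most_common():
        print(count, *key)
```

If the counts cluster around one service or one host, that narrows down where to look next.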

Could this be related to the eventlet heartbeat issue we see for RabbitMQ when running the API under mod_wsgi/uWSGI?

E.g. have you confirmed that your WSGI server is configured to use 1 thread and multiple processes for concurrency?
Multiple threads in one process might have issues.

> -----Original Message-----
> From: Ben Nemec <openstack at nemebean.com> 
> Sent: Monday, September 9, 2019 9:50 AM
> To: Chris Hoge <chris at openstack.org>; openstack-discuss at lists.openstack.org
> Subject: Re: [oslo][nova] Nova causes MySQL timeouts
> 
> 
> 
> On 9/9/19 11:38 AM, Chris Hoge wrote:
> > In my personal experience, running Nova on a four core machine without
> > limiting the number of database connections will easily exhaust the
> > available connections to MySQL/MariaDB. Keep in mind that the limit
> > applies to every instance of a service, so if Nova starts 'm' services
> > replicated for 'n' cores with 'd' possible connections you'll be up to
> > 'm x n x d' connections. It gets big fast.
> > 
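
To put numbers on the 'm x n x d' estimate, here is a small Python sketch. It assumes that 'd' for one worker process is max_pool_size plus max_overflow, which is my reading of the options discussed in this thread. The function name and the service and worker counts are hypothetical.

```python
def worst_case_connections(services, workers_per_service, max_pool_size, max_overflow):
    """Upper bound on DB connections: m services x n workers x d connections each.

    Assumes each worker process has its own pool and can open at most
    max_pool_size + max_overflow connections (an assumption, see above).
    """
    per_worker = max_pool_size + max_overflow
    return services * workers_per_service * per_worker


# Hypothetical example: 4 services, 4 workers each, pool of 5 plus 10 overflow.
print(worst_case_connections(4, 4, 5, 10))  # 240
```

Comparing that figure with the server's max_connections shows quickly whether the database can absorb the worst case.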
> > The default setting of '0' (that is, unlimited) does not make for a good
> > first-run experience, IMO.
> 
> We don't default to 0. We default to 5: 
> 
> https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_pool_size
> 
> > 
> > This issue comes up every few years or so, and the previous consensus
> > was that 200-2000 connections is recommended based on your needs. Your
> > database has to be configured to handle the load, and it is important
> > to review the configuration value across all your services and set it
> > consistently and appropriately.
> > 
> > 
> > http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
> 
> Thanks, I did not recall that discussion.
> 
> If I'm reading it correctly, Jay is suggesting that for MySQL we should 
> just disable connection pooling. As I noted earlier, I don't think we 
> expose the ability to do that in oslo.db (patches welcome!), but setting 
> max_pool_size to 1 would get you pretty close. Maybe we should add that 
> to the help text for the option in oslo.db?
> 
> > 
> > > On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack at nemebean.com> wrote:
> > > 
> > > Tagging with oslo as this sounds related to oslo.db.
> > > 
> > > On 9/5/19 7:37 PM, Albert Braden wrote:
> > > > After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay
> > > > open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the
> > > > pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time
> > > > and allow 10 temp.
> > > > Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that
> > > > a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if
> > > > the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
> > > 
> > > I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to
> > > minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open
> > > connections frequently.
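
To make the max_pool_size / max_overflow accounting concrete, here is a toy Python model. It is not SQLAlchemy's or oslo.db's code, only a sketch of the behaviour described above for a positive pool size: up to pool_size connections stay open after use, and up to max_overflow extra ones can be opened temporarily. The class name `ToyPool` is made up for illustration.

```python
class ToyPool:
    """Toy model of pool_size / max_overflow accounting (pool_size >= 1)."""

    def __init__(self, pool_size=5, max_overflow=10):
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.idle = 0         # open connections waiting in the pool
        self.checked_out = 0  # open connections currently in use

    @property
    def open_connections(self):
        return self.idle + self.checked_out

    def checkout(self):
        if self.idle:
            # Reuse a connection that is already open.
            self.idle -= 1
        elif self.open_connections >= self.pool_size + self.max_overflow:
            # A real pool would wait for a free connection before giving up;
            # the toy model just fails immediately.
            raise RuntimeError("pool limit reached")
        # Otherwise a new connection is opened.
        self.checked_out += 1

    def checkin(self):
        self.checked_out -= 1
        if self.idle < self.pool_size:
            self.idle += 1  # kept open for reuse
        # else: an overflow connection is closed instead of being kept


pool = ToyPool(pool_size=5, max_overflow=10)
for _ in range(15):
    pool.checkout()
print(pool.open_connections)  # 15
for _ in range(15):
    pool.checkin()
print(pool.open_connections)  # 5
```

In this model the five connections left open at the end are the ones that sit idle until they are recycled or the server drops them.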
> > > 
> > > It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support
> > > that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I
> > > seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path
> > > between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening
> > > intervening timeouts, might get rid of these disconnection messages.
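
That rule of thumb can be checked mechanically. Here is a minimal Python sketch, assuming the service config is INI-style text with a [database] section and that connection_recycle_time falls back to 3600 when unset, as quoted elsewhere in this thread. The helper names and the haproxy figure are hypothetical.

```python
import configparser

DEFAULT_RECYCLE = 3600  # default value quoted in this thread


def recycle_time(conf_text, section="database"):
    """Read connection_recycle_time from INI-style config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getint(section, "connection_recycle_time", fallback=DEFAULT_RECYCLE)


def timeouts_too_short(recycle, path_timeouts):
    """Return names of timeouts in the path that are not longer than recycle."""
    return sorted(name for name, seconds in path_timeouts.items() if seconds <= recycle)


conf = "[database]\nconnection_recycle_time = 3600\n"
path = {
    "mysql wait_timeout": 6000,      # value reported in this thread
    "haproxy timeout server": 1800,  # hypothetical; this thread has no LB
}
print(timeouts_too_short(recycle_time(conf), path))  # ['haproxy timeout server']
```

An empty result only rules out this one cause. A wait_timeout of 6000 already passes the check, which matches the later report in this thread that raising it did not stop the warnings.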
> > > 
> > > 0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_pool_size
> > > 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePool.__init__
> > > 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connection_recycle_time
> > > 
> > > > *From:* Albert Braden <Albert.Braden at synopsys.com>
> > > > *Sent:* Wednesday, September 4, 2019 10:19 AM
> > > > *To:* openstack-discuss at lists.openstack.org
> > > > *Cc:* Gaëtan Trellu <gaetan.trellu at incloudus.com>
> > > > *Subject:* RE: Nova causes MySQL timeouts
> > > > We're not setting the max_pool_size or max_overflow options presently. I googled around and found this document:
> > > > https://docs.openstack.org/keystone/stein/configuration/config-options.html
> > > > Document says:
> > > > [api_database]
> > > > connection_recycle_time = 3600   (Integer) Timeout before idle SQL connections are reaped.
> > > > max_overflow = None              (Integer) If set, use this value for max_overflow with SQLAlchemy.
> > > > max_pool_size = None             (Integer) Maximum number of SQL connections to keep open in a pool.
> > > > [database]
> > > > connection_recycle_time = 3600   (Integer) Timeout before idle SQL connections are reaped.
> > > > min_pool_size = 1                (Integer) Minimum number of SQL connections to keep open in a pool.
> > > > max_overflow = 50                (Integer) If set, use this value for max_overflow with SQLAlchemy.
> > > > max_pool_size = None             (Integer) Maximum number of SQL connections to keep open in a pool.
> > > > If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the
> > > > recommended values for these, to allow unused connections to close before they time out? Is 'min_pool_size = 0'
> > > > an acceptable setting?
> > > > My settings are default:
> > > > [api_database]:
> > > > #connection_recycle_time = 3600
> > > > #max_overflow = <None>
> > > > #max_pool_size = <None>
> > > > [database]:
> > > > #connection_recycle_time = 3600
> > > > #min_pool_size = 1
> > > > #max_overflow = 50
> > > > #max_pool_size = 5
> > > > It's not obvious what max_overflow does. Where can I find a document that explains more about these settings?
> > > > *From:* Gaëtan Trellu <gaetan.trellu at incloudus.com>
> > > > *Sent:* Tuesday, September 3, 2019 1:37 PM
> > > > *To:* Albert Braden <albertb at synopsys.com>
> > > > *Cc:* openstack-discuss at lists.openstack.org
> > > > *Subject:* Re: Nova causes MySQL timeouts
> > > > Hi Albert,
> > > > It is a configuration issue; have a look at the max_pool_size and max_overflow options under the [database] section.
> > > > Keep in mind that the more workers you have, the more connections will be opened on the database.
> > > > Gaetan (goldyfruit)
> > > > On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden at synopsys.com> wrote:
> > > >     It looks like nova is keeping mysql connections open until they time
> > > >     out. How are others responding to this issue? Do you just ignore the
> > > >     mysql errors, or is it possible to change configuration so that nova
> > > >     closes and reopens connections before they time out? Or is there a
> > > >     way to stop mysql from logging these aborted connections without
> > > >     hiding real issues?
> > > >     Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf'
> > > >     (Got timeout reading communication packets)