[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[oslo][nova] Nova causes MySQL timeouts

I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:

root at us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
| wait_timeout                                           | 6000         

But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host. 

Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets)
Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets)
Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)

The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.

Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.

Has anyone successfully stopped these MySQL error messages?

-----Original Message-----
From: Ben Nemec <openstack at nemebean.com> 
Sent: Monday, September 9, 2019 9:50 AM
To: Chris Hoge <chris at openstack.org>; openstack-discuss at lists.openstack.org
Subject: Re: [oslo][nova] Nova causes MySQL timeouts

On 9/9/19 11:38 AM, Chris Hoge wrote:
> In my personal experience, running Nova on a four core machine without
> limiting the number of database connections will easily exhaust the
> available connections to MySQL/MariaDB. Keep in mind that the limit
> applies to every instance of a service, so if Nova starts 'm' services
> replicated for 'n' cores with 'd' possible connections you'll be up to
> â??m x n x d' connections. It gets big fast.
> The default setting of '0' (that is, unlimited) does not make for a good
> first-run experience, IMO.

We don't default to 0. We default to 5: 

> This issue comes up every few years or so, and the consensus previously
> is that 200-2000 connections is recommended based on your needs. Your
> database has to be configured to handle the load and looking at the
> configuration value across all your services and setting them
> consistently and appropriately is important.
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipermail_openstack-2Ddev_2015-2DApril_061808.html&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=FGLfZK5eHj7z_xL-5DJsPgHkOt_T131ugvicMvcMDbc&e= 

Thanks, I did not recall that discussion.

If I'm reading it correctly, Jay is suggesting that for MySQL we should 
just disable connection pooling. As I noted earlier, I don't think we 
expose the ability to do that in oslo.db (patches welcome!), but setting 
max_pool_size to 1 would get you pretty close. Maybe we should add that 
to the help text for the option in oslo.db?

>> On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack at nemebean.com> wrote:
>> Tagging with oslo as this sounds related to oslo.db.
>> On 9/5/19 7:37 PM, Albert Braden wrote:
>>> After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp.
>>> Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
>> I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
>> It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
>> 0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo.db_stein_reference_opts.html-23database.max-5Fpool-5Fsize&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=p7bBYcuhnDR_J08MWFBj8XLiRUUV8JfruAIcl0zF234&e= 
>> 1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_13_core_pooling.html-23sqlalchemy.pool.QueuePool.-5F-5Finit-5F-5F&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=_EIhQyyj1gSM0PrX7de3yJr8hNi7tD8-tnfPo2VV_LU&e= 
>> 2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo.db_stein_reference_opts.html-23database.connection-5Frecycle-5Ftime&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=xDnj80EQrxXwenOLgmKEaJbF3VRIylapDgqyMs81pSY&e= 
>>> *From:* Albert Braden <Albert.Braden at synopsys.com>
>>> *Sent:* Wednesday, September 4, 2019 10:19 AM
>>> *To:* openstack-discuss at lists.openstack.org
>>> *Cc:* Gaëtan Trellu <gaetan.trellu at incloudus.com>
>>> *Subject:* RE: Nova causes MySQL timeouts
>>> Weâ??re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=NXcUpNTYGd6ZP-1oOUaQXsF7rHQ0mAt4e9uL8zzd0KA&e=  <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=>
>>> Document says:
>>> [api_database]
>>> connection_recycle_time = 3600               (Integer) Timeout before idle SQL connections are reaped.
>>> max_overflow = None                                   (Integer) If set, use this value for max_overflow with SQLAlchemy.
>>> max_pool_size = None                                  (Integer) Maximum number of SQL connections to keep open in a pool.
>>> [database]
>>> connection_recycle_time = 3600               (Integer) Timeout before idle SQL connections are reaped.
>>> min_pool_size = 1                                            (Integer) Minimum number of SQL connections to keep open in a pool.
>>> max_overflow = 50                                          (Integer) If set, use this value for max_overflow with SQLAlchemy.
>>> max_pool_size = None                                  (Integer) Maximum number of SQL connections to keep open in a pool.
>>> If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is â??min_pool_size = 0â?? an acceptable setting?
>>> My settings are default:
>>> [api_database]:
>>> #connection_recycle_time = 3600
>>> #max_overflow = <None>
>>> #max_pool_size = <None>
>>> [database]:
>>> #connection_recycle_time = 3600
>>> #min_pool_size = 1
>>> #max_overflow = 50
>>> #max_pool_size = 5
>>> Itâ??s not obvious what max_overflow does. Where can I find a document that explains more about these settings?
>>> *From:* Gaëtan Trellu <gaetan.trellu at incloudus.com <mailto:gaetan.trellu at incloudus.com>>
>>> *Sent:* Tuesday, September 3, 2019 1:37 PM
>>> *To:* Albert Braden <albertb at synopsys.com <mailto:albertb at synopsys.com>>
>>> *Cc:* openstack-discuss at lists.openstack.org <mailto:openstack-discuss at lists.openstack.org>
>>> *Subject:* Re: Nova causes MySQL timeouts
>>> Hi Albert,
>>> It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section.
>>> Keep in mind than more workers you will have more connections will be opened on the database.
>>> Gaetan (goldyfruit)
>>> On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden at synopsys.com <mailto:Albert.Braden at synopsys.com>> wrote:
>>>     It looks like nova is keeping mysql connections open until they time
>>>     out. How are others responding to this issue? Do you just ignore the
>>>     mysql errors, or is it possible to change configuration so that nova
>>>     closes and reopens connections before they time out? Or is there a
>>>     way to stop mysql from logging these aborted connections without
>>>     hiding real issues?
>>>     Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf'
>>>     (Got timeout reading communication packets)