[kolla][nova][cinder] Got Gateway-Timeout error on VM evacuation if it has volume attached.
On 7/25/2019 3:14 AM, Gorka Eguileor wrote:
> Attachment delete is a synchronous operation, so all the different
> connection timeouts may affect the operation: Nova to HAProxy, HAProxy
> to Cinder-API, Cinder-API to Cinder-Volume via RabbitMQ, Cinder-Volume
> to Storage backend.
> I would recommend looking at the specific attachment_delete request
> that failed in the Cinder logs to see how long it took to complete, and
> then checking how long it took for the 504 error to happen. With that
> info you can get an idea of how much higher your timeout must be.
> It could also happen that the Cinder-API raises a timeout error when
> calling the Cinder-Volume. In this case you should check the
> cinder-volume service to see how long the operation took to complete
> there, as it keeps running after the Cinder-API has given up on it.
> Internally the Cinder-API to Cinder-Volume timeout is usually 60
> seconds (the rpc_response_timeout default).
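To make the comparison Gorka describes concrete, here is a minimal Python sketch. It assumes log timestamps in the "YYYY-MM-DD HH:MM:SS.mmm" form, and the hop names and every timeout value except the 60-second RPC figure quoted above are hypothetical placeholders, so substitute what your own haproxy.cfg, nova.conf and cinder.conf actually contain.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines being compared.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps given as strings."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


def hops_that_time_out(duration, timeouts):
    """Names of the hops whose timeout is shorter than the operation.

    `timeouts` is a list of (hop_name, seconds) pairs in chain order.
    """
    return [name for name, limit in timeouts if limit < duration]


# Hypothetical example: attachment_delete needed 90.5 seconds to finish.
duration = elapsed_seconds("2019-07-25 03:14:00.000",
                           "2019-07-25 03:15:30.500")

# Placeholder values; only the 60 s RPC timeout comes from this thread.
chain = [
    ("nova -> haproxy", 120),
    ("haproxy -> cinder-api", 60),
    ("cinder-api -> cinder-volume (rpc_response_timeout)", 60),
]

too_short = hops_that_time_out(duration, chain)
print(duration, too_short)
```

Any hop listed in too_short needs a timeout above the measured duration, otherwise the caller gives up while cinder-volume is still working.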
Yeah, this is a known intermittent issue in our CI jobs as well, for example:
As I mentioned in the bug report for that issue:
It might be worth using the long_rpc_timeout approach for this, assuming
the HTTP response doesn't time out. Nova uses long_rpc_timeout for known
long RPC calls:
Cinder should probably do the same for initialize-connection-style RPC
calls. I've seen other gate failures where cinder-backup to
cinder-volume RPC calls to initialize a connection have timed out as