[DISCUSS][ASK] Should agent wait for pending tasks on (mgmt server) disconnection?


All,


Historically, when an agent (KVM, SSVM, CPVM) is disconnected from the management server (for example, because the mgmt server restarts), the reconnection logic waits for any pending tasks/commands to complete before it attempts to reconnect. I searched the git history but could not find a reason for this. Can anyone share why we may need it?


Based on the reported issue:

https://github.com/apache/cloudstack/issues/2633


I've a working patch which removes this limitation:

https://github.com/apache/cloudstack/pull/2638


From testing with various combinations of tasks, I found that when the agent is disconnected, a pending task fails to send an Answer to the mgmt server even if the task itself succeeds. From the control plane's perspective, that task is therefore still pending/ongoing.


When the mgmt server comes back online and the agent finally reconnects (depending on how long the pending task took), the executed operation is still pending in the mgmt server's view and may sometimes require manual cleanup in the database. By removing the limitation in the above PR, the agent at least reconnects faster, while the failure/fault behaviours remain the same. A bigger design fix would be to make the management server handle agent-side answers/responses asynchronously.
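
To make the difference concrete, here is a minimal simulation sketch of the two behaviours. It is written in Python only for brevity (the agent itself is Java) and is not the actual agent code: the function name `simulate`, the task names, and the timings are all hypothetical. It assumes, as described above, that the Answer of every task pending at disconnect time is lost in both cases, so only the reconnection time changes.

```python
def simulate(pending, server_back_at, wait_for_pending):
    """Simulate one disconnect/reconnect cycle of an agent.

    pending          -- hypothetical map: task name -> seconds after the
                        disconnect at which that task finishes
    server_back_at   -- seconds after the disconnect at which the mgmt
                        server accepts connections again
    wait_for_pending -- True for the historical behaviour, False for the
                        behaviour after removing the limitation

    Returns (reconnected_at, lost_answers).
    """
    if wait_for_pending:
        # Historical behaviour: no reconnect attempt until every
        # pending task has completed.
        reconnected_at = max([server_back_at, *pending.values()])
    else:
        # Without the limitation: reconnect as soon as the server is back.
        reconnected_at = server_back_at

    # Assumption taken from the description above: the Answer of a task
    # that was pending at disconnect never reaches the mgmt server,
    # whichever reconnect behaviour is used.
    lost_answers = sorted(pending)
    return reconnected_at, lost_answers


if __name__ == "__main__":
    tasks = {"migrate-vm": 600, "copy-template": 120}  # made-up durations
    print(simulate(tasks, server_back_at=30, wait_for_pending=True))
    print(simulate(tasks, server_back_at=30, wait_for_pending=False))
```

With these made-up numbers the agent reconnects after 600 seconds when it waits and after 30 seconds when it does not, while the set of tasks left pending in the mgmt server's view is identical in both runs.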


- Rohit

<https://cloudstack.apache.org>



rohit.yadav@xxxxxxxxxxxxx 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue