Re: [DISCUSS] CloudStack graceful shutdown


Hi Sergey

Glad to see you are doing well,

I was gonna say drop "enterprise virtualization company" and save a
$fortune$ - but it's not for everyone :)

I'll post another proposed solution at the bottom of this thread.

Regards
ilya


On Wed, Apr 4, 2018 at 5:22 PM, Sergey Levitskiy <serg38l@xxxxxxxxxxx>
wrote:

> Now without spellchecking :)
>
> This is not simple, e.g. for VMware. Each management server also acts as an
> agent proxy, so tasks against a particular ESX host will always be
> forwarded. The right answer would be to support a native “maintenance mode”
> for the management server. When it enters this mode, the management server
> should release all agents including the SSVM, block/redirect API calls and
> login requests, and finish all async jobs it originated.
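
The maintenance mode described above could be modelled roughly as follows. This is only a sketch to make the three obligations concrete (release agents, refuse API calls and logins, drain originated jobs); the class and method names are hypothetical and do not correspond to any existing CloudStack code.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    MAINTENANCE = "maintenance"


class ManagementServer:
    """Hypothetical model of a management server; not CloudStack code."""

    def __init__(self, agents, running_jobs):
        self.mode = Mode.ACTIVE
        self.agents = set(agents)              # agents currently owned, e.g. hosts and the SSVM
        self.running_jobs = set(running_jobs)  # async jobs this server originated

    def enter_maintenance(self):
        """Switch to maintenance mode and release every agent."""
        self.mode = Mode.MAINTENANCE
        released, self.agents = self.agents, set()
        return released  # a caller would hand these over to another server

    def accepts_requests(self):
        """API calls and logins are only accepted while ACTIVE."""
        return self.mode is Mode.ACTIVE

    def finish_job(self, job_id):
        """Record that an originated async job has completed."""
        self.running_jobs.discard(job_id)

    def safe_to_stop(self):
        """True once in maintenance mode with no agents or originated jobs left."""
        return (self.mode is Mode.MAINTENANCE
                and not self.agents
                and not self.running_jobs)
```

In this model the server only becomes safe to stop after the last job it originated has finished, which is the property the thread is after.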
>
>
>
> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <serg38l@xxxxxxxxxxx<mailto:
> serg38l@xxxxxxxxxxx>> wrote:
>
> This is not simple e.g. for VMware. Each management server also acts as an
> agent proxy so tasks against a particular ESX host will be always
> forwarded. That right answer will be to a native support for “maintenance
> mode” for management server. When entered to such mode the management
> server should release all agents including save, block/redirect API calls
> and login request and finish all a sync job it originated.
>
> Sent from my iPhone
>
> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> rafaelweingartner@xxxxxxxxx<mailto:rafaelweingartner@xxxxxxxxx>> wrote:
>
> Ilya, still regarding the issue of the management server that is being shut
> down: if other MSs, or maybe system VMs (I am not sure whether they are able
> to do such tasks), can direct/redirect/send new jobs to this management server
> (the one being shut down), the process might never end because new tasks
> are always being created for the management server that we want to shut
> down. Is this scenario possible?
>
> That is why I mentioned blocking port 8250 for the “graceful-shutdown”.
>
> If this scenario is not possible, then everything is fine.
>
>
> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <ilya.mailing.lists@xxxxxxxxx
> <mailto:ilya.mailing.lists@xxxxxxxxx>>
> wrote:
>
> I'm thinking of using a configuration from "job.cancel.threshold.minutes" -
> it will be the longest
>
>     "category": "Advanced",
>     "description": "Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long",
>     "name": "job.cancel.threshold.minutes",
>     "value": "60"
>
>
>
>
> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> rafaelweingartner@xxxxxxxxx<mailto:rafaelweingartner@xxxxxxxxx>> wrote:
>
> Big +1 for this feature; I only have a few questions.
>
> * Regarding the tasks/jobs that management servers (MSs) execute: do these
> tasks originate from requests that come to the MS, or is it possible for
> requests received by one management server to be executed by another? I
> mean, if I execute a request against MS1, will this request always be
> executed/treated by MS1, or is it possible that this request is executed
> by another MS (e.g. MS2)?
>
> * I would suggest that after we block traffic coming from 8080/8443/8250 (we
> will need to block this as well, right?), we log the execution of tasks.
> I mean, something saying: there are XXX tasks (enumerate tasks) still being
> executed; we will wait for them to finish before shutting down.
>
> * The timeout (60 minutes suggested) could be a global setting that we
> load before executing the graceful-shutdown.
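
The last two suggestions could look something like the sketch below. It is illustrative only: the setting key `graceful.shutdown.timeout.minutes` is a made-up name, not an existing CloudStack global setting, and the wording of the log line simply follows the message above.

```python
def remaining_jobs_message(job_ids):
    """Build the log line that enumerates the jobs still running."""
    listed = ", ".join(str(j) for j in sorted(job_ids))
    return (f"There are {len(job_ids)} tasks ({listed}) still being executed; "
            "waiting for them to finish before shutting down.")


def shutdown_timeout_seconds(settings,
                             key="graceful.shutdown.timeout.minutes",  # hypothetical key
                             default_minutes=60):
    """Read the timeout from a dict of global settings, defaulting to 60 minutes."""
    return int(settings.get(key, default_minutes)) * 60
```

Settings values are treated as strings or integers here, so a value such as "5" and a value of 5 both yield 300 seconds.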
>
> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> ilya.mailing.lists@xxxxxxxxx<mailto:ilya.mailing.lists@xxxxxxxxx>> wrote:
>
> Use case:
> In any environment, from time to time, an administrator needs to perform
> maintenance. The current stop sequence of the CloudStack management server
> ignores the fact that there may be long-running async jobs and terminates
> the process. This in turn can create a poor user experience and occasional
> inconsistency in the CloudStack DB.
>
> This is especially painful in large environments where the user has
> thousands of nodes and continuous patching happens around the clock,
> requiring migration of workloads from one node to another.
>
> With that said, I've created a script that monitors the async job queue
> for a given MS and waits for it to complete all jobs. More details are
> posted below.
>
> I'd like to introduce "graceful-shutdown" into the systemctl/service of
> the cloudstack-management service.
>
> The details of how it will work are below:
>
> Workflow for graceful shutdown:
> Using iptables/firewalld, block any connection attempts on 8080/8443 (we
> can identify the ports dynamically).
> Identify the MSID for the node; using the proper MSID, query the async_job
> table for
> 1) any jobs that are still running (job_status="0")
> 2) job_dispatcher not like "pseudoJobDispatcher"
> 3) job_init_msid=$my_ms_id
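
As a rough illustration, the three criteria above translate into a single query. The sketch below assumes the criteria are combined with AND, and it runs against an in-memory SQLite table that contains only the columns named in the message; the real async_job table has more columns, and a MySQL driver would use a different placeholder style than `?`. The dispatcher name "SomeDispatcher" and the MSID values are sample data.

```python
import sqlite3

# Assumption: the three criteria are ANDed together.
PENDING_JOBS_SQL = """
    SELECT COUNT(*) FROM async_job
    WHERE job_status = 0
      AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
      AND job_init_msid = ?
"""


def count_pending_jobs(conn, msid):
    """Return how many unfinished, non-pseudo jobs were initiated by this MSID."""
    return conn.execute(PENDING_JOBS_SQL, (msid,)).fetchone()[0]


def demo_connection():
    """In-memory stand-in for the database, with only the columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE async_job (id INTEGER, job_status INTEGER, "
                 "job_dispatcher TEXT, job_init_msid INTEGER)")
    conn.executemany("INSERT INTO async_job VALUES (?, ?, ?, ?)", [
        (1, 0, "SomeDispatcher", 42),       # running, initiated by MSID 42: counted
        (2, 1, "SomeDispatcher", 42),       # no longer running: ignored
        (3, 0, "pseudoJobDispatcher", 42),  # pseudo dispatcher: ignored
        (4, 0, "SomeDispatcher", 99),       # initiated by another server: ignored
    ])
    return conn
```

Calling `count_pending_jobs(demo_connection(), 42)` counts only the first sample row.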
>
> Monitor this async_job table for up to 60 minutes, until all async jobs for
> the MSID are done, then proceed with shutdown.
>   If the script fails for any reason or is terminated, catch the exit via the
> trap command and unblock 8080/8443.
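
The block, wait, and clean-up sequence above might be structured as in this sketch. The original script is a shell script that was not posted, so this is not its code: every callable passed in is a hypothetical hook, and `try`/`finally` stands in for the shell `trap`. The thread does not say what should happen when the 60 minutes run out; this sketch assumes the shutdown is abandoned and the ports are reopened.

```python
import time


def graceful_shutdown(count_pending, block_ports, unblock_ports, stop_service,
                      timeout_s=60 * 60, poll_s=10,
                      clock=time.monotonic, sleep=time.sleep):
    """Block new connections, wait for pending jobs to drain, then stop.

    Returns True if the service was stopped, False if the timeout expired.
    All callables are hypothetical hooks supplied by the caller.
    """
    block_ports()
    stopped = False
    try:
        deadline = clock() + timeout_s
        while count_pending() > 0:
            if clock() >= deadline:
                return False  # assumption: give up rather than force the stop
            sleep(poll_s)
        stop_service()
        stopped = True
        return True
    finally:
        # Plays the role of the shell `trap`: reopen the ports on timeout
        # or on any exception, but not after a completed shutdown.
        if not stopped:
            unblock_ports()
```

Note that `finally` covers exceptions and Ctrl-C, but a process killed by an unhandled signal would skip it, so a real implementation would likely need explicit signal handling as well.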
>
> Comments are welcome
>
> Regards,
> ilya
>
>
>
>
> --
> Rafael Weingärtner
>
>
>
>
>
> --
> Rafael Weingärtner
>