Re: [DISCUSS] CloudStack graceful shutdown


Andrija

This is a tough scenario.

As an admin, the way I would have handled this situation is to advertise
the upcoming outage and then take away specific API commands from the user a
day before, so they cannot start any long-running async jobs. Once
maintenance completes, enable the API commands for the user again. However,
I don't know who your user base is or whether this would be an acceptable
solution.

Perhaps also investigate what can be done to speed up your long running
tasks...

As a side note, we will be working on a feature that would allow for a
graceful termination of the process/job, meaning that if the agent notices a
disconnect or termination request, it will abort the command in flight. We
can also consider restarting these tasks afterwards, but that would
not be part of this enhancement.

Regards
ilya

On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <andrija.panic@xxxxxxxxx>
wrote:

> Hi Ilya,
>
> thanks for the feedback - but in the "real world", you need to "understand"
> that 60min is a next-to-useless timeout for some jobs (if I understand this
> specific parameter correctly ?? - the job is really canceled, not only the job
> monitoring is canceled ???) -
>
> My value for the  "job.cancel.threshold.minutes" is 2880 minutes (2 days?)
>
> I can tell you that when you have CEPH/NFS volumes of 500GB (CEPH is even the
> "worse" case, since reads are slower during the qemu-img convert process...),
> a snapshot job will take many hours. Should I mention 1TB volumes (yes, we had
> clients like that...)
> Then attaching a 1TB volume that was uploaded to ACS (it lives originally on
> Secondary Storage, and takes time to be copied over to NFS/CEPH) will take
> up to a few hours.
> Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also takes
> time...etc.
>
> I'm just giving you feedback as "user", admin of the cloud, zero DEV skills
> here :) , just to make sure you make practical decisions (and I admit I
> might be wrong with my stuff, but just giving you feedback from our public
> cloud setup)
>
>
> Cheers!
>
>
>
>
> On 5 April 2018 at 15:16, Tutkowski, Mike <Mike.Tutkowski@xxxxxxxxxx>
> wrote:
>
> > Wow, there’s been a lot of good details noted from several people on how
> > this process works today and how we’d like it to work in the near future.
> >
> > 1) Any chance this is already documented on the Wiki?
> >
> > 2) If not, any chance someone would be willing to do so (a flow diagram
> > would be particularly useful).
> >
> > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <marco@xxxxxxxxxxx>
> > wrote:
> > >
> > > Hi all,
> > >
> > > > Good point ilya, but as stated by Sergey there are more things to consider
> > > > before being able to do a proper shutdown. I augmented the script I gave you
> > > > originally and changed code in CS. What we're doing for our environment is
> > > > as follows:
> > >
> > > > 1. The MGMT watches for a change in the file /etc/lb-agent, which contains
> > > > keywords for HAProxy[2] (ready, maint), so that HAProxy can disable the
> > > > mgmt on the keyword "maint" and the mgmt server stops a couple of
> > > > threads[1] to stop processing async jobs in the queue.
> > > > 2. Look for async jobs and wait until there are none, to ensure you can
> > > > send the reconnect commands (if jobs are running, a reconnect will result
> > > > in a failed job since the result will never reach the management server -
> > > > the agent waits for the current job to be done before reconnecting, and
> > > > discards the result... room for improvement here!)
> > > > 3. Issue a reconnectHost command to all the hosts connected to the mgmt
> > > > server so that they reconnect to another one; otherwise the mgmt must be up
> > > > since it is used to forward commands to agents.
> > > > 4. When all agents are reconnected, we can shut down the management server
> > > > and perform the maintenance.
> > >
> > > > One issue remains for me: during the reconnect, the commands that are
> > > > processed at the same time should be kept in a queue until the agents have
> > > > finished any current jobs and have reconnected. Today the short time
> > > > window during which the reconnect happens can lead to failed jobs due to
> > > > the agent not being connected at the right moment.
> > >
> > > > I could push a PR for the change to stop some processing threads based on
> > > > the content of a file. It's also possible to cancel the drain of the
> > > > management server by simply changing the content of the file back to "ready"
> > > > instead of "maint" [2].
> > >
> > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> > > > [2] HAProxy documentation on agent check:
> > > > https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
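
A minimal sketch of the file-based drain switch described above, assuming the file holds a single keyword ("ready" or "maint"). The function names, and the choice to treat a missing or empty file as "ready", are assumptions for illustration only; this is not the code that was changed in CS.

```python
from pathlib import Path

# Keywords taken from the message above; HAProxy's agent check understands both.
READY_KEYWORD = "ready"
DRAIN_KEYWORD = "maint"


def read_lb_state(path):
    """Return the first keyword found in the file, lowercased.

    Assumption: a missing or empty file means "ready", so that a broken
    file never drains the management server by accident.
    """
    try:
        tokens = Path(path).read_text().split()
    except FileNotFoundError:
        return READY_KEYWORD
    return tokens[0].lower() if tokens else READY_KEYWORD


def should_drain(path):
    """True when the file asks for maintenance, i.e. async job
    processing threads should be stopped."""
    return read_lb_state(path) == DRAIN_KEYWORD
```

A periodic task could call `should_drain("/etc/lb-agent")` and stop or resume the processing threads when the answer changes; writing "ready" back into the file cancels the drain.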
> > >
> > > > Regarding your issue on the port blocking, I think it's fair to consider
> > > > that if you want to shut down your server at some point, you have to stop
> > > > serving (some) requests. Here the only way is to stop serving everything.
> > > > If the API had a REST design, we could reject any POST/PUT/DELETE
> > > > operations and allow GET ones. I don't know how hard it would be today to
> > > > only allow listBaseCmd operations to be more friendly with the users.
> > >
> > > Marco
> > >
> > >
> > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <serg38l@xxxxxxxxxxx>
> > > wrote:
> > >
> > >> Now without spellchecking :)
> > >>
> > > >> This is not simple, e.g. for VMware. Each management server also acts as an
> > > >> agent proxy, so tasks against a particular ESX host will always be
> > > >> forwarded. The right answer will be to support a native “maintenance mode”
> > > >> for the management server. When put into such a mode, the management server
> > > >> should release all agents including SSVM, block/redirect API calls and
> > > >> login requests, and finish all async jobs it originated.
> > >>
> > >>
> > >>
> > >> On Apr 4, 2018, at 5:15 PM, Sergey Levitskiy <serg38l@xxxxxxxxxxx
> > <mailto:
> > >> serg38l@xxxxxxxxxxx>> wrote:
> > >>
> > > >> This is not simple e.g. for VMware. Each management server also acts as an
> > > >> agent proxy so tasks against a particular ESX host will be always
> > > >> forwarded. That right answer will be to a native support for “maintenance
> > > >> mode” for management server. When entered to such mode the management
> > > >> server should release all agents including save, block/redirect API calls
> > > >> and login request and finish all a sync job it originated.
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> > >> rafaelweingartner@xxxxxxxxx<mailto:rafaelweingartner@xxxxxxxxx>>
> wrote:
> > >>
> > > >> Ilya, still regarding the issue of the management server that is being shut
> > > >> down: if other MSs, or maybe system VMs (I am not sure whether they are able
> > > >> to do such tasks), can direct/redirect/send new jobs to this management server
> > > >> (the one being shut down), the process might never end because new tasks
> > > >> are always being created for the management server that we want to shut
> > > >> down. Is this scenario possible?
> > > >>
> > > >> That is why I mentioned blocking the port 8250 for the “graceful-shutdown”.
> > > >>
> > > >> If this scenario is not possible, then everything is fine.
> > >>
> > >>
> > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> > ilya.mailing.lists@xxxxxxxxx
> > >> <mailto:ilya.mailing.lists@xxxxxxxxx>>
> > >> wrote:
> > >>
> > > >> I'm thinking of using a configuration from "job.cancel.threshold.minutes" -
> > > >> it will be the longest
> > >>
> > >>    "category": "Advanced",
> > >>
> > >>    "description": "Time (in minutes) for async-jobs to be forcely
> > >> cancelled if it has been in process for long",
> > >>
> > >>    "name": "job.cancel.threshold.minutes",
> > >>
> > >>    "value": "60"
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> > >> rafaelweingartner@xxxxxxxxx<mailto:rafaelweingartner@xxxxxxxxx>>
> wrote:
> > >>
> > >> Big +1 for this feature; I only have a few doubts.
> > >>
> > > >> * Regarding the tasks/jobs that management servers (MSs) execute: do
> > > >> these tasks originate from requests that come to the MS, or is it possible
> > > >> for requests received by one management server to be executed by another? I
> > > >> mean, if I execute a request against MS1, will this request always be
> > > >> executed/treated by MS1, or is it possible that this request is executed
> > > >> by another MS (e.g. MS2)?
> > > >>
> > > >> * I would suggest that after we block traffic coming from
> > > >> 8080/8443/8250 (we will need to block this as well, right?), we can log the
> > > >> execution of tasks. I mean, something saying: there are XXX tasks (enumerate
> > > >> tasks) still being executed, we will wait for them to finish before
> > > >> shutting down.
> > > >>
> > > >> * The timeout (60 minutes suggested) could be a global setting that we can
> > > >> load before executing the graceful-shutdown.
> > >>
> > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> > >> ilya.mailing.lists@xxxxxxxxx<mailto:ilya.mailing.lists@xxxxxxxxx>
> > >>
> > >> wrote:
> > >>
> > > >> Use case:
> > > >> In any environment, from time to time, an administrator needs to perform
> > > >> maintenance. The current stop sequence of the cloudstack management server
> > > >> will ignore the fact that there may be long running async jobs and
> > > >> terminate the process. This in turn can create a poor user experience and
> > > >> occasional inconsistency in the cloudstack db.
> > > >>
> > > >> This is especially painful in large environments where the user has
> > > >> thousands of nodes and there is continuous patching that happens around
> > > >> the clock, which requires migration of workload from one node to another.
> > >>
> > > >> With that said - I've created a script that monitors the async job queue
> > > >> for a given MS and waits for it to complete all jobs. More details are
> > > >> posted below.
> > > >>
> > > >> I'd like to introduce "graceful-shutdown" into the systemctl/service of
> > > >> the cloudstack-management service.
> > > >>
> > > >> The details of how it will work are below:
> > >>
> > > >> Workflow for graceful shutdown:
> > > >> Using iptables/firewalld - block any connection attempts on 8080/8443 (we
> > > >> can identify the ports dynamically)
> > > >> Identify the MSID for the node; using the proper msid, query the async_job
> > > >> table for
> > > >> 1) any jobs that are still running (i.e. job_status="0")
> > > >> 2) job_dispatcher not like "pseudoJobDispatcher"
> > > >> 3) job_init_msid=$my_ms_id
> > > >>
> > > >> Monitor this async_job table for 60 minutes - until all async jobs for the
> > > >> MSID are done, then proceed with shutdown
> > > >> If it fails for any reason or is terminated, catch the exit via the trap
> > > >> command and unblock 8080/8443
> > >>
> > >> Comments are welcome
> > >>
> > >> Regards,
> > >> ilya
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Rafael Weingärtner
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Rafael Weingärtner
> > >>
> >
>
>
>
> --
>
> Andrija Panić
>
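
The monitoring step from the workflow quoted at the bottom of this thread (query async_job for this MSID, wait up to the timeout, then proceed or give up) can be sketched roughly as follows. This is only a minimal sketch, not the actual script: `wait_for_jobs`, `count_pending` and the poll interval are hypothetical names and values, and the `%s` placeholder assumes a MySQL driver that uses that parameter style. The three filters in the query are the ones listed in the proposal.

```python
import time

# Filters from the proposal: job still running, not a pseudo job,
# and initiated by this management server (msid passed as a parameter).
PENDING_JOBS_SQL = (
    "SELECT COUNT(*) FROM async_job "
    "WHERE job_status = 0 "
    "AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' "
    "AND job_init_msid = %s"
)


def wait_for_jobs(count_pending, timeout_s=60 * 60, poll_s=30,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until count_pending() returns 0 or the timeout expires.

    count_pending is a caller-supplied function (hypothetical) that runs
    PENDING_JOBS_SQL for this MSID and returns the count.
    Returns True if the queue drained, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if count_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The caller would block 8080/8443 before calling `wait_for_jobs` and unblock them in a `try/finally` if it returns False or is interrupted, which plays the role of the trap in the shell version. The default of 60 minutes mirrors the suggested timeout; a site with very long snapshot or migration jobs would pass a larger value.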