
Re: [DISCUSS] CloudStack graceful shutdown


Part of this sounds like the Windows shutdown process, which is familiar to many.

For those who have never used Windows:

Once you initiate the shutdown, it asks the tasks to shut down.
If tasks have not shut down within a "reasonable period", it lists them and asks you whether you want to wait a bit longer, force them to close, or abort the shutdown so that you can shut them down manually. If you "force" a shutdown, it closes all of the tasks using all of the brutality at its command. If you abort, then you have to redo the shutdown after you have manually exited from the processes that you care about.

This is pretty user friendly but requires that you have a way to signal to a task that it is time to say goodbye.

The "reasonable time" needs a default that is short enough to keep the operator happy and long enough to have a reasonable chance of getting everything stopped without intervention. If you allow the shutdown to proceed after the interval while the operator waits, then you need to refresh the list of running tasks as tasks end.
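That Windows-style loop can be sketched as a small control routine. Everything here is hypothetical (function names, the strings returned by the operator prompt); it only illustrates the wait/force/abort decision described above:

```python
import time

def graceful_shutdown(tasks, request_stop, still_running, grace_seconds,
                      ask_operator, force_kill):
    """Ask each task to stop; after the grace period, let the operator
    choose to wait longer, force-close the stragglers, or abort."""
    for t in tasks:
        request_stop(t)                      # polite "time to say goodbye"
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        remaining = [t for t in tasks if still_running(t)]
        if not remaining:
            return "shutdown"                # everything exited in time
        time.sleep(0.01)
    remaining = [t for t in tasks if still_running(t)]
    choice = ask_operator(remaining)         # show a refreshed list
    if choice == "wait":                     # grant another grace period
        return graceful_shutdown(remaining, request_stop, still_running,
                                 grace_seconds, ask_operator, force_kill)
    if choice == "force":
        for t in remaining:
            force_kill(t)                    # all the brutality at our command
        return "shutdown"
    return "aborted"                         # operator will exit tasks manually
```

The key requirement the text mentions, a way to signal a task that it is time to go, is the `request_stop` hook.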

Ron

On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
Ilya and others,

We have been discussing this idea of a graceful shutdown. Our feeling
is that we (in the CloudStack community) might have been trying to solve this
problem with too much scripting. What if we developed a more integrated
(native) solution?

Let me explain our idea.

ACS has a table called “mshost”, which is used to store management server
information. This table is consulted/queried during balancing and when jobs
are dispatched to other management servers. Therefore, we have
been discussing the idea of creating a management API for management
servers. We could have an API method that changes the state of management
servers to “prepare for maintenance” and then “maintenance” (as soon as all
of the tasks/jobs it is managing finish). The idea is that during
rebalancing we would remove the hosts of servers that are not in the “Up” state
(of course, we would also prevent servers in the aforementioned states
from receiving hosts to manage). Moreover, when we send/dispatch jobs to other
management servers, we could ignore the ones that are not in the “Up” state
(which is something already done).

By doing this, the graceful shutdown could be executed in a few steps.

1 – issue the maintenance method for the management server you desire
2 – wait until the MS goes into maintenance mode; while there are still
running jobs, the management server will remain in “prepare for
maintenance”
3 – execute the Linux shutdown command
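The three steps could be driven by a small polling helper. Note that the maintenance API and its state names are only proposed in this thread, so everything below is an assumed sketch (`get_state` stands in for whatever call would expose the MS state):

```python
import time

def wait_for_maintenance(get_state, ms_id, poll_seconds=0.01, timeout_seconds=60):
    """Poll the proposed management-server state until it reaches Maintenance.

    get_state(ms_id) stands in for the hypothetical maintenance API; while
    jobs are still running it would keep answering "PrepareForMaintenance".
    Returns True when the OS-level shutdown command (step 3) is safe to run.
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        if get_state(ms_id) == "Maintenance":
            return True
        time.sleep(poll_seconds)
    return False  # still draining jobs; do not shut the process down yet
```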

We would need other API methods to manage MSs then: (i) an API method to
list MSs, and we could even create (ii) an API to remove old/deactivated
management servers, which we currently do not have (forcing users to apply
changes directly in the database).

Moreover, in this model, we would not kill hanging jobs; we would wait
until they expire and ACS expunges them. Of course, it is possible to
develop a forceful maintenance method as well. Then, when the “prepare for
maintenance” state takes longer than a configured threshold, we could kill
hanging jobs.

All of this would allow the MS to be kept up and receiving requests until
it can be safely shut down. What do you guys think about this approach?

On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yzhang@xxxxxxxxxxx> wrote:

As a cloud admin, I would love to have this feature.

It so happens that I just accidentally restarted my ACS management server
while two instances were migrating to another Xen cluster (via storage
migration, not live migration).  As a result, both instances
ended up with corrupted data disks which can't be reattached or migrated.

Any feature which prevents this from happening would be great.  A
low-hanging fruit is simply checking whether there are any async jobs
running, especially any kind of migration jobs or other known long-running
types of jobs, and warning the operator so that he has a chance to abort
the server shutdown.
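That low-hanging fruit could look roughly like the check below. The `fetch_running_jobs` hook and the keyword list are illustrative assumptions, not actual CloudStack `job_cmd` values:

```python
def preshutdown_warning(fetch_running_jobs,
                        risky_keywords=("Migrate", "Snapshot", "Copy")):
    """Return a warning string if async jobs are still running, else None.

    fetch_running_jobs() stands in for a query along the lines of
    SELECT id, job_cmd FROM async_job WHERE job_status = 0;
    the keyword list above is illustrative only.
    """
    jobs = fetch_running_jobs()
    if not jobs:
        return None  # nothing running: safe to proceed with the shutdown
    risky = [j for j in jobs if any(k in j["job_cmd"] for k in risky_keywords)]
    msg = "%d async job(s) still running" % len(jobs)
    if risky:
        msg += ("; %d look like migration/snapshot jobs - "
                "aborting them may corrupt disks" % len(risky))
    return msg
```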

Yiping

On 4/5/18, 3:13 PM, "ilya musayev" <ilya.mailing.lists@xxxxxxxxx> wrote:

     Andrija

     This is a tough scenario.

     As an admin, the way I would have handled this situation is to advertise
     the upcoming outage and then take away specific API commands from a user a
     day before - so he does not cause any long-running async jobs. Once
     maintenance completes - enable the API commands back for the user. However -
     I don't know who your user base is and if this would be an acceptable
     solution.

     Perhaps also investigate what can be done to speed up your long running
     tasks...

     As a side note, we will be working on a feature that would allow for a
     graceful termination of the process/job, meaning if the agent notices a
     disconnect or termination request - it will abort the command in flight. We
     can also consider restarting these tasks again or what not - but that would
     not be part of this enhancement.

     Regards
     ilya

     On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <andrija.panic@xxxxxxxxx> wrote:

     > Hi Ilya,
     >
     > thanks for the feedback - but in the "real world", you need to
     > "understand" that 60 min is a next-to-useless timeout for some jobs (if
     > I understand this specific parameter correctly ?? - is the job really
     > canceled, or only the job monitoring ???) -
     >
     > My value for "job.cancel.threshold.minutes" is 2880 minutes (2 days)
     >
     > I can tell you, when you have CEPH/NFS (CEPH is an even "worse" case,
     > since reads are slower during the qemu-img convert process...) and a
     > 500GB volume, the snapshot job will take many hours. Should I mention
     > 1TB volumes (yes, we had clients like that...)?
     > Then attaching a 1TB volume that was uploaded to ACS (it lives
     > originally on Secondary Storage, and takes time to be copied over to
     > NFS/CEPH) will take up to a few hours.
     > Then migrating a 1TB volume from NFS to CEPH, or CEPH to NFS, also
     > takes time... etc.
     >
     > I'm just giving you feedback as a "user", an admin of the cloud, zero
     > DEV skills here :) , just to make sure you make practical decisions
     > (and I admit I might be wrong with my stuff, but just giving you
     > feedback from our public cloud setup)
     >
     >
     > Cheers!
     >
     >
     >
     >
     > On 5 April 2018 at 15:16, Tutkowski, Mike <Mike.Tutkowski@xxxxxxxxxx> wrote:
     >
     > > Wow, there have been a lot of good details noted from several people
     > > on how this process works today and how we’d like it to work in the
     > > near future.
     > >
     > > 1) Any chance this is already documented on the Wiki?
     > >
     > > 2) If not, any chance someone would be willing to do so? (A flow
     > > diagram would be particularly useful.)
     > >
     > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <marco@xxxxxxxxxxx> wrote:
     > > >
     > > > Hi all,
     > > >
     > > > Good point ilya, but as stated by Sergey there are more things to
     > > > consider before being able to do a proper shutdown. I augmented the
     > > > script I gave you originally and changed code in CS. What we're
     > > > doing for our environment is as follows:
     > > >
     > > > 1. the MGMT looks for a change in the file /etc/lb-agent, which
     > > > contains keywords for HAproxy [2] (ready, maint), so that HAproxy
     > > > can disable the mgmt on the keyword "maint" and the mgmt server
     > > > stops a couple of threads [1] to stop processing async jobs in the
     > > > queue
     > > > 2. Look for the async jobs and wait until there are none, to ensure
     > > > you can send the reconnect commands (if jobs are running, a
     > > > reconnect will result in a failed job, since the result will never
     > > > reach the management server - the agent waits for the current job
     > > > to be done before reconnecting, and discards the result... room for
     > > > improvement here!)
     > > > 3. Issue a reconnectHost command to all the hosts connected to the
     > > > mgmt server so that they reconnect to another one; otherwise the
     > > > mgmt must be up, since it is used to forward commands to agents.
     > > > 4. When all agents are reconnected, we can shut down the management
     > > > server and perform the maintenance.
     > > >
     > > > One issue remains for me: during the reconnect, the commands that
     > > > are processed at the same time should be kept in a queue until the
     > > > agents have finished any current jobs and have reconnected. Today,
     > > > the little time window during which the reconnect happens can lead
     > > > to failed jobs due to the agent not being connected at the right
     > > > moment.
     > > >
     > > > I could push a PR for the change to stop some processing threads
     > > > based on the content of a file. It's also possible to cancel the
     > > > drain of the management server by simply changing the content of
     > > > the file back to "ready" again, instead of "maint" [2].
     > > >
     > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
     > > > [2] HAproxy documentation on agent checks:
     > > > https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
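The agent-check hook in step 1 of the workflow quoted above could be served by a tiny responder like this sketch. The /etc/lb-agent path follows the email; the port and the HAproxy server line in the comment are assumptions. Per the linked documentation, HAproxy's agent-check expects a short ASCII reply such as "ready" or "maint":

```python
import socketserver
from pathlib import Path

STATE_FILE = Path("/etc/lb-agent")  # the file the MGMT script toggles

def agent_reply(state_file=STATE_FILE):
    """Build the one-line answer HAproxy's agent-check expects
    ("ready", "maint", or "drain", newline-terminated)."""
    try:
        word = state_file.read_text().strip()
    except OSError:
        word = "maint"              # fail closed: drain if the file is unreadable
    return (word if word in ("ready", "maint", "drain") else "maint") + "\n"

class AgentCheckHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(agent_reply().encode("ascii"))

if __name__ == "__main__":
    # matching haproxy server line (assumption):
    #   server ms1 10.0.0.1:8080 check agent-check agent-port 9999
    with socketserver.TCPServer(("0.0.0.0", 9999), AgentCheckHandler) as srv:
        srv.serve_forever()
```

Flipping the file content back to "ready" cancels the drain, exactly as described in the email.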
     > > >
     > > > Regarding your issue on the port blocking, I think it's fair to
     > > > consider that if you want to shut down your server at some point,
     > > > you have to stop serving (some) requests. Here the only way is to
     > > > stop serving everything. If the API had a REST design, we could
     > > > reject any POST/PUT/DELETE operations and allow GET ones. I don't
     > > > know how hard it would be today to only allow listBaseCmd
     > > > operations, to be more friendly with the users.
     > > >
     > > > Marco
     > > >
     > > >
     > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <serg38l@xxxxxxxxxxx>
     > > > wrote:
     > > >
     > > >> Now without spellchecking :)
     > > >>
     > > >> This is not simple, e.g. for VMware. Each management server also
     > > >> acts as an agent proxy, so tasks against a particular ESX host
     > > >> will always be forwarded. The right answer would be to support a
     > > >> native “maintenance mode” for the management server. When entered
     > > >> into such a mode, the management server should release all agents
     > > >> including SSVM, block/redirect API calls and login requests, and
     > > >> finish all async jobs it originated.
     > > >>
     > > >>
     > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner
     > > >> <rafaelweingartner@xxxxxxxxx> wrote:
     > > >>
     > > >> Ilya, still regarding the management server that is being shut
     > > >> down: if other MSs or maybe system VMs (I am not sure whether they
     > > >> are able to do such tasks) can direct/redirect/send new jobs to
     > > >> this management server (the one being shut down), the process
     > > >> might never end, because new tasks are always being created for
     > > >> the management server that we want to shut down. Is this scenario
     > > >> possible?
     > > >>
     > > >> That is why I mentioned blocking port 8250 for the
     > > >> “graceful-shutdown”.
     > > >>
     > > >> If this scenario is not possible, then everything is fine.
     > > >>
     > > >>
     > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev
     > > >> <ilya.mailing.lists@xxxxxxxxx> wrote:
     > > >>
     > > >> I'm thinking of using a configuration from
     > > >> "job.cancel.threshold.minutes" - it will be the longest:
     > > >>
     > > >>    "category": "Advanced",
     > > >>    "description": "Time (in minutes) for async-jobs to be forcely
     > > >>    cancelled if it has been in process for long",
     > > >>    "name": "job.cancel.threshold.minutes",
     > > >>    "value": "60"
     > > >>
     > > >>
     > > >>
     > > >>
     > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner
     > > >> <rafaelweingartner@xxxxxxxxx> wrote:
     > > >>
     > > >> Big +1 for this feature; I only have a few doubts.
     > > >>
     > > >> * Regarding the tasks/jobs that management servers (MSs) execute:
     > > >> do these tasks originate from requests that come to the MS, or is
     > > >> it possible for requests received by one management server to be
     > > >> executed by another? I mean, if I execute a request against MS1,
     > > >> will this request always be executed/treated by MS1, or is it
     > > >> possible that this request is executed by another MS (e.g. MS2)?
     > > >>
     > > >> * I would suggest that after we block traffic coming to
     > > >> 8080/8443/8250 (we will need to block this last one as well,
     > > >> right?), we log the execution of tasks. I mean, something saying:
     > > >> there are XXX tasks (enumerate tasks) still being executed; we
     > > >> will wait for them to finish before shutting down.
     > > >>
     > > >> * The timeout (60 minutes suggested) could be a global setting
     > > >> that we can load before executing the graceful-shutdown.
     > > >>
     > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev
     > > >> <ilya.mailing.lists@xxxxxxxxx> wrote:
     > > >>
     > > >> Use case:
     > > >> In any environment - from time to time - an administrator needs to
     > > >> perform maintenance. The current stop sequence of the cloudstack
     > > >> management server will ignore the fact that there may be
     > > >> long-running async jobs - and terminate the process. This in turn
     > > >> can create a poor user experience and occasional inconsistency in
     > > >> the cloudstack db.
     > > >>
     > > >> This is especially painful in large environments where the user
     > > >> has thousands of nodes and there is continuous patching that
     > > >> happens around the clock - which requires migration of workload
     > > >> from one node to another.
     > > >>
     > > >> With that said - I've created a script that monitors the async job
     > > >> queue for a given MS and waits for it to complete all jobs. More
     > > >> details are posted below.
     > > >>
     > > >> I'd like to introduce "graceful-shutdown" into the
     > > >> systemctl/service of the cloudstack-management service.
     > > >>
     > > >> The details of how it will work are below.
     > > >>
     > > >> Workflow for graceful shutdown:
     > > >> Using iptables/firewalld - block any connection attempts on
     > > >> 8080/8443 (we can identify the ports dynamically).
     > > >> Identify the MSID for the node; using the proper msid, query the
     > > >> async_job table for
     > > >> 1) any jobs that are still running (job_status = "0")
     > > >> 2) job_dispatcher not like "pseudoJobDispatcher"
     > > >> 3) job_init_msid = $my_ms_id
     > > >>
     > > >> Monitor this async_job table for 60 minutes - until all async jobs
     > > >> for the MSID are done, then proceed with shutdown.
     > > >> If the script fails for any reason or is terminated, catch the
     > > >> exit via the trap command and unblock 8080/8443.
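The monitoring step of the workflow quoted above might be sketched like so. The SQL in the docstring mirrors the three conditions listed; the helper name, polling interval, and `count_running` hook are assumptions:

```python
import time

def drain_async_jobs(count_running, threshold_minutes=60, poll_seconds=0.01):
    """Wait until this MS has no running async jobs, or the threshold expires.

    count_running() stands in for the query described above, roughly:
      SELECT COUNT(*) FROM async_job
       WHERE job_status = 0
         AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
         AND job_init_msid = <my_ms_id>;
    Returns True when it is safe to stop cloudstack-management; on False the
    caller's trap-style cleanup should unblock 8080/8443 again.
    """
    deadline = time.time() + threshold_minutes * 60
    while time.time() < deadline:
        if count_running() == 0:
            return True
        time.sleep(poll_seconds)
    return False
```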
     > > >>
     > > >> Comments are welcome
     > > >>
     > > >> Regards,
     > > >> ilya
     > > >>
     > > >>
     > > >>
     > > >>
     > > >> --
     > > >> Rafael Weingärtner
     > > >>
     > > >>
     > > >>
     > > >>
     > > >>
     > > >> --
     > > >> Rafael Weingärtner
     > > >>
     > >
     >
     >
     >
     > --
     >
     > Andrija Panić
     >





--
Ron Wheeler
President
Artifact Software Inc
email: rwheeler@xxxxxxxxxxxxxxxxxxxxx
skype: ronaldmwheeler
phone: 866-970-2435, ext 102