Subject: Re: Re: Suspended jobs resume execution - msg#00021
Hi,
I'm running the most recent maui snapshot. Still, the preemption is driving
me nuts:
#> showq
348031 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348037 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348043 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348067 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348026 user1 Running 1 93:22:00:12 Wed Aug 2 17:59:16
348032 user1 Running 1 93:22:03:54 Wed Aug 2 18:02:58
348079 user1 Running 1 99:23:48:43 Tue Aug 8 19:47:47
348085 user1 Running 1 99:23:57:12 Tue Aug 8 19:56:16
348091 user1 Running 1 99:23:57:19 Tue Aug 8 19:56:23
348044 user1 Running 1 99:23:58:02 Tue Aug 8 19:57:06
348050 user1 Running 1 99:23:58:10 Tue Aug 8 19:57:14
348056 user1 Running 1 99:23:59:59 Tue Aug 8 19:59:03
348025 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348038 user1 Suspended 1 93:22:03:54 Wed Aug 2 18:02:58
348049 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348055 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348061 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348073 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
As you can see - previously suspended jobs are NOT resumed immediately;
instead OTHER jobs from the SAME queue are started :-(
Is there ANY possibility to manually fix that now?
Cheers,
Ronny
Thread at a glance:
Previous Message by Date:
Re: Re: [Mauiusers] maui memory consumption with MMAX_JOB redefined
On Tue, Aug 08, 2006 at 10:39:59AM -0700, Sam Rash alleged:
> Ooh, I may have missed something: we regularly hit maui with 5k jobs
> daily--the default for MMAX_JOB is 4096. What does this actually mean?
> Only 4096 will be considered by maui at a time? (ie, left in the RM)
Correct. Any jobs after the max are simply ignored.
When you think about it, since 4096 jobs can't actually run (since you
don't actually have that many nodes), there isn't much need for maui to
read in more jobs.
When I came across this problem on my own cluster, I found that the "bad
user" would always pass any max jobs that I built into maui. A strategy
to deal with this is to use routing queues in TORQUE...
set server default_queue = default
create queue default queue_type=R,route_destinations=mainexec
create queue mainexec queue_type=E,max_queuable=1000
I have a fairly deeply nested set of routing queues for different groups
of users, each with different max resources, acls, max_queuables, and
max_user_runs. The idea is to prevent a user in one group from swamping maui
and keeping other queues from executing.
Next Message by Date:
Re: [torqueusers] Re: maui memory consumption with MMAX_JOB redefined
>>Ooh, I may have missed something: we regularly hit maui with 5k jobs
>>daily--the default for MMAX_JOB is 4096. What does this actually mean?
>>Only 4096 will be considered by maui at a time? (ie, left in the RM)
> Correct. Any jobs after the max are simply ignored.
Yeah, but that's my problem. If the queue is flooded (which is ok in my
case) and someone submits a job into the short queue, won't this one be
ignored if only 4K jobs are considered?
Or is the 4K limit _per queue_ (following your description it's server-wide) ?
Cheers,
Ronny
Previous Message by Thread:
maui memory consumption with MMAX_JOB redefined
Hi,
my site is often seeing job bursts. Usually there are a couple of 100 jobs
queued. Sometimes there are close to 100.000 queued. So I adjusted the
MMAX_JOB define in include/msched.h to (128*1024).
Is it expected that maui now eats RAM like it's cookies?
root 25206 0.0 7.4 170560 154856 pts/0 - 14:28 0:00 ./maui-128k
The above is with 24 jobs active. Is the job array statically allocated?
Cheers,
Ronny
Next Message by Thread:
Re: Re: Suspended jobs resume execution
Robin,
Some of our deadlines have passed and I was able to take time today to look at this suspension
problem in more detail. I have found that the solution is not just a simple fix in the code, but a
combination of settings and changes.
The first issue I investigated was why the suspended job's run-priority was not growing over time;
in other words, why the job was not "aging." In order to ensure the job's run-priority would grow,
even in a suspended state, I implemented a new job priority weight factor called
USAGEEXECUTIONTIMEWEIGHT. This, like other USAGE sub-component factors, is only applied to active
jobs and only works if the USAGEWEIGHT is set to something other than 0. A positive
USAGEEXECUTIONTIMEWEIGHT will cause jobs that have a start time (including suspended jobs, as they
were once started) to increase in run priority over time. With these settings the job should
properly age.
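To illustrate the aging effect described above, here is a rough Python sketch. It is not Maui's
code: the function run_priority, its two-term formula, and the time unit are assumptions made for
illustration only, and the real priority calculation has more components. It only shows why a job
with a start time pulls ahead of an idle job of the same QOS once both weights are non-zero.

```python
def run_priority(qos_priority, time_since_start,
                 qos_weight=1, usage_weight=0, exec_time_weight=0):
    """Simplified, assumed model: a QOS term plus a usage term that
    grows with the time elapsed since the job first started."""
    qos_part = qos_weight * qos_priority
    # The usage term vanishes if either weight is 0, matching the
    # note that USAGEWEIGHT must be non-zero for the factor to apply.
    usage_part = usage_weight * exec_time_weight * time_since_start
    return qos_part + usage_part


# Hypothetical jobs: both in the "low" QOS (PRIORITY=100).
suspended = dict(qos_priority=100, time_since_start=600)  # started earlier
idle = dict(qos_priority=100, time_since_start=0)         # never started

# Without the usage weights the two jobs tie.
no_aging = (run_priority(**suspended), run_priority(**idle))

# With both weights set to 1 the suspended job ranks higher.
with_aging = (
    run_priority(**suspended, usage_weight=1, exec_time_weight=1),
    run_priority(**idle, usage_weight=1, exec_time_weight=1),
)

print(no_aging, with_aging)
```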
In my testing, I also found that an internal Maui attribute named the "suspension min time" could
sometimes get in the way of resuming the suspended job. This attribute's purpose is to prevent Maui
from suspending and resuming and then suspending the same job within the same iteration. (It
prevents rapid "flipping" of jobs.) A job will not resume after being suspended until after this min
time has passed. Maui starts counting immediately after the job is suspended/resumed. This attribute
was set to 60 seconds and if the PREEMPTOR job finished before this time, then the suspended job
would not resume because the min time had not yet been satisfied. Even with a growing priority this
min time could prevent jobs from being resumed. To reduce the chances of this happening, I
decreased the "suspension min time" to 10 seconds.
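The min-time check can be pictured with a small Python sketch. This is only an illustration of the
rule as described above, not the actual Maui implementation; the name may_resume and the example
timestamps are hypothetical.

```python
def may_resume(now, last_state_change, min_time):
    """Assumed rule: a job may change state again only after
    `min_time` seconds have passed since its last suspend/resume."""
    return now - last_state_change >= min_time


# Hypothetical timeline in seconds: the job is suspended at t=0
# and the PREEMPTOR job finishes at t=30.
suspended_at = 0
preemptor_done_at = 30

# With the old 60 second setting the job is not yet eligible.
with_old_setting = may_resume(preemptor_done_at, suspended_at, 60)

# With the new 10 second setting it is eligible to resume.
with_new_setting = may_resume(preemptor_done_at, suspended_at, 10)

print(with_old_setting, with_new_setting)
```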
The last way that this issue can exhibit itself is when an advanced reservation is blocking the
suspended job's ability to resume. This happens only if the PREEMPTOR job's wallclock limit is
less-than or equal-to the suspended job's wallclock limit.
For example, if we have two jobs in the queue with the same priority, A_low and B_low, and B_low was
submitted second, then let's say A_low starts and takes up the nodes needed by B_low. So B_low is
now in the Idle queue, but creates a reservation in the future so it can guarantee to run after
A_low is complete. Next a PREEMPTOR job, C_high, comes in with a higher priority and suspends A_low
so that C_high can run. The advanced reservation that B_low has will now be adjusted to fit "around"
the new wallclock limit of C_high. If C_high runs shorter than A_low does, then B_low's advanced
reservation will move backward in time. If C_high ends, and A_low tries to resume it won't be able
to, because B_low's advanced reservation will be overlapping A_low's run-length. If, however,
C_high was longer than A_low's wallclock, then A_low can still squeeze in before B_low's reservation
begins.
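The same example can be walked through with numbers. The Python sketch below is a simplified model
and not Maui's scheduling logic: it assumes B_low's reservation is re-placed at C_high's expected
end (start plus wallclock limit), that C_high finishes before its limit, and that A_low may resume
only if the time it still needs fits before that reservation. All names and values are hypothetical.

```python
def reservation_start(preemptor_start, preemptor_wallclock):
    # Assumption: the idle job's reservation is moved to the
    # preemptor's expected end (start + wallclock limit).
    return preemptor_start + preemptor_wallclock


def can_resume(now, time_needed, blocking_reservation_start):
    # The suspended job fits only if it would finish before the
    # blocking reservation begins.
    return now + time_needed <= blocking_reservation_start


# Hypothetical scenario, all times in seconds.
a_wallclock = 3600                 # A_low's wallclock limit
c_start = 600                      # C_high suspends A_low at t=600
a_needed = a_wallclock - c_start   # time A_low still needs (3000)
c_end = c_start + 900              # C_high actually ends at t=1500

# Case 1: C_high's limit (1800) is shorter than A_low's limit.
short_res = reservation_start(c_start, 1800)           # t=2400
resumes_short = can_resume(c_end, a_needed, short_res)

# Case 2: C_high's limit (7200) is longer than A_low's limit.
long_res = reservation_start(c_start, 7200)            # t=7800
resumes_long = can_resume(c_end, a_needed, long_res)

print(resumes_short, resumes_long)
```

In the first case A_low would run until t=4500, past the reservation at t=2400, so it stays
suspended; in the second case it finishes well before t=7800 and can resume.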
Perhaps the example was a little much, but I hope you get the idea. In Maui there is currently only
one way to get around this: controlling the creation of advanced reservations. Depending on the
needs of your cluster, you can disable advanced reservations altogether by using:
RESERVATIONPOLICY NEVER
in your maui.cfg. If this suspension problem really hurts the utilization of your cluster, then this
solution may work best for your site. Otherwise, it may be a little overkill.
In Moab Workload Manager you can enable lower priority reservations to be "preempted" as well,
allowing for A_low to run no matter where B_low's reservation begins. Adding this feature to Maui
would, unfortunately, be quite the extensive effort and I don't foresee us being able to implement
it anytime soon.
All of the above changes have been included in the most recent development snapshot available at
http://www.clusterresources.com/downloads/maui/.
Let me know if you experience any problems or have any more questions. We appreciate the continuing
support from the Maui community and their active participation in resolving bugs and creating
enhancements.
--
Joshua Butikofer
Cluster Resources, Inc.
josh@xxxxxxxxxxxxxxxxxxxx
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------
Robin Humble wrote:
Hi,
On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
> We've confirmed that this behavior is happening in Maui. Moab Workload
> Manager currently has the desired behavior with suspended jobs accruing
> priority (and also correctly handles different classes involved). We
> hope that over the next few weeks we will be able to make these
> improvements in Maui as well. We will keep the list posted on our progress.
any updates?
in case you were looking for a simpler test case, the below 2 queue
system seems to have the same behaviour as the previous bug report -
ie. the suspended PREEMPTEE job has a hard time resuming.
in other words after a PREEMPTOR job steams through (correctly) we end
up with a previously queued PREEMPTEE job then being chosen to run over
the top of the suspended PREEMPTEE job.
I don't think this is correct behaviour as only PREEMPTOR jobs should
be able to run over the top of PREEMPTEE jobs.
versions are:
torque 2.1.1-3 (rebuild on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16
relevant part of maui.cfg:
PREEMPTPOLICY SUSPEND
CLASSCFG[debug] QDEF=high
CLASSCFG[workq] QDEF=low
QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
QOSWEIGHT 1
cheers,
robin
--
Joshua Butikofer
Cluster Resources, Inc.
josh@xxxxxxxxxxxxxxxxxxxx
(801) 798-7488
--------------------------
David Corredor wrote:
The problem is not just that the suspended job once again gets preempted
by a job of its own class from the IDLE queue; this happens regardless
of the class of the new job.
Ex. 3 queues (1 verylong, 1 long, 1 fast. Fast preempts long and
verylong, and long preempts verylong, verylong should not preempt).
- Submit 1 long job so that it takes all resources in cluster.
- Submit a verylong job so that it waits in the IDLE queue.
- Submit a fast job.
The fast job preempts the long one, and once it finishes, instead of the
long one resuming execution, the verylong kicks in and preempts it once
again (and it shouldn't).
<quote who="Ronny T. Lampert">
.....
However I experience the very same problem as you do (I need the
QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and instead a
NEW job from the batch queue is started :-(
I think this is a bug: suspended jobs *should age*, too.
Or automatically get a slightly higher priority than the highest in the same
class to prevent it from staying suspended and interrupted by jobs from the
same class.
Could some developer briefly comment on that issue?
Thanks!
Ronny
_______________________________________________
mauiusers mailing list
mauiusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/mauiusers