osdir.com
mailing list archive



Subject: Re: Re: Suspended jobs resume execution - msg#00021

List: clustering.maui.user


Hi,

I'm running the most recent maui snapshot. Still, the preemption is driving
me nuts:

#> showq

JOBNAME USERNAME STATE PROC REMAINING STARTTIME
348031 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348037 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348043 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348067 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348026 user1 Running 1 93:22:00:12 Wed Aug 2 17:59:16
348032 user1 Running 1 93:22:03:54 Wed Aug 2 18:02:58
348079 user1 Running 1 99:23:48:43 Tue Aug 8 19:47:47
348085 user1 Running 1 99:23:57:12 Tue Aug 8 19:56:16
348091 user1 Running 1 99:23:57:19 Tue Aug 8 19:56:23
348044 user1 Running 1 99:23:58:02 Tue Aug 8 19:57:06
348050 user1 Running 1 99:23:58:10 Tue Aug 8 19:57:14
348056 user1 Running 1 99:23:59:59 Tue Aug 8 19:59:03
348025 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348038 user1 Suspended 1 93:22:03:54 Wed Aug 2 18:02:58
348049 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348055 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348061 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348073 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59


As you can see, previously suspended jobs are NOT resumed immediately;
instead, OTHER jobs from the SAME queue are started :-(
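To make the pattern in the showq output easier to see, here is a small Python sketch that parses a few of the rows above and, for each Suspended job, lists the Running jobs whose STARTTIME is later than the suspended job's original start. It assumes the column order JOBNAME USERNAME STATE PROC REMAINING STARTTIME and assumes the year 2006 (showq prints no year); the function names are hypothetical and not part of Maui.

```python
from datetime import datetime

# A few rows copied from the showq output above.
# Assumed columns: JOBNAME USERNAME STATE PROC REMAINING STARTTIME
SHOWQ_ROWS = """\
348031 user1 Running 1 93:21:58:55 Wed Aug 2 17:57:59
348079 user1 Running 1 99:23:48:43 Tue Aug 8 19:47:47
348056 user1 Running 1 99:23:59:59 Tue Aug 8 19:59:03
348025 user1 Suspended 1 93:21:58:55 Wed Aug 2 17:57:59
348038 user1 Suspended 1 93:22:03:54 Wed Aug 2 18:02:58
"""


def parse_start(fields, year=2006):
    """Parse STARTTIME fields like ['Wed', 'Aug', '2', '17:57:59'].

    showq prints no year, so the year is an assumption (2006, from the thread).
    """
    return datetime.strptime(" ".join(fields) + f" {year}", "%a %b %d %H:%M:%S %Y")


def jobs_that_jumped(rows):
    """Map each Suspended job ID to the Running job IDs that started after it did."""
    jobs = []
    for line in rows.strip().splitlines():
        f = line.split()
        jobs.append({"id": f[0], "state": f[2], "start": parse_start(f[5:9])})

    running = [j for j in jobs if j["state"] == "Running"]
    report = {}
    for s in jobs:
        if s["state"] != "Suspended":
            continue
        # A later STARTTIME means the running job was started after this one
        # originally started, so it may have been started in its place.
        report[s["id"]] = [r["id"] for r in running if r["start"] > s["start"]]
    return report


for job_id, later in jobs_that_jumped(SHOWQ_ROWS).items():
    print(f"{job_id} is suspended; started later and running: {', '.join(later)}")
```

Both suspended jobs in this subset are "overtaken" by the two jobs started on Tue Aug 8, while 348031 (same start time as 348025) is not counted.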

Is there ANY way to fix this manually for now?

Cheers,
Ronny




Thread at a glance:

Previous Message by Date:

Re: Re: [Mauiusers] maui memory consumption with MMAX_JOB redefined

On Tue, Aug 08, 2006 at 10:39:59AM -0700, Sam Rash alleged:
> Ooh, I may have missed something: we regularly hit maui with 5k jobs
> daily; the default for MMAX_JOB is 4096. What does this actually mean?
> Only 4096 will be considered by maui at a time? (ie, left in the RM)

Correct. Any jobs after the max are simply ignored. When you think about it, since 4096 jobs can't actually run (you don't have that many nodes), there isn't much need for maui to read in more jobs.

When I came across this problem on my own cluster, I found that the "bad user" would always exceed any max-jobs limit that I built into maui. A strategy to deal with this is to use routing queues in TORQUE:

set server default_queue = default
create queue default queue_type=R,route_destinations=mainexec
create queue mainexec queue_type=E,max_queuable=1000

I have a fairly deeply nested set of routing queues for different groups of users, each with different max resources, acls, max_queuables, and max_user_runs. The idea is to prevent a user in one group from swamping maui and keeping other queues from executing.
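A toy Python model of the routing-queue idea above: jobs land in the routing queue "default" and are moved into the execution queue "mainexec" only while it holds fewer than max_queuable jobs, so the set of jobs offered for execution stays bounded. This is an illustration under that assumption, not TORQUE's routing code; the function name route and the job names are hypothetical.

```python
from collections import deque


def route(routing_queue, exec_queue, max_queuable):
    """Toy model (not TORQUE's code): move jobs, oldest first, from a routing
    queue into an execution queue until the execution queue is full."""
    while routing_queue and len(exec_queue) < max_queuable:
        exec_queue.append(routing_queue.popleft())


# 5000 hypothetical jobs submitted to the routing queue "default".
default_q = deque(f"job{i}" for i in range(5000))
mainexec = []  # execution queue with max_queuable=1000

route(default_q, mainexec, max_queuable=1000)
print(len(mainexec), len(default_q))  # 1000 4000

# When 10 jobs leave mainexec (e.g. they finish), routing refills it.
del mainexec[:10]
route(default_q, mainexec, max_queuable=1000)
print(len(mainexec), len(default_q))  # 1000 3990
```

The flood waits in the routing queue instead of growing the execution queue past 1000 jobs.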

Next Message by Date:

Re: [torqueusers] Re: maui memory consumption with MMAX_JOB redefined

>> Ooh, I may have missed something: we regularly hit maui with 5k jobs
>> daily; the default for MMAX_JOB is 4096. What does this actually mean?
>> Only 4096 will be considered by maui at a time? (ie, left in the RM)
> Correct. Any jobs after the max are simply ignored.

Yeah, but that's my problem. If the queue is flooded (which is OK in my case) and someone submits a job into the short queue, won't that job be ignored if only 4K jobs are considered? Or is the 4K limit _per queue_ (following your description, it's server-wide)?

Cheers,
Ronny
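The question above is whether the MMAX_JOB cap is server-wide or per queue. The Python sketch below contrasts the two readings with a hypothetical flood of 5000 batch jobs followed by one short-queue job. It assumes the scheduler reads jobs in submission order and drops everything past the cap, as described in the reply; only the value 4096 comes from the thread, and considered_per_queue models a hypothetical alternative, not confirmed Maui behaviour.

```python
from collections import Counter

MMAX_JOB = 4096  # default quoted in the thread


def considered_server_wide(jobs, limit=MMAX_JOB):
    """Server-wide cap: keep the first `limit` jobs read, ignore the rest."""
    return jobs[:limit]


def considered_per_queue(jobs, limit=MMAX_JOB):
    """Hypothetical alternative: apply the cap separately to each queue."""
    counts = Counter()
    kept = []
    for job_id, queue in jobs:
        if counts[queue] < limit:
            counts[queue] += 1
            kept.append((job_id, queue))
    return kept


# A flood of batch jobs, then one short job submitted afterwards.
jobs = [(f"batch{i}", "batch") for i in range(5000)] + [("short0", "short")]
short_job = ("short0", "short")

print(short_job in considered_server_wide(jobs))  # False: the short job is ignored
print(short_job in considered_per_queue(jobs))    # True
```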

Previous Message by Thread:

maui memory consumption with MMAX_JOB redefined

Hi,

my site often sees job bursts. Usually there are a couple of hundred jobs queued; sometimes there are close to 100,000. So I adjusted the MMAX_JOB define in include/msched.h to (128*1024). Is it expected that maui now eats RAM like cookies?

root 25206 0.0 7.4 170560 154856 pts/0 - 14:28 0:00 ./maui-128k

The above is with 24 jobs active. Is the job array statically allocated?

Cheers,
Ronny
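A back-of-the-envelope check on the ps line above, in Python. It assumes the standard "ps aux" column order (so 170560 is VSZ and 154856 is RSS, both in KiB) and divides the whole resident set by the 128*1024 job slots to get an upper bound on memory per slot. This arithmetic cannot prove that the job array is statically allocated; it only shows that roughly 151 MiB of RSS with 24 active jobs is what one would expect if memory scales with MMAX_JOB rather than with the number of active jobs.

```python
# The ps line quoted above; assumed `ps aux` column order:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND (VSZ, RSS in KiB)
ps_line = "root 25206 0.0 7.4 170560 154856 pts/0 - 14:28 0:00 ./maui-128k"
fields = ps_line.split()
vsz_kib, rss_kib = int(fields[4]), int(fields[5])

slots = 128 * 1024  # MMAX_JOB as redefined in include/msched.h
active_jobs = 24

# Upper bound: pretend the entire resident set is the job table.
upper_bytes_per_slot = rss_kib * 1024 / slots

print(f"RSS {rss_kib / 1024:.1f} MiB, VSZ {vsz_kib / 1024:.1f} MiB")
print(f"at most {upper_bytes_per_slot:.1f} bytes per job slot")
print(f"{rss_kib / active_jobs:.0f} KiB of RSS per active job")
```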

Next Message by Thread:

Re: Re: Suspended jobs resume execution

Robin,

Some of our deadlines have passed, and I was able to take time today to look at this suspension problem in more detail. I have found that the solution is not a simple code fix but a combination of settings and changes.

The first issue I investigated was why the suspended job's run-priority was not growing over time; in other words, why the job was not "aging." To ensure the job's run-priority would grow, even in a suspended state, I implemented a new job priority weight factor called USAGEEXECUTIONTIMEWEIGHT. This, like other USAGE sub-component factors, is only applied to active jobs and only works if USAGEWEIGHT is set to something other than 0. A positive USAGEEXECUTIONTIMEWEIGHT will cause jobs that have a start time (including suspended jobs, as they were once started) to increase in run-priority over time. With these settings the job should age properly.

In my testing, I also found that an internal Maui attribute named the "suspension min time" could sometimes get in the way of resuming the suspended job. This attribute's purpose is to prevent Maui from suspending, resuming, and then suspending the same job again within the same iteration. (It prevents rapid "flipping" of jobs.) A job will not resume after being suspended until this min time has passed; Maui starts counting immediately after the job is suspended or resumed. This attribute was set to 60 seconds, so if the PREEMPTOR job finished before that time, the suspended job would not resume because the min time had not yet been satisfied. Even with a growing priority, this min time could prevent jobs from being resumed. To reduce the chances of this happening often, I decreased the "suspension min time" to 10 seconds.

The last way this issue can exhibit itself is when an advance reservation is blocking the suspended job's ability to resume.
This happens only if the PREEMPTOR job's wallclock limit is less than or equal to the suspended job's wallclock limit. For example, suppose we have two jobs in the queue with the same priority, A_low and B_low, and B_low was submitted second. A_low starts and takes up the nodes needed by B_low, so B_low stays in the Idle queue but creates a reservation in the future to guarantee that it runs after A_low completes. Next, a PREEMPTOR job, C_high, comes in with a higher priority and suspends A_low so that C_high can run. B_low's advance reservation is now adjusted to fit "around" the wallclock limit of C_high. If C_high runs shorter than A_low, then B_low's advance reservation moves backward (earlier) in time. When C_high ends and A_low tries to resume, it won't be able to, because B_low's advance reservation overlaps A_low's run-length. If, however, C_high was longer than A_low's wallclock, then A_low can still squeeze in before B_low's reservation begins.

Perhaps the example was a little much, but I hope you get the idea. In Maui there is currently only one way around this: controlling the creation of advance reservations. Depending on the needs of your cluster, you can disable advance reservations altogether by putting

RESERVATIONPOLICY NEVER

in your maui.cfg. If this suspension problem really hurts the utilization of your cluster, then this solution may work best for your site; otherwise, it may be overkill. In Moab Workload Manager you can allow lower-priority reservations to be "preempted" as well, which lets A_low run no matter where B_low's reservation begins. Adding this feature to Maui would, unfortunately, be quite an extensive effort, and I don't foresee us being able to implement it anytime soon.

All of the above changes have been included in the most recent development snapshot available at http://www.clusterresources.com/downloads/maui/.
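The three mechanisms described above can be sketched in Python under stated assumptions. The USAGE execution-time term is modelled as USAGEWEIGHT * USAGEEXECUTIONTIMEWEIGHT * (time since start), which is zero for jobs that never started or when USAGEWEIGHT is 0. The "suspension min time" is modelled as a gate on the time since the last suspend or resume. The reservation case is reduced to an interval-overlap test between A_low's remaining run and B_low's moved reservation. Maui's exact formula, units and boundary handling are assumptions here, all times are plain numbers in one hypothetical unit (seconds), and every name (usage_priority, may_resume, can_resume) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    start_time: Optional[float]                # None if the job never started
    last_state_change: Optional[float] = None  # time of last suspend/resume


def usage_priority(job, now, usage_weight, exec_time_weight):
    """Assumed shape of the USAGE execution-time term: only jobs with a start
    time (running or suspended) contribute, and it is 0 when USAGEWEIGHT is 0."""
    if job.start_time is None or usage_weight == 0:
        return 0.0
    return usage_weight * exec_time_weight * (now - job.start_time)


def may_resume(job, now, suspension_min_time=10.0):
    """Gate modelled on the "suspension min time": no resume until the min
    time has elapsed since the last suspend/resume (boundary is an assumption)."""
    if job.last_state_change is None:
        return True
    return now - job.last_state_change >= suspension_min_time


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def can_resume(now, remaining, reservation):
    """True if [now, now + remaining) does not collide with another job's
    reservation (start, end) on the same nodes; None means no such reservation."""
    if reservation is None:
        return True
    res_start, res_end = reservation
    return not overlaps(now, now + remaining, res_start, res_end)


# A_low started at t=0, was suspended at t=100; C_high finishes at t=130.
a_low = Job(start_time=0.0, last_state_change=100.0)
print(usage_priority(a_low, 130.0, usage_weight=1, exec_time_weight=1))  # 130.0
print(may_resume(a_low, 130.0, suspension_min_time=60))  # False with the old 60 s
print(may_resume(a_low, 130.0, suspension_min_time=10))  # True with the new 10 s

# A_low still needs 500; B_low's reservation was pulled back to [130, 530).
print(can_resume(130.0, 500.0, (130.0, 530.0)))   # False: blocked by B_low
print(can_resume(130.0, 500.0, None))             # True: no reservation in the way
print(can_resume(130.0, 500.0, (630.0, 1030.0)))  # True: starts when A_low would end
```

In this toy model, a suspended job can be held back by the min-time gate or by a reservation overlap even when its priority is growing, which matches the three separate causes described in the message.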
Let me know if you experience any problems or have any more questions. We appreciate the continuing support from the Maui community and their active participation in resolving bugs and creating enhancements.

--
Joshua Butikofer
Cluster Resources, Inc.
josh@xxxxxxxxxxxxxxxxxxxx
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------

Robin Humble wrote:
Hi,

On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
We've confirmed that this behavior is happening in Maui. Moab Workload Manager currently has the desired behavior with suspended jobs accruing priority (and also correctly handles the different classes involved). We hope that over the next few weeks we will be able to make these improvements in Maui as well. We will keep the list posted on our progress.

any updates?

in case you were looking for a simpler test case, the 2-queue system below seems to have the same behaviour as the previous bug report, ie. the suspended PREEMPTEE job has a hard time resuming. in other words, after a PREEMPTOR job steams through (correctly), we end up with a previously queued PREEMPTEE job being chosen to run over the top of the suspended PREEMPTEE job. I don't think this is correct behaviour, as only PREEMPTOR jobs should be able to run over the top of PREEMPTEE jobs.

versions are: torque 2.1.1-3 (rebuilt on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16

relevant part of maui.cfg:

PREEMPTPOLICY SUSPEND
CLASSCFG[debug] QDEF=high
CLASSCFG[workq] QDEF=low
QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
QOSWEIGHT 1

cheers,
robin

--
Joshua Butikofer
Cluster Resources, Inc.
josh@xxxxxxxxxxxxxxxxxxxx
(801) 798-7488
--------------------------

David Corredor wrote:
The problem is not just that the suspended job gets preempted again by a job of its same class from the IDLE queue; this happens regardless of the class of the new job. Example with 3 queues (1 verylong, 1 long, 1 fast; fast preempts long and verylong, long preempts verylong, and verylong should not preempt anything):

- Submit 1 long job so that it takes all resources in the cluster.
- Submit a verylong job so that it waits in the IDLE queue.
- Submit a fast job.

The fast job preempts the long one, and once it finishes, instead of the long one resuming execution, the verylong one kicks in and preempts it once again (and it shouldn't).

<quote who="Ronny T. Lampert">
.....
However, I experience the very same problem as you do (I need QUEUETIMEWEIGHT set to 1): the preempted ones stay suspended and instead a NEW job from the batch queue is started :-(

I think this is a bug: suspended jobs *should age*, too. Or they should automatically get a slightly higher priority than the highest in the same class, to prevent them from staying suspended and being interrupted by jobs from the same class.

Could a developer briefly comment on this issue?

Thanks!
Ronny

_______________________________________________
mauiusers mailing list
mauiusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/mauiusers
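The behaviour Robin and David ask for, and Ronny's proposed workaround, can be sketched as a small Python selection rule: an idle job may only run over the top of a suspended job if its class is allowed to preempt that job's class (David's 3-queue table), and a suspended job ranks just above the best idle job of its own class (Ronny's suggestion). This assumes all jobs compete for a single shared allocation and uses hypothetical job IDs and priorities; it describes the desired policy, not what Maui 3.2.6p16 actually does.

```python
from typing import NamedTuple


class QJob(NamedTuple):
    job_id: str
    job_class: str
    priority: int
    state: str  # "Idle" or "Suspended"


# David's 3-queue example: which classes each class may preempt.
PREEMPTS = {
    "fast": {"long", "verylong"},
    "long": {"verylong"},
    "verylong": set(),
}


def can_preempt(new_class, victim_class):
    return victim_class in PREEMPTS.get(new_class, set())


def effective_priority(job, jobs):
    """Ronny's suggestion: a suspended job ranks just above the best idle job of its class."""
    if job.state != "Suspended":
        return job.priority
    idle_same_class = [j.priority for j in jobs
                       if j.state == "Idle" and j.job_class == job.job_class]
    if not idle_same_class:
        return job.priority
    return max(job.priority, max(idle_same_class) + 1)


def next_to_start(jobs):
    """Pick the job to (re)start once the PREEMPTOR finishes. An idle job is
    skipped if it would run over a suspended job its class may not preempt."""
    suspended = [j for j in jobs if j.state == "Suspended"]
    candidates = [
        j for j in jobs
        if j.state == "Suspended"
        or all(can_preempt(j.job_class, s.job_class) for s in suspended)
    ]
    return max(candidates, key=lambda j: effective_priority(j, jobs)).job_id


david = [QJob("L1", "long", 100, "Suspended"), QJob("V1", "verylong", 200, "Idle")]
print(next_to_start(david))  # L1

ronny = [QJob("S1", "batch", 100, "Suspended"), QJob("I1", "batch", 150, "Idle")]
print(effective_priority(ronny[0], ronny), next_to_start(ronny))  # 151 S1
```

In David's scenario the idle verylong job is skipped even though its priority is higher, so the suspended long job is resumed first.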
