Re: More than one job per CPU (msg#00070, clustering.torque.user)
On Tue, Sep 11, 2007 at 04:16:54PM -0500, Jeremy Mann alleged:
> I've been searching the mail archive most of the day and I haven't found
> anything regarding what our problem, well we call it a problem, is.
>
> We have a program that we run on our cluster a few hundred iterations at a
> time. We nice the program 19 so it won't interfere with any other program.
> So far, we've been doing this manually. Now we want to incorporate it into
> PBS/Maui. The problem we are coming into is even though we submit it with
> -l nice=19, PBS still says that compute node is state=busy and all other
> jobs stay in the queue. We run the program niced 19 because it usually
> runs for about 5-6 days on our 20 nodes, so we need the ability to run
> other things during this time.
>
> What I've been trying to accomplish for a few days now is to somehow make
> PBS submit a job to a compute node that has this niced 19 job running on
> it. I've tried everything I can think of and what I've found in the
> manpages.
>
> The changes I've tried are:
>
> In maui.cfg I've added:
> NODEACCESSPOLICY SHARED
> NODEALLOCATIONPOLICY MINRESOURCE
> NODECFG[DEFAULT] PRIORITYF=JOBCOUNT
> NODEMAXLOAD 4.00
>
> USERCFG[tigre] QDEF=tigre
> USERCFG[abarca] QDEF=gasbor
> QOSCFG[gasbor] PRIORITY=-100 FLAGS=PREEMPTEE
> QOSCFG[tigre] PRIORITY=100 FLAGS=PREEMPTOR:IGNMAXJOB
>
> My idea here was to create to QoS's, where the gasbor job (the niced 19
> job) would preempt in favor of the tigre jobs. This however has never
> worked.
>
> I took one compute node offline and edited it mom_priv/config file and
> added '$ideal_load 4.0'. My thinking here was if the telling PBS this node
> will run at a 4.0 load, it will execute mode jobs on this node. Again,
> this never worked either.

If the node is "busy" in torque, then maui won't run a job on it. End of
story.

So you want to keep the node from being busy with the $ideal_load and
$max_load options. You mentioned that you tried the former, but did you
also set the latter?
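As a rough sketch of what setting both options might look like, the fragment below extends the mom_priv/config line quoted above. The $ideal_load value is the one from the original message. The $max_load value is an assumption, picked only for illustration; it would need tuning to the node's CPU count and to the load the niced job generates. As far as I understand the TORQUE behaviour, pbs_mom reports the node as busy once the load average exceeds $max_load, and stops reporting it as busy when the load falls back below $ideal_load.

```
# mom_priv/config on the compute node (illustrative values, not a recommendation)

# Load at or below which the node is no longer reported as busy
# (value taken from the original message)
$ideal_load 4.0

# Load above which the node is reported as busy
# (assumed value; should be at least $ideal_load)
$max_load 5.0
```

pbs_mom has to reread its configuration (for example by restarting it) before the change takes effect, and `pbsnodes -a` should then show whether the node still reports state=busy while the niced job is running.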
torqueusers mailing list torqueusers@xxxxxxxxxxxxxxxx http://www.supercluster.org/mailman/listinfo/torqueusers