Re: pbs_server and nodes file how to handle comments (clustering.torque.user, msg#00149)
On Wed, Mar 29, 2006 at 01:24:06PM +0100, David Golden alleged:
> On 2006-03-28 09:58:09 -0800, Garrick Staples wrote:
> > On Tue, Mar 28, 2006 at 01:46:44PM +0100, David Golden alleged:
> > > > That would be a frequency of 0. New nodes start in state unknown, get
> > > > pinged, and get an addr list. The old nodes never get the new addr
> > > > list.
> > >
> > > Ah.
> > >
> > > Not that it's necessarily what you'd want to do (especially given your
> > > large-cluster avoiding-ping concerns and maybe iffy effect on running
> > > jobs, though jobs on nodes I tested on weren't interrupted):
> > > but if you "pbsnodes -r" on the old nodes to force them state=down,
> > > do they then get the updated node list and do something useful with
> > > it when they're noticed to be "back" online by the server?
> > >
> > Yes, setting a node to down will trigger a ping operation and it will
> > get a new addr list.
> >
> > This is why a cluster-wide ping operation is needed to support creating
> > new nodes automatically.
> >
>
> Well, point being that presumably one could therefore do a
> "pbsnodes -r node1 node2 node3 node4 ... nodeN" after adding
> the new nodes -i.e. bring every node in the cluster to
> state = down so they all get the new list? (you could do subsets
> at a time, too, for large clusters (especially if said cluster
> is split into nodesets, nodes in one set mightn't need to know
> about the nodes in another set immediately)) - maybe
> sledgehammer-for-a-nail, though then again maybe not: e.g. you
> mightn't want new parallel jobs issued to a node until you're
> sure it had the new node list.

I'm trying to come up with the "simple and works in all cases" solution.
I don't think messing with the individual node states really satisfies
that.

> There's also the not-being-able-to-do-everything-within-qmgr:
> but you could make node states settable within qmgr,
> then do much the same thing - i.e.

You can, 'set node XXXX state += down', but that is clunky, error-prone
and doesn't actually prevent the race condition.

> create new nodes
> set all nodes down
> clear new nodes offline

My proposal basically does #2, but simpler:

create new nodes
'set server ping_nodes=T'
clear new nodes

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers
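
For readers who want to try the workaround discussed in the thread (resetting existing nodes in subsets with "pbsnodes -r" so that each one is pinged again and receives the updated address list), here is a minimal Python sketch. It only builds the command lines and runs nothing. The function name, the batch size, and the node names are hypothetical and chosen for illustration; the only thing taken from the thread is that "pbsnodes -r" accepts several node names at once. It does not address the race condition Garrick mentions, and it does not use the proposed 'ping_nodes' server attribute, which the message presents as a proposal rather than an existing setting.

```python
from typing import List, Sequence


def reset_commands(nodes: Sequence[str], batch_size: int = 64) -> List[List[str]]:
    """Build one `pbsnodes -r` argument list per batch of node names.

    Nothing is executed here; the caller decides whether and when to run
    each command (for example, one nodeset at a time on a large cluster).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    commands: List[List[str]] = []
    for start in range(0, len(nodes), batch_size):
        batch = list(nodes[start:start + batch_size])
        # "pbsnodes -r node1 node2 ..." as quoted in the thread
        commands.append(["pbsnodes", "-r", *batch])
    return commands


if __name__ == "__main__":
    # Hypothetical node names, two nodes per batch.
    for argv in reset_commands(["node1", "node2", "node3"], batch_size=2):
        print(" ".join(argv))
```

Each returned list could be handed to `subprocess.run` on the server host once the new nodes have been created. Keeping the batches small limits how many nodes are in state down at the same moment.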