Re: pbs_server and nodes file how to handle comments: msg#00149

clustering.torque.user

Subject: Re: pbs_server and nodes file how to handle comments

On Wed, Mar 29, 2006 at 01:24:06PM +0100, David Golden alleged:
> On 2006-03-28 09:58:09 -0800, Garrick Staples wrote:
> > On Tue, Mar 28, 2006 at 01:46:44PM +0100, David Golden alleged:
> > > > That would be a frequency of 0. New nodes start in state unknown, get
> > > > pinged, and get an addr list. The old nodes never get the new addr
> > > > list.
> > >
> > > Ah.
> > >
> > > Not that it's necessarily what you'd want to do (especially given your
> > > large-cluster avoiding-ping concerns and maybe iffy effect on running
> > > jobs, though jobs on nodes I tested on weren't interrupted):
> > > but if you "pbsnodes -r" on the old nodes to force them state=down,
> > > do they then get the updated node list and do something useful with
> > > it when they're noticed to be "back" online by the server?
> >
>
> > Yes, setting a node to down will trigger a ping operation and it will
> > get a new addr list.
> >
> > This is why a cluster-wide ping operation is needed to support creating
> > new nodes automatically.
> >
>
> Well, point being that presumably one could therefore do a
> "pbsnodes -r node1 node2 node3 node4 ... nodeN" after adding
> the new nodes -i.e. bring every node in the cluster to
> state = down so they all get the new list? (you could do subsets
> at a time, too, for large clusters (especially if said cluster
> is split into nodesets, nodes in one set mightn't need to know
> about the nodes in another set immediately)) - maybe
> sledgehammer-for-a-nail, though then again maybe not: e.g. you
> mightn't want new parallel jobs issued to a node until you're
> sure it had the new node list.

I'm trying to come up with the "simple and works in all cases" solution. I
don't think messing with the individual node states really satisfies
that.
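
For reference, the batched "pbsnodes -r" idea from the quoted text could be sketched roughly as below. This is only an illustration: the helper name and the batch size are hypothetical, and the function only builds the command lines, it does not run them.

```python
from typing import Iterator, List


def pbsnodes_reset_batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield one 'pbsnodes -r' argument vector per batch of node names.

    Hypothetical helper: it only constructs the commands, so the caller
    decides when (and whether) to execute each batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        # Each batch resets a subset of nodes, as suggested for large clusters.
        yield ["pbsnodes", "-r", *nodes[start:start + batch_size]]


if __name__ == "__main__":
    for argv in pbsnodes_reset_batches(["node1", "node2", "node3"], 2):
        print(" ".join(argv))
```

Each yielded list could be handed to something like subprocess.run on the server host, but the timing between batches is exactly the part that stays manual and error-prone.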


> There's also the not-being-able-to-do-everything-within-qmgr:
> but you could make node states settable within qmgr,
> then do much the same thing - i.e.

You can, 'set node XXXX state += down', but that is clunky,
error-prone and doesn't actually prevent the race condition.


> create new nodes
> set all nodes down
> clear new nodes offline

My proposal basically does #2, but simpler:

create new nodes
'set server ping_nodes=T'
clear new nodes
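
As a rough sketch, the three steps above could be written out as command text like this. Note the assumptions: ping_nodes is only the server attribute proposed in this message, not something assumed to exist in a released TORQUE; mapping "create new nodes" to qmgr's "create node" and "clear new nodes" to "pbsnodes -c" is my reading of the steps; and the function name is hypothetical.

```python
from typing import List


def proposed_sequence(new_nodes: List[str]) -> List[str]:
    """Build the command strings for the proposed three-step sequence.

    Hypothetical helper; nothing is executed here.
    """
    # Step 1: create the new nodes in qmgr.
    commands = [f'qmgr -c "create node {name}"' for name in new_nodes]
    # Step 2: the proposed cluster-wide ping (ping_nodes is a proposal only).
    commands.append('qmgr -c "set server ping_nodes=T"')
    # Step 3: clear the new nodes so they can accept jobs (assumed mapping).
    commands.extend(f"pbsnodes -c {name}" for name in new_nodes)
    return commands


if __name__ == "__main__":
    for line in proposed_sequence(["node5", "node6"]):
        print(line)
```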

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California


_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers