
RE: No contact with server at hostaddr problem (followup): msg#00042

Subject: RE: No contact with server at hostaddr problem (followup)

Nate:

 

Yes, that helps a bunch! I verified that pbs_server was in fact listening on port 15000 on cree. I restarted the mom on huron with pbs_mom -S 15000 and everything works fine now.

 

Thanks again. Now I can do some real work :)

 

Cheers,

Tim

 

 

 


From: nathaniel.x.woody@xxxxxxx [mailto:nathaniel.x.woody@xxxxxxx]
Sent: Tuesday, July 10, 2007 2:20 PM
To: Carbo, Timothy J.
Cc: torqueusers@xxxxxxxxxxxxxxxx
Subject: RE: [torqueusers] No contact with server at hostaddr problem (followup)

 


Tim,

I have managed to mis-configure pbs to give me these symptoms in two ways:

1) pbs_server isn't running on the port that the mom thinks it is. Make sure that pbs_server is running on 15001, the port the mom is trying to reach (looks like you're already looking at this). As mentioned not long ago, you can start the mom with pbs_mom -S 15000 to force it to look for the server at port 15000 (though the wise thing to do is probably to see what port pbs_server is running on first).

2) Fudged-up network names for the server on the mom (personally, I've done this with multi-homed servers). Does the mom node (huron?) know who cree (the server) is, and vice versa? Is there an entry for it in /etc/hosts on the mom node? Being a wimp, I almost always use the IP of the server for the $pbsserver entry in the mom config file to avoid making this mistake.

I suppose the third option is that pbs_server isn't actually running at all on the server host.
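
For the first and third cases, one way to see which port pbs_server is listening on (if it is listening at all) is to filter netstat output by program name. This is only a rough sketch: the helper name listening_ports is made up for illustration, and it assumes the Linux net-tools column layout of `netstat -tlnp` (local address in column 4, state in column 6, PID/program in column 7).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tlnp` output on stdin and print the
# TCP port(s) on which the named program is listening.
# Assumes the Linux net-tools layout:
#   $4 = local address, $6 = state, $7 = PID/program name
listening_ports() {
    awk -v prog="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $7 ~ ("/" prog "$") {
            # The port is whatever follows the last ":" in the local address.
            n = split($4, parts, ":")
            print parts[n]
        }'
}

# Usage on the server host (run as root so netstat can show program names):
#   netstat -tlnp | listening_ports pbs_server
```

No output means nothing named pbs_server is listening. If a port is printed and it is not the one the mom is trying to reach, either fix the server side or start the mom with pbs_mom -S and that port.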

Hope that helps,
Nate



From: "Carbo, Timothy J." <TIMOTHY.J.CARBO@xxxxxxxx>
Sent by: torqueusers-bounces@xxxxxxxxxxxxxxxx
Sent: 10-Jul-2007 15:16
To: "Garrick Staples" <garrick@xxxxxxx>, torqueusers@xxxxxxxxxxxxxxxx
Subject: RE: [torqueusers] No contact with server at hostaddr problem (followup)


Garrick:

Sorry I wasn't clear.

My setup is:

Node1 (cree):  running pbs_server, pbs_mom and maui

server_priv/nodes:
cree np=8
Huron np=8

mom_priv/config:
$pbsserver cree

Node2 (huron):  running pbs_mom only

mom_priv/config:
$pbsserver cree

When I submit the following on cree:

echo "sleep 30" | qsub

the job appears to be scheduled on huron and runs OK, but then I start
seeing the "No contact with server at hostaddr port 15001" error
message repeated in the mom_logs file on huron, and it appears that
pbs_server is never notified that the job ran to completion.
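
With a two-host layout like this, a quick sanity check is whether each host can resolve the other's name through the normal system lookup (/etc/hosts, DNS, and so on). A minimal sketch, assuming getent is available (it ships with glibc); the function name check_resolves is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given hostname resolves through
# the system's configured lookup path (as set in nsswitch.conf).
# Returns non-zero if any name fails to resolve.
check_resolves() {
    rc=0
    for host in "$@"; do
        if addr=$(getent hosts "$host"); then
            # Print only the first field of the getent output (the address).
            echo "$host -> ${addr%% *}"
        else
            echo "$host does not resolve" >&2
            rc=1
        fi
    done
    return $rc
}

# On huron:  check_resolves cree
# On cree:   check_resolves huron
```

This only shows that a name maps to some address; on a multi-homed server it is still worth confirming that the address printed is the interface the mom can actually reach.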

Hope this clears things up a little.

Regards,
Tim


-----Original Message-----
From: torqueusers-bounces@xxxxxxxxxxxxxxxx
[mailto:torqueusers-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Garrick
Staples
Sent: Tuesday, July 10, 2007 12:28 PM
To: torqueusers@xxxxxxxxxxxxxxxx
Subject: Re: [torqueusers] No contact with server at hostaddr problem
(followup)

On Mon, Jul 09, 2007 at 09:30:09AM -0600, Carbo, Timothy J. alleged:
> Hello all.
>
> I was tracking the following email chain and was wondering if there is
> any resolution to the problem below.  I just installed TORQUE 2.1.8 with
> Maui 3.2.6-p19 on a two node system (both x86-64 bit Xeon quad core
> systems running Red Hat AS 4 update 4) and am having the same exact
> problem when I try to submit a job on my client node (jobs run fine on
> the server node).  Oddly, the remote node is trying to connect to port
> 15001 on the server node but netstat -a indicates there is nothing
> listening at that port.  I am pretty new to Torque so am I missing
> something?

It is a little hard to figure out your setup here with "client",
"server", and "remote" nodes.

If both hosts are to handle compute jobs, then you want pbs_mom running
on both hosts and both hostnames in server_priv/nodes.

--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers
