Nate:
Yes, that helps a bunch! I verified that pbs_server was in fact listening on
port 15000 on cree. I restarted using pbs_mom -S 15000 on huron and
everything works fine now.
Thanks again. Now I can do some real work :)
Cheers,
Tim
From: nathaniel.x.woody@xxxxxxx [mailto:nathaniel.x.woody@xxxxxxx]
Sent: Tuesday, July 10, 2007 2:20 PM
To: Carbo, Timothy J.
Cc: torqueusers@xxxxxxxxxxxxxxxx
Subject: RE: [torqueusers] No contact with server at hostaddr problem (followup)
Tim,
I have managed to misconfigure PBS to give me these symptoms in two ways:
1) pbs_server isn't running on the port that the mom thinks it is. Make sure
that pbs_server is running on 15001 (looks like you're already looking at
this). As mentioned not long ago, you can start the mom with pbs_mom -S 15000
to force the mom to look for the server at port 15000 (though I suppose a
wise thing to do may be to see what port pbs_server is running on first).
2) Fudged-up Ethernet names for the server on the mom (personally, I've done
this with multi-homed servers). Does the mom-node (cree?) know who huron
(server) is, and vice versa? Is there an entry for it in /etc/hosts on the
mom-node? Being a wimp, I almost always use the IP of the server in the
config file for the pbsserver entry to avoid making this mistake.
I suppose the third option is that pbs_server isn't actually running at all
on huron.
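The checks above (is anything listening on the port the mom expects, and does the server's name resolve on the mom node) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of TORQUE: the helper names port_is_open and resolve are made up for this example, and the hostnames and ports to try are whatever your own setup uses.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def resolve(host):
    """Return the IPv4 address the local resolver gives for host, or None."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None
```

Run on the mom node, resolve("cree") shows which address the mom would actually use for the server, and port_is_open("cree", 15001) versus port_is_open("cree", 15000) shows which port, if either, accepts connections there.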
Hope that helps,
Nate
From: "Carbo, Timothy J." <TIMOTHY.J.CARBO@xxxxxxxx>
Sent by: torqueusers-bounces@xxxxxxxxxxxxxxxx
Date: 10-Jul-2007 15:16
To: "Garrick Staples" <garrick@xxxxxxx>, torqueusers@xxxxxxxxxxxxxxxx
Subject: RE: [torqueusers] No contact with server at hostaddr problem (followup)
Garrick:
Sorry I wasn't clear. My setup is:
Node1 (cree): running pbs_server, pbs_mom and maui
  server_priv/nodes:
    cree np=8
    Huron np=8
  mom_priv/config:
    $pbsserver cree
Node2 (huron): running pbs_mom only
  mom_priv/config:
    $pbsserver cree
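For comparing such a layout across nodes, the two files can be read mechanically. The following is a rough sketch that assumes only the line shapes shown above ("name np=N" in server_priv/nodes and "$pbsserver name" in mom_priv/config). The function names parse_nodes and parse_pbsserver are hypothetical, and any other attributes a real nodes file may carry are simply ignored.

```python
def parse_nodes(text):
    """Map each hostname in a server_priv/nodes listing to its np value.

    Hosts without an np= field map to None; other fields are ignored.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        np = None
        for field in fields[1:]:
            if field.startswith("np="):
                np = int(field[3:])
        nodes[fields[0]] = np
    return nodes


def parse_pbsserver(text):
    """Return the server names given on $pbsserver lines of a mom config."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "$pbsserver":
            servers.append(fields[1])
    return servers
```

With the listing above, parse_nodes gives cree and Huron with 8 processors each, and parse_pbsserver gives cree for both moms, so each name can then be checked for resolution on every host.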
When I submit the following on cree
  echo "sleep 30" | qsub
the job appears to be scheduled on huron and runs OK, but then I start seeing
the "No contact with server at hostaddr port 15001" error messages repeated
in the mom_logs file on huron, and it appears that pbs_server is never
notified that the job ran to completion.
Hope this clears things up a little.
Regards,
Tim
-----Original Message-----
From: torqueusers-bounces@xxxxxxxxxxxxxxxx
[mailto:torqueusers-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Garrick Staples
Sent: Tuesday, July 10, 2007 12:28 PM
To: torqueusers@xxxxxxxxxxxxxxxx
Subject: Re: [torqueusers] No contact with server at hostaddr problem (followup)
On Mon, Jul 09, 2007 at 09:30:09AM -0600, Carbo, Timothy J. alleged:
> Hello all.
>
> I was tracking the following email chain and was wondering if there is
> any resolution to the problem below. I just installed TORQUE 2.1.8 with
> Maui 3.2.6-p19 on a two node system (both x86-64 bit Xeon quad core
> systems running Red Hat AS 4 update 4) and am having the same exact
> problem when I try to submit a job on my client node (jobs run fine on
> the server node). Oddly, the remote node is trying to connect to port
> 15001 on the server node but netstat -a indicates there is nothing
> listening at that port. I am pretty new to Torque so am I missing
> something?
It is a little hard to figure out your setup here with "client", "server",
and "remote" nodes.
If both hosts are to handle compute jobs, then you want pbs_mom running on
both hosts and both hostnames in server_priv/nodes.
--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers