
Post job file processing error: msg#00191

clustering.torque.user

Subject: Post job file processing error

Dear All,

I'm having problems finishing the configuration of the Torque batch system.
I'm using the following software packages:

  • torque-scheduler-2.1.6
  • torque-client-2.1.6
  • torque-scheduler-2.1.6

The queues were created without any problems, and the server can reach all the clients over the network. I checked this last point by submitting a simple shell script (echo 'date') 10 times from the server; on the client I can see 10 shell sessions opened to run the jobs.
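
The connectivity check described above could be reproduced with something like the following sketch. It assumes that qsub is on the PATH and reads the job script from standard input when no script file is given. The function name submit_probe_jobs is hypothetical, and "long" is simply the queue named in the job script in this message.

```shell
# Hypothetical helper: submit N trivial jobs that each just run `date`.
submit_probe_jobs() {
    local count="${1:-10}"
    local queue="${2:-long}"
    local i
    for i in $(seq 1 "$count"); do
        # With no script file argument, qsub takes the job script from stdin.
        echo 'date' | qsub -q "$queue"
    done
}
```

Calling `submit_probe_jobs 10 long` on the server should print one job ID per submission; `qstat -n` can then be used to see which node each job was assigned to.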

Job submission script:

# queue selected for that job
#PBS -q long
cat $PBS_NODEFILE
#PBS -o /home5/userxxx/pbs.output
#PBS -l nodes=1
#PBS -I
#PBS -r n
#PBS -l walltime=12:00:00
#PBS -M userxxx
#PBS -N teste
#########################################
#       JOB DEFINITION                                       #
#########################################
#!/bin/bash
#!change the working directory (default is home directory)
cd /home5/userxxx/
echo Running on host `hostname` > /home5/userxxx/pbs.output
echo Time is `date` > /home5/userxxx/pbs.output
echo Directory is `pwd` > /home5/userxxx/pbs.output




The problem is that I cannot see any output written to any file.
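
For comparison, here is a minimal sketch of how the job script above might be restructured. It is not a confirmed fix for the error reported here. It rests on three points: qsub only scans #PBS directives up to the first executable line, so everything after `cat $PBS_NODEFILE` in the original is ignored; `-I` requests an interactive job, which is probably not wanted in a batch script; and `>` truncates the file on every echo. The function name job_report, the `-j oe` option, and the use of PBS_O_WORKDIR in place of the hard-coded directory are assumptions added for illustration.

```shell
#!/bin/bash
# All #PBS directives come first, before any command.
#PBS -q long
#PBS -N teste
#PBS -l nodes=1
#PBS -l walltime=12:00:00
#PBS -r n
#PBS -M userxxx
#PBS -o /home5/userxxx/pbs.output
#PBS -j oe

# Print to stdout and let the batch system deliver it to the -o file,
# instead of redirecting into that same file by hand.
job_report() {
    echo "Running on host $(hostname)"
    echo "Time is $(date)"
    echo "Directory is $(pwd)"
}

# PBS_O_WORKDIR is the directory qsub was run from; fall back to the
# current directory when the script is run outside the batch system.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_NODEFILE is only set inside a running job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cat "$PBS_NODEFILE"
fi

job_report
```

Because the #PBS lines are ordinary comments to bash, the script can also be run directly on a node as a quick sanity check before it is submitted with qsub.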

Here is the relevant line from the server log:


03/29/2007 16:27:28;000d;PBS_Server;Job;129.pc061.dq.ua.pt.dq.ua.pt;Post job file processing error; job 129.molecular-modeling.dq.ua.pt on host planck.dq.ua.pt


In our cluster, all the home directories are shared globally over NFS with all nodes, so I think that in this case a plain cp command will be used rather than scp.
In my opinion, the problem may lie in the file transfers between the submission server and the execution (MOM) clients.

I provide the client configuration file from pbs_mom (config):

# MOM server configuration file
# if there is more than one value, separate them with commas.
#
# specifies the PBS server that can submit jobs
$pbsserver pc061.dq.ua.pt
# specifies the clients that pbs_mom can contact through privileged ports
$pbsclient molecular-modeling.dq.ua.pt
$pbsclient molecular-modeling.dq.ua.pt
$pbsclient planck.dq.ua.pt

$loglevel 7
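
Since the home directories are shared over NFS, one setting that may be relevant is pbs_mom's $usecp parameter, which tells the MOM to deliver output files with a local cp instead of a remote copy for matching paths. The fragment below is only a sketch: the /home5 prefix is taken from the paths in the job script above, and the wildcard host pattern is an assumption about how the nodes are named.

```
# Assumption: /home5 is the same NFS mount on the server and on every node,
# so output destined for any host under /home5 can be copied locally.
$usecp *:/home5 /home5
```

pbs_mom would have to reread its configuration (for example by being restarted) before such a line takes effect.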


I have also checked the undelivered directory on the client (planck.dq.ua.pt), and it is empty.

Can anyone give me a clue as to how to resolve this problem?
Also, if I cannot resolve this issue, I'm planning to migrate the batch system to Sun Grid Engine. What is your opinion of SGE?


Thanks in advance,


Best regards,


Nelson Fonseca
Beowulf Cluster System Administrator
University of Aveiro
Portugal




_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers