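For the past two days I have been struggling to install the job submission system torque. I downloaded the source tarballs (*.tar.gz) for several versions of torque from the official website and started with version 4.1.2, following the method described at http://blog.chinaunix.net/uid-7726704-id-2045398.html, but ran into problems. I then downloaded the official English documentation, and following its installation steps I hit new problems. I copied and saved the terminal output from several of the failed attempts, shown below: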
[peng@localhost coriolis]$ echo "sleep 30" | qsub
5.localhost.localdomain
[peng@localhost coriolis]$ qstat
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
5.localhost STDIN peng 0 C batch
[peng@localhost coriolis]$ qsub run_model.pbs
qsub: Unknown queue MSG=cannot locate queue
=====================================================================
[root@localhost torque-2.5.12]# pbsnodes -a
pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file
[root@localhost torque-2.5.12]# pbs_mom
pbs_mom: LOG_ERROR::Resource temporarily unavailable (11) in pbs_mom, cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
[root@localhost torque-2.5.12]#
=====================================================================
[root@localhost torque-2.5.12]# ./torque.setup peng
initializing TORQUE (admin: [email protected])
PBS_Server localhost.localdomain: Create mode and server database exists,
do you wish to continue y/(n)?y
root 5418 1 0 11:33 ? 00:00:00 pbs_server -t create
Max open servers: 10239
set server operators += [email protected]
Max open servers: 10239
set server managers += [email protected]
======================================================================
[peng@localhost coriolis]$ qsub run_model.pbs
qsub: Unknown queue MSG=cannot locate queue
You have new mail in /var/spool/mail/peng
*******************************************************************
From [email protected] Fri May 3 10:57:20 2013
Return-Path: <[email protected]>
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: by localhost.localdomain (Postfix, from userid 0)
id CC589282ECD; Fri, 3 May 2013 10:57:20 +0800 (CST)
To: [email protected]
Subject: PBS JOB 1.localhost.localdomain
Precedence: bulk
Message-Id: <[email protected]>
Date: Fri, 3 May 2013 10:57:20 +0800 (CST)
From: [email protected] (root)
PBS Job Id: 1.localhost.localdomain
Job Name: STDIN
job deleted
Job deleted at request of [email protected]
Job could never run
******************************************************************
=====================================================================
[root@localhost torque-2.5.12]# vim /var/spool/torque/server_priv/nodes
[root@localhost torque-2.5.12]# pbs_server
PBS_Server: LOG_ERROR::Unknown node (15064) in process_host_name_part, host master not found
PBS_Server: LOG_ERROR::process_host_name_part, host master not found
PBS_Server: LOG_ERROR::Unknown node (15064) in process_host_name_part, host node01 not found
PBS_Server: LOG_ERROR::process_host_name_part, host node01 not found
pbs_server: network: Address already in use
PBS_Server: LOG_ERROR::PBS_Server, init_network failed dis
[root@localhost torque-2.5.12]# pbs_sched
pbs_sched: LOG_ERROR::Address already in use (98) in main, bind
[root@localhost torque-2.5.12]# pbs_mom
pbs_mom: LOG_ERROR::Resource temporarily unavailable (11) in pbs_mom, cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
[root@localhost torque-2.5.12]#
=====================================================================
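The ================ separator lines indicate that some unrelated or problem-free operations were performed in between.
Does anyone know what is causing these errors and how to fix them? Any advice would be greatly appreciated. Thank you very much!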