from config: storeStorageDirectory = /export/fhgfs/data
ls -l /export/fhgfs/data ==> dm-2
Set from http://www.fhgfs.com/wiki/wikka.php?wakka=ServerTuning
-- Cluster.stucki - 15 Feb 2012
-- Cluster.salzmann - 20 Feb 2012
Important: all running compute processes must be able to lock large memory pages for MPI (RDMA). The Linux kernel default of 64k is far too small. User programs and processes started via login (more precisely, through pam) get the correct limit from pam. BUT all daemons must themselves start unlimited so that their sub-processes can inherit the setting. Therefore the entry
# remove the limit for memory lock
ulimit -l unlimited
must be added to /etc/default/torque-mom.
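To verify that a daemon really runs with the raised limit, one option is to read its /proc/<pid>/limits file. The helper below is only a sketch: the function name memlock_soft_limit is made up for this note, and the usage line assumes the Torque MOM process is called pbs_mom on the node.

```shell
# Hypothetical helper: print the soft "Max locked memory" limit
# from a /proc/<pid>/limits style file given as $1.
# Line layout: Max locked memory <soft> <hard> <units>
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Usage on a compute node (assumes the daemon process is named pbs_mom):
#   memlock_soft_limit "/proc/$(pgrep -o pbs_mom)/limits"
# After the ulimit entry has taken effect this should print "unlimited".
```

The same check works for any job process started by the daemon, since the limit is inherited by sub-processes.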
-- Cluster.stucki - 28 Jun 2012
# Tuning Params are set via 'simpel postsync':
-- Cluster.stucki - 15 Jul 2012
From the maui users list:
http://www.supercluster.org/pipermail/mauiusers/2012-August/004891.html
> How do I change the order when "its all the same" to the scheduler?
usually nodes are handed out in the reverse order they are listed in the node file. just try to order the nodes in that file accordingly and see if that helps.
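If the list reply above holds, the expected hand-out order can be previewed by reading the node file bottom-up. This is a sketch only: handout_order is a made-up name, and the node file path in the usage line is an assumption that depends on the local Torque installation.

```shell
# Hypothetical helper: print the node names of a Torque-style node file ($1)
# in reverse listing order, skipping blank lines and comment lines.
handout_order() {
    awk 'NF && $1 !~ /^#/ { name[++count] = $1 }
         END { for (i = count; i >= 1; i--) print name[i] }' "$1"
}

# Usage (path is an assumption, adjust to the local installation):
#   handout_order /var/spool/torque/server_priv/nodes
```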
-- Cluster.stucki - 21 Sep 2012
AND remember this warning? http://www.supercluster.org/pipermail/mauiusers/2012-May/004868.html
-- Cluster.stucki - 21 Sep 2012
Just to find: GaiaMoveTable
-- Cluster.stucki - 10 Oct 2012
NFS_Crash Workaround:
# which node (testing 129)
NN=129
pbsnodes -o -N "maintenance,boot" node$NN
# check for no jobs, do maintenance ...
# as a 'user' on adagio do:
echo echo NFStrigger $NN | qsub -l nodes=1:ppn=1,walltime=00:02:00,mem=640kb,host=node$NN -N NfsOn$NN -
# check/keep CREATED JOBNUMBER (last check was 5060)
# as an admin:
JOB=5060
pbsnodes -c -N '' node$NN; qrun -h node$NN $JOB
# with some luck, $JOB will be first
-- Cluster.stucki - 08 Nov 2012
qsub rejects a job if no node with enough processors 'is seen'! (even if a queue has a larger minimum)
-- Cluster.stucki - 07 Feb 2014
Fix for queue (gpu) forbidden (partition not allowed for group):
scontrol show partitions | less
scontrol update partition=gpu AllowGroups=
scontrol update partition=gpu AllowGroups=login_allegro_cmb
-- Cluster.stucki - 17 Aug 2016
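To confirm the result of such an update, the AllowGroups value can be pulled out of the scontrol output. This is a minimal sketch: allow_groups is a made-up helper name, and it assumes the usual space-separated key=value layout that scontrol prints.

```shell
# Hypothetical helper: read "scontrol show partition" output on stdin
# and print the value of the AllowGroups field.
allow_groups() {
    tr ' ' '\n' | sed -n 's/^AllowGroups=//p'
}

# Usage:
#   scontrol show partition gpu | allow_groups
```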