Tuning test on Gaia - without recreating the storage - kernel settings only
from config:
storeStorageDirectory = /export/fhgfs/data
ls -l /export/fhgfs/data ==> dm-2
Settings taken from http://www.fhgfs.com/wiki/wikka.php?wakka=ServerTuning
- echo deadline > /sys/block/dm-2/queue/scheduler
- echo 4096 > /sys/block/dm-2/queue/nr_requests (rejected with "invalid argument"!)
- echo 8192 > /sys/block/dm-2/queue/read_ahead_kb
- echo 5 > /proc/sys/vm/dirty_background_ratio
- echo 75 > /proc/sys/vm/dirty_ratio
- echo 75 > /proc/sys/vm/vfs_cache_pressure
- echo 262144 > /proc/sys/vm/min_free_kbytes
-- Cluster.stucki - 15 Feb 2012
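The writes listed above can be wrapped so that a rejected value (as happened with nr_requests) is reported instead of scrolling by unnoticed. This is only a minimal sketch, not the script that is actually deployed. The function names and the optional /sys and /proc root arguments are made up for illustration; the latter exist only to allow a dry run against a scratch directory.

```shell
# Write one value to a kernel tunable and report the outcome.
# Returns non-zero if the file is missing or the kernel rejects the value.
set_tunable() {
    path=$1
    value=$2
    if [ ! -w "$path" ]; then
        echo "SKIP $path (missing or not writable)" >&2
        return 1
    fi
    if ! { echo "$value" > "$path"; } 2>/dev/null; then
        echo "FAIL $path <- $value" >&2
        return 1
    fi
    echo "OK   $path: $(cat "$path")"
}

# Apply the values from the list above to one block device (e.g. dm-2).
# $2 and $3 are hypothetical overrides of /sys and /proc for dry runs.
apply_storage_tuning() {
    dev=$1
    sys=${2:-/sys}
    proc=${3:-/proc}
    rc=0
    set_tunable "$sys/block/$dev/queue/scheduler"       deadline || rc=1
    set_tunable "$sys/block/$dev/queue/nr_requests"     4096     || rc=1
    set_tunable "$sys/block/$dev/queue/read_ahead_kb"   8192     || rc=1
    set_tunable "$proc/sys/vm/dirty_background_ratio"   5        || rc=1
    set_tunable "$proc/sys/vm/dirty_ratio"              75       || rc=1
    set_tunable "$proc/sys/vm/vfs_cache_pressure"       75       || rc=1
    set_tunable "$proc/sys/vm/min_free_kbytes"          262144   || rc=1
    return $rc
}
```

Called as root with `apply_storage_tuning dm-2`, it prints one OK/SKIP/FAIL line per setting and returns non-zero if any write did not go through.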
- cat /sys/block/dm-2/queue/read_ahead_kb: 128
- cat /proc/sys/vm/dirty_background_ratio: 10
- cat /proc/sys/vm/dirty_ratio: 20
- cat /proc/sys/vm/vfs_cache_pressure: 100
- cat /proc/sys/vm/min_free_kbytes: 67584
-- Cluster.salzmann - 20 Feb 2012
- cat /sys/block/dm-2/queue/scheduler: none
- cat /sys/block/sda/queue/scheduler: noop deadline [cfq]
- cat /sys/block/dm-2/queue/nr_requests: 128
-- Cluster.salzmann - 20 Feb 2012
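The scheduler file of dm-2 reads "none" while sda offers noop/deadline/cfq, which suggests that the elevator has to be set on the disks underneath the device-mapper node rather than on dm-2 itself. The sketch below lists those disks through the slaves directory in sysfs. The function name and the optional sysfs root argument are illustrative, and whether this also explains the rejected nr_requests write has not been verified here.

```shell
# Print the I/O scheduler of every disk underneath a device-mapper device.
# $1: dm device name (e.g. dm-2)
# $2: sysfs root, a hypothetical override used only for testing
list_slave_schedulers() {
    dev=$1
    sys=${2:-/sys}
    for slave in "$sys/block/$dev/slaves"/*; do
        [ -e "$slave" ] || continue      # no slaves: the glob stayed literal
        real=$(readlink -f "$slave")
        # A partition (e.g. sda1) has no queue/ of its own; use the parent disk.
        if [ -e "$real/partition" ]; then
            real=$(dirname "$real")
        fi
        if [ -r "$real/queue/scheduler" ]; then
            echo "$(basename "$real"): $(cat "$real/queue/scheduler")"
        fi
    done
}
```

`list_slave_schedulers dm-2` prints one line per underlying disk, with the active scheduler shown in brackets, in the same format as the cat output above.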
Important:
all running compute processes must be able to 'lock' large amounts of memory pages for MPI (RDMA).
The Linux kernel default of 64k is far too small.
User programs and processes that are started via login (more precisely, through PAM) get the correct limit from PAM.
BUT all daemons must themselves be started without the limit, so that they can pass this on to their subprocesses.
Therefore the following must be added to /etc/default/torque-mom:
# remove the limit for memory lock
ulimit -l unlimited
-- Cluster.stucki - 28 Jun 2012
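Whether a running daemon really inherited the raised limit can be read from its /proc/<pid>/limits table. A small sketch, with a helper name chosen here for illustration, that extracts the soft "Max locked memory" value from such a table on stdin:

```shell
# Read a /proc/<pid>/limits table on stdin and print the soft
# "Max locked memory" limit (a byte count, or "unlimited").
soft_memlock() {
    awk '/^Max locked memory/ { print $4 }'
}
```

For example `soft_memlock < /proc/$(pgrep -o pbs_mom)/limits`, assuming the Torque MOM daemon runs under the process name pbs_mom. The 64k kernel default shows up there as 65536 (bytes); after the change above and a restart of the daemon it should read unlimited.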
# Tuning Params are set via 'simpel postsync':
GROUP: /simpel2/trees/groups/linux-fhgfs-storage
SCRIPT: etc/simpel/postsync/fhgfsStorageServerThroughputTuning.sh
Does this tuning also affect the partitions for metadata?
-- Cluster.stucki - 15 Jul 2012
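One way to approach the metadata question: the /proc/sys/vm values are system-wide, so they apply to every file system on a host where the script runs, while the /sys/block/<dev>/queue values only apply to the named device. The sketch below resolves which block device holds a given directory, so the storage and metadata directories can be compared. The function name is made up here, and the metadata path is not recorded in these notes.

```shell
# Print the block device (symlinks resolved) that holds a directory.
backing_device() {
    src=$(df -P "$1" | awk 'NR == 2 { print $1 }')
    [ -n "$src" ] || return 1
    case $src in
        /*) readlink -f "$src" ;;   # e.g. a /dev/mapper/... link -> /dev/dm-N
        *)  echo "$src" ;;          # not a local block device (NFS, tmpfs, ...)
    esac
}
```

Running `backing_device /export/fhgfs/data` and the same call on the metadata directory shows whether both sit on the device that receives the queue settings.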
From the maui users list:
http://www.supercluster.org/pipermail/mauiusers/2012-August/004891.html
> How do I change the order when "its all the same" to the scheduler?
Usually nodes are handed out in the reverse order they are listed in the node file.
Just try to order the nodes in that file accordingly and see if that helps.
-- Cluster.stucki - 21 Sep 2012
AND remember this warning?
http://www.supercluster.org/pipermail/mauiusers/2012-May/004868.html
-- Cluster.stucki - 21 Sep 2012
Just to find:
GaiaMoveTable
-- Cluster.stucki - 10 Oct 2012
NFS_Crash Workaround:
: which node (testing 129)
NN=129
pbsnodes -o -N "maintenance,boot" node$NN
: check for no jobs, do maintenance ...
: as a 'user' on adagio do:
echo echo NFStrigger $NN | qsub -l nodes=1:ppn=1,walltime=00:02:00,mem=640kb,host=node$NN -N NfsOn$NN -
: note the JOBNUMBER that was created (it was 5060 at the last check)
: as an admin:
JOB=5060
pbsnodes -c -N '' node$NN; qrun -h node$NN $JOB
: with some luck, $JOB will be first
-- Cluster.stucki - 08 Nov 2012
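Instead of copying the job number by hand, the id that qsub prints on stdout can be captured. A sketch of the user-side step only, using the same qsub options as above; the function name is made up, and it assumes qsub prints an id of the form NUMBER.servername:

```shell
# Submit the NFS trigger job for node$1 and print the bare job number.
submit_trigger() {
    nn=$1
    jobid=$(echo "echo NFStrigger $nn" |
        qsub -l "nodes=1:ppn=1,walltime=00:02:00,mem=640kb,host=node$nn" \
             -N "NfsOn$nn" -) || return 1
    # Strip the ".servername" suffix from the id.
    echo "${jobid%%.*}"
}
```

`JOB=$(submit_trigger $NN)` then yields the number to hand to the admin for the qrun step.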
qsub rejects a job if no node with enough processors 'is seen'!
(even if a queue has a larger minimum)
-- Cluster.stucki - 07 Feb 2014
Fix:
queue (gpu) forbidden (partition not allowed for group)
scontrol show partitions | less
scontrol update partition=gpu AllowGroups=
scontrol update partition=gpu AllowGroups=login_allegro_cmb
-- Cluster.stucki - 17 Aug 2016
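To confirm the result without paging through the full partition listing, the AllowGroups field can be pulled out of the scontrol output. A minimal sketch with a hypothetical helper name, assuming the usual Key=Value layout of `scontrol show partition`:

```shell
# Print the AllowGroups value of a Slurm partition.
partition_groups() {
    scontrol show partition "$1" | tr ' ' '\n' | sed -n 's/^AllowGroups=//p'
}
```

After the update above, `partition_groups gpu` should print login_allegro_cmb.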