Tuning test on Gaia - without recreating the storage - kernel only

from config: storeStorageDirectory = /export/fhgfs/data

ls -l /export/fhgfs/data ==> dm-2

Set from http://www.fhgfs.com/wiki/wikka.php?wakka=ServerTuning

  • echo deadline > /sys/block/dm-2/queue/scheduler
  • echo 4096 > /sys/block/dm-2/queue/nr_requests (invalid argument!)
  • echo 8192 > /sys/block/dm-2/queue/read_ahead_kb

  • echo 5 > /proc/sys/vm/dirty_background_ratio
  • echo 75 > /proc/sys/vm/dirty_ratio
  • echo 75 > /proc/sys/vm/vfs_cache_pressure
  • echo 262144 > /proc/sys/vm/min_free_kbytes
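The commands above can be collected into a small sketch that applies all settings in one pass. Paths and values are taken from the list above; the helper function is hypothetical and defaults to a dry run so it is safe to paste:

```shell
#!/bin/sh
# Sketch: apply the tunables listed above. dm-2 is the device resolved
# from /export/fhgfs/data (see above). Set APPLY=1 to actually write.
DEV=dm-2

set_tunable() {
    # $1 = sysfs/procfs file, $2 = value to write
    if [ ! -e "$1" ]; then
        echo "MISSING: $1" >&2
    elif [ "${APPLY:-0}" = "1" ]; then
        echo "$2" > "$1" || echo "FAILED: $1 <- $2" >&2
    else
        echo "WOULD SET $1 = $2"
    fi
}

set_tunable /sys/block/$DEV/queue/scheduler     deadline
set_tunable /sys/block/$DEV/queue/nr_requests   4096   # gave 'invalid argument' on dm-2
set_tunable /sys/block/$DEV/queue/read_ahead_kb 8192
set_tunable /proc/sys/vm/dirty_background_ratio 5
set_tunable /proc/sys/vm/dirty_ratio            75
set_tunable /proc/sys/vm/vfs_cache_pressure     75
set_tunable /proc/sys/vm/min_free_kbytes        262144
```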

-- Cluster.stucki - 15 Feb 2012

  • cat /sys/block/dm-2/queue/read_ahead_kb: 128
  • cat /proc/sys/vm/dirty_background_ratio: 10
  • cat /proc/sys/vm/dirty_ratio: 20
  • cat /proc/sys/vm/vfs_cache_pressure: 100
  • cat /proc/sys/vm/min_free_kbytes: 67584

-- Cluster.salzmann - 20 Feb 2012

  • cat /sys/block/dm-2/queue/scheduler: none
  • cat /sys/block/sda/queue/scheduler: noop deadline [cfq]
  • cat /sys/block/dm-2/queue/nr_requests: 128
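The "none" above appears to be expected for bio-based device-mapper targets: the dm device itself has no request-level scheduler, so scheduler and nr_requests have to be set on the underlying physical devices, which would also explain the "invalid argument" for nr_requests on dm-2. A sketch to find those devices via sysfs (SYSFS_ROOT is only a hook for testing; dm-2 is the device from above):

```shell
# Sketch: list the underlying block devices of a dm device via
# /sys/block/<dm>/slaves, then inspect (or set) their schedulers.
DM=dm-2
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

list_slaves() {
    # print the underlying block devices of a dm device, if any
    ls "$SYSFS_ROOT/block/$1/slaves" 2>/dev/null
}

for dev in $(list_slaves "$DM"); do
    echo "$dev: $(cat "$SYSFS_ROOT/block/$dev/queue/scheduler")"
    # echo deadline > "$SYSFS_ROOT/block/$dev/queue/scheduler"
done
```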

-- Cluster.salzmann - 20 Feb 2012

Important: all running compute processes must be able to 'lock' large memory pages for MPI (RDMA). The Linux kernel default of 64 kB is far too small. User programs/processes started via login (more precisely, 'through pam') get the correct limit from pam. BUT all daemons must themselves start unlimited in order to pass this on to their sub-processes.

Therefore, the following must be added to /etc/default/torque-mom:

# remove the memory-lock limit
ulimit -l unlimited
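To verify that a daemon (and anything it spawns) really runs without the memlock limit, the effective limits of a running process can be read from /proc. A small sketch, demonstrated here on the current shell (pbs_mom is the TORQUE mom daemon binary):

```shell
# Sketch: print the effective memlock limit of a running process
# from /proc/<pid>/limits.
check_memlock() {
    grep "Max locked memory" "/proc/$1/limits"
}

# For the daemon itself, something like: check_memlock "$(pgrep -o pbs_mom)"
check_memlock $$
```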

-- Cluster.stucki - 28 Jun 2012

# Tuning parameters are set via 'simpel postsync':
GROUP: /simpel2/trees/groups/linux-fhgfs-storage
SCRIPT: etc/simpel/postsync/fhgfsStorageServerThroughputTuning.sh

Does this tuning also affect the partitions used for metadata?

-- Cluster.stucki - 15 Jul 2012

From the maui users list:

http://www.supercluster.org/pipermail/mauiusers/2012-August/004891.html

> How do I change the order when "its all the same" to the scheduler?

Usually nodes are handed out in the reverse order in which they are listed in the node file. Just try ordering the nodes in that file accordingly and see if that helps.

-- Cluster.stucki - 21 Sep 2012

AND remember this warning? http://www.supercluster.org/pipermail/mauiusers/2012-May/004868.html

-- Cluster.stucki - 21 Sep 2012

Just to find: GaiaMoveTable

-- Cluster.stucki - 10 Oct 2012

NFS_Crash Workaround:

: which node (testing 129)
NN=129
pbsnodes -o -N "maintenance,boot" node$NN
: check for no jobs, do maintenance ...
: as a 'user' on adagio do:
echo echo NFStrigger $NN | qsub -l nodes=1:ppn=1,walltime=00:02:00,mem=640kb,host=node$NN -N NfsOn$NN -
: check/keep CREATED JOBNUMBER (last check was 5060)
: as an admin:
JOB=5060
pbsnodes -c -N '' node$NN; qrun -h node$NN $JOB
: with some luck, $JOB will be first
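The admin-side pair of commands above can be wrapped in a small helper. This is a sketch only; pbsnodes and qrun are the standard TORQUE/Maui tools, and the node number and job ID remain the manual inputs from the steps above:

```shell
# Sketch: clear the node's maintenance note and force the held trigger
# job onto it, mirroring the two admin commands above.
release_node() {
    NN=$1
    JOB=$2
    pbsnodes -c -N '' "node$NN"   # clear offline state and note
    qrun -h "node$NN" "$JOB"      # run the job on that specific host
}

# release_node 129 5060
```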

-- Cluster.stucki - 08 Nov 2012

qsub rejects a job if no node with enough processors 'is seen', even if a queue has a larger minimum.

-- Cluster.stucki - 07 Feb 2014

Fixing the 'gpu' queue being forbidden (partition not allowed for group):
scontrol show partitions | less
scontrol update partition=gpu AllowGroups=
scontrol update partition=gpu AllowGroups=login_allegro_cmb
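A quick check that the change took effect, as a sketch: the parsing helper is hypothetical and assumes the usual Key=Value layout of scontrol output.

```shell
# Sketch: extract the AllowGroups value from scontrol-style
# Key=Value output read on stdin.
allow_groups() {
    tr ' ' '\n' | sed -n 's/^AllowGroups=//p'
}

# scontrol show partition gpu | allow_groups
```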

-- Cluster.stucki - 17 Aug 2016
 