Systemd services and resource limits27 Apr 2016 #Linux #V-Ray
We made the move to CentOS 7 and I switched out all init.d scripts with systemd services. Yesterday I noticed we started getting errors on our render farm for huge scenes which required loading of thousands of files:
V-Ray warning: Could not load mesh file …
One hint that this wasn’t due to scene misconfiguration was that the initial ~1000 vrmeshes were loaded successfully, and after that no other vrmesh file could be loaded.
Systemd not inheriting system-wide limits?
So, after some investigation together with the helpful Chaosgroup developer
t.petrov over at the
I deducted that the issue was that the system-wide resource limit settings
weren’t respected for some reason:
# /etc/security/limits.conf * hard nofile 500000 * soft nofile 500000 * hard nproc 500000 * soft nproc 500000
ulimit -a, I could confirm that the limits were active:
core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 256849 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 500000 <---- yep! pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 500000 <---- yep! virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
But when running the same command as a task being carried out in
Pixar’s Tractor, our render
farm queuing manager, I noticed that the
nproc limits were not
respected and were sitting at their default values (
Previously, when using an init.d script to run the Tractor blade client on render farm blades, the limits were respected. But now, when I had switched that out in favor for a systemd script, I realized you have to specify these limits in the systemd service itself.
Here’s an example systemd script, that I use for running the Tractor blade
service with user
farmer, which relies on that the network is up and that
autofs is running so that it can successfully log to a mounted server
share - and which now also sets the
[Unit] Description=Tractor Blade Service Wants=network.target network-online.target autofs.service After=network.target network-online.target autofs.service [Service] Type=simple User=farmer ExecStart=/opt/pixar/Tractor-2.2/bin/tractor-blade --no-sigint --debug --log /logserver/tractor/%H.log --supersede --pidfile=/var/run/tractor-blade.pid LimitNOFILE=500000 LimitNPROC=500000 PIDFile=/var/run/tractor-blade.pid [Install] WantedBy=multi-user.target
LimitNOFILE specifed, all required files (several
thousands) were successfully loaded and the render completed as expected.
Keeping track of resource limits using Python
Just a quick example:
import platform if 'linux' in platform.system().lower(): import resource # Linux only limit_nofile = resource.getrlimit(resource.RLIMIT_NOFILE) limit_nproc = resource.getrlimit(resource.RLIMIT_NPROC) print 'Max number of opened files allowed:', limit_nofile print 'Max number of processes allowed', limit_nproc
Read more on the
resource module here.