Description
The number of CPU threads used cannot be controlled via the --threads option in the FastSurfer segmentation modules. In the FastSurfer surface pipeline, thread usage can only be controlled when --threads is set to 1.
Setting the environment variable OMP_NUM_THREADS in run_fastsurfer.sh instead of recon-surf.sh may also solve the issue for --threads 1. Other values (--threads > 1), however, are not guaranteed to keep CPU usage at the requested number of threads, neither in the segmentation nor in the surface module. The underlying issue is NumPy's multithreading:
By default, NumPy uses all available threads for every function that calls into a multithreaded C library (OpenBLAS, MKL, ...). This causes problems in two ways: a) CPU overload when several processes run in parallel, and b) slowdown of functions on small matrices/operations (essentially unnecessary threading overhead). NumPy itself offers no option to change this, mainly because a catch-all solution for all the different C libraries is difficult (see e.g. numpy/numpy#16990, numpy/numpy#11826).
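To check which native thread pools are loaded alongside NumPy and how many threads each is configured to use, a small diagnostic like the following may help. It is a sketch that assumes the threadpoolctl package is installed (its `threadpool_info()` function lists the loaded BLAS/OpenMP libraries); the helper name `report_native_thread_pools` is hypothetical, and it returns an empty list if NumPy or threadpoolctl is missing.

```python
import os


def report_native_thread_pools():
    """Hypothetical helper: return (user_api, internal_api, num_threads)
    for every native thread pool loaded in this process."""
    try:
        import numpy  # noqa: F401  (importing NumPy loads its BLAS/LAPACK backend)
        from threadpoolctl import threadpool_info
    except ImportError:
        # Assumption: treat missing NumPy/threadpoolctl as "nothing to report"
        return []
    return [(p["user_api"], p["internal_api"], p["num_threads"]) for p in threadpool_info()]


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    for user_api, internal_api, n in report_native_thread_pools():
        print(f"{user_api:8s} {internal_api:10s} num_threads={n}")
```

Without any limits in place, the `num_threads` reported for the BLAS entry typically follows the number of cores rather than the value passed via --threads.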
Short-term solution
Set all relevant environment variables to the desired value before (!) NumPy is imported. This is a simple solution, with the drawback that all relevant variables (https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) have to be known and set, and that list may change over time.
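A minimal sketch of this workaround, assuming the five variables named in the linked Stack Overflow answer are the relevant ones (the list is not guaranteed to be complete). `THREAD_ENV_VARS` and `set_thread_env` are hypothetical names, not existing FastSurfer code. The guard raises an error if NumPy was already imported, because the BLAS backend reads these variables when it is loaded.

```python
import os
import sys

# Assumption: variables read by common BLAS/OpenMP backends at load time,
# taken from the Stack Overflow thread; the list may be incomplete.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def set_thread_env(n_threads: int) -> None:
    """Hypothetical helper: set every known thread variable to n_threads.
    Must run before NumPy (or anything importing NumPy) is imported."""
    if "numpy" in sys.modules:
        raise RuntimeError(
            "numpy is already imported; its BLAS backend has likely read "
            "the thread settings already"
        )
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n_threads)
```

In FastSurfer, such a call would have to sit at the very top of each Python entry point, before `import numpy` and before importing any module that imports NumPy, with the value taken from --threads. Child processes started afterwards inherit the variables.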
Permanent fix
The current recommendation (per the discussion in numpy/numpy#11826) is to use the threadpoolctl package to wrap all relevant function calls. This way, the user-specified thread count can actually be honored, rather than limiting everything to 1 thread. This would require several changes in Lapy and FastSurfer.
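A sketch of what wrapping functions with threadpoolctl could look like. It uses threadpoolctl's `threadpool_limits(limits=...)` context manager; the decorator `limit_threads` is a hypothetical name, not part of threadpoolctl, Lapy, or FastSurfer. As an assumption for illustration, it falls back to running without a limit (with a warning) if threadpoolctl is not installed.

```python
import functools
import warnings
from contextlib import nullcontext

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # optional dependency missing: calls run without a limit
    threadpool_limits = None
    warnings.warn("threadpoolctl not installed; thread limits will not be applied")


def limit_threads(n_threads):
    """Hypothetical decorator: cap native BLAS/OpenMP thread pools at
    n_threads for the duration of each call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # threadpool_limits only affects libraries already loaded at this
            # point and restores the previous limits when the block exits.
            if threadpool_limits is not None:
                ctx = threadpool_limits(limits=n_threads)
            else:
                ctx = nullcontext()
            with ctx:
                return func(*args, **kwargs)
        return wrapper
    return decorator
```

For example, a function that calls `np.linalg.solve` could be decorated with `@limit_threads(n)`, where `n` comes from --threads. Because the limit is applied and reverted per call, different functions could run with different thread counts, and NumPy must already be imported for its BLAS library to be affected.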