Support multiple EESSI versions by using ReFrame environments #326
casparvl wants to merge 13 commits into EESSI:main
… in a single run, by configuring ReFrame environments and using ReFrame's own find_modules function
|
Ok, I now simply wrap the |
|
Oh, this broke that's something we might want to fix if we do the wrapped thing (if it's easy...) |
|
Metallwalls should also be adapted, since I think it doesn't use EESSI_mixin (?): |
|
For me: At first look, that looks quite good: all CUDA-enabled modules are only scheduled on our GPU partitions, all non-CUDA modules only on the CPU partitions. Modules are discovered across both EESSI 2023.06 and EESSI 2025.06. The execution takes quite long - possibly also because we don't cache (yet). Anyway, I'm not disappointed with the current state: it needs some work to fix the last tests, but not a bad starting point.
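The scheduling behaviour described above (CUDA modules on GPU partitions, everything else on CPU partitions) can be illustrated with a small standalone sketch. This is not the test suite's implementation: the helper names, the partition names and the module names are hypothetical, and the rule that a CUDA build is recognisable by "CUDA" in its module name is an assumption made for the example.

```python
# Hypothetical sketch: route modules to partitions by whether the module name mentions CUDA.

def is_cuda_module(module_name):
    """Assumption: CUDA-enabled builds carry 'CUDA' somewhere in the module name."""
    return 'cuda' in module_name.lower()


def partitions_for(module_name, partitions):
    """Return the names of the partitions whose device type matches the module."""
    wanted = 'gpu' if is_cuda_module(module_name) else 'cpu'
    return [name for name, device in partitions.items() if device == wanted]


# Illustrative data only: partition and module names are made up.
partitions = {'cpu_part': 'cpu', 'gpu_part': 'gpu'}
modules = ['ExampleApp/1.0-foss-2023a', 'ExampleApp/1.0-foss-2023a-CUDA-12.1.1']

schedule = {module: partitions_for(module, partitions) for module in modules}
```

Here `schedule` maps each module to the partitions it would run on, which is the property the output above was checked for.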
…ssi_environments' into support_multiple_eessi_environments
…n of sys:part +feat notation directly
|
i must be doing something wrong here (with my suggested fix above). |
|
ok, solved it by removing |
environments = [
    {'name': 'EESSI-2023.06', 'modules': ['EESSI/2023.06']},
    {'name': 'EESSI-2025.06', 'modules': ['EESSI/2025.06']},
]
environs = ['EESSI-2023.06', 'EESSI-2025.06']
as we are overwriting the environments, this change means that we can no longer use the default environment, which is needed to run local modules.
a simple solution would be to let it depend on an environment variable, e.g. USE_EESSI_MODULES:
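import os

# os.getenv returns a string when the variable is set, so 'False' or '0' would be truthy;
# compare against explicit "off" values instead
if os.getenv('USE_EESSI_MODULES', '1').lower() not in ('0', 'false', 'no'):
    environments = [
        {'name': 'EESSI-2023.06', 'modules': ['EESSI/2023.06']},
        {'name': 'EESSI-2025.06', 'modules': ['EESSI/2025.06']},
    ]
    environs = ['EESSI-2023.06', 'EESSI-2025.06']
else:
    # fall back to the default environment, needed to run local modules
    environments = [{'name': 'default'}]
    environs = ['default']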
Or should we append this to the local config instead of overwriting it?
i prefer not to append to the local config, and only allow the environments that we support.
but thinking more about it, we can actually just do this:
environments = [
    {'name': 'EESSI-2023.06', 'modules': ['EESSI/2023.06']},
    {'name': 'EESSI-2025.06', 'modules': ['EESSI/2025.06']},
    {'name': 'default'},
]
environs = ['EESSI-2023.06', 'EESSI-2025.06', 'default']
- if the EESSI modules are in MODULEPATH, it will use them, and in that case you should not have any local modules in the MODULEPATH, so there will be no local modules found.
- if the EESSI modules are not in MODULEPATH, the user should add -p default to avoid failure when trying to load an EESSI module, which i think isn't too bad.
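To make the two cases above concrete, here is a minimal sketch of what restricting a run to the default environment amounts to. It only mimics the idea of filtering environments by name; `select_environments` and `modules_to_load` are hypothetical helpers, and ReFrame's actual matching rules for -p may differ.

```python
import re

environments = [
    {'name': 'EESSI-2023.06', 'modules': ['EESSI/2023.06']},
    {'name': 'EESSI-2025.06', 'modules': ['EESSI/2025.06']},
    {'name': 'default'},
]


def select_environments(environments, pattern):
    """Keep the environments whose name fully matches the given regular expression."""
    regex = re.compile(pattern)
    return [env for env in environments if regex.fullmatch(env['name'])]


def modules_to_load(env):
    """Modules that would be loaded for an environment; 'default' defines none."""
    return env.get('modules', [])


only_default = select_environments(environments, 'default')
only_eessi = select_environments(environments, r'EESSI-.*')
```

With only the default environment selected, no EESSI module load is attempted, which is why the second case does not fail.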
Hmmm, I was thinking if you do have local environments that need to be loaded (e.g. we have local 2024, 2025 modules that load our local software stacks), it's annoying if everything is overwritten. And you don't really want to keep a separate config for local testing, because that means duplicating a lot of information. But what I then realized: you can just update the environments after calling set_common_required_config. That way, you can still have a single config file AND have local environs appended as well.
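A rough sketch of that idea, assuming the config is a plain dictionary as in a ReFrame site configuration. `set_common_required_config_stub` is a hypothetical stand-in written for this example (the real helper's signature and behaviour are not shown in this thread), and the local environment and module names are made up.

```python
def set_common_required_config_stub(site_config):
    """Hypothetical stand-in: overwrites environments with the supported EESSI ones."""
    site_config['environments'] = [
        {'name': 'EESSI-2023.06', 'modules': ['EESSI/2023.06']},
        {'name': 'EESSI-2025.06', 'modules': ['EESSI/2025.06']},
    ]
    for system in site_config['systems']:
        for partition in system['partitions']:
            partition['environs'] = ['EESSI-2023.06', 'EESSI-2025.06']


site_configuration = {
    'systems': [{'name': 'example', 'partitions': [{'name': 'cpu'}]}],
}

# First let the common helper set the EESSI environments ...
set_common_required_config_stub(site_configuration)

# ... then append the local environments, so one config file serves both use cases.
local_environments = [{'name': 'local-2025', 'modules': ['local/2025']}]
site_configuration['environments'] += local_environments
for system in site_configuration['systems']:
    for partition in system['partitions']:
        partition['environs'] += [env['name'] for env in local_environments]
```

The order matters: appending before the helper runs would be undone by the overwrite.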
ok, i didn't think of that use case. then i agree to just append them.
Co-authored-by: Sam Moors <smoors@users.noreply.github.com>
I have no clue why
|
|
yes, i get the expected behavior now: checking the ReFrame source code, the way it works as i understand: when it tries to load a module (here |
|
i looked a bit into implementing a cache for the modules, but it's not as trivial as i thought: we need to create a separate cache for each environment. also, the so, i'm now thinking to keep using our own |
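The point about needing a separate cache for each environment can be sketched as follows. This is only an illustration of keying a cache on the environment name: `PerEnvironmentModuleCache` and `fake_lookup` are hypothetical, and nothing here reflects how ReFrame's find_modules or the test suite actually works.

```python
class PerEnvironmentModuleCache:
    """Caches module lookups separately per (environment, pattern) pair."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable(environ, pattern) -> iterable of module names
        self._cache = {}

    def find(self, environ, pattern):
        key = (environ, pattern)
        if key not in self._cache:
            # Only hit the (slow) module system on a cache miss
            self._cache[key] = tuple(self._lookup(environ, pattern))
        return self._cache[key]


# Fake lookup for illustration; a real one would query the module system.
calls = []


def fake_lookup(environ, pattern):
    calls.append((environ, pattern))
    available = {
        'EESSI-2023.06': ['ExampleApp/1.0'],
        'EESSI-2025.06': ['ExampleApp/2.0'],
    }
    return [m for m in available.get(environ, []) if m.startswith(pattern)]


cache = PerEnvironmentModuleCache(fake_lookup)
first = cache.find('EESSI-2023.06', 'ExampleApp')
again = cache.find('EESSI-2023.06', 'ExampleApp')  # served from the cache
other = cache.find('EESSI-2025.06', 'ExampleApp')  # different environment, new lookup
```

Keying on the environment keeps results from one EESSI version from being reused for another.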
…an still load environments for testing local modules
|
Discussed in the test suite sync, we can actually cache the triplets that That means @smoors is right and the only way to properly cache is to implement the caching functionality inside the Something like Ok, we need to figure out the last part (what does |
|
CI is failing because the eessi mixin class and the metallwalls test set the |
…ult', as we now use ReFrame environments to test over multiple versions of EESSI
|
Ah, the issue with the CI is a bit more subtle: the GH actions One thing that's a bit weird is that ReFrame's standard environment is called Anyway, I'll just remove it. That way, the |
    compute_device = parameter([DEVICE_TYPES.CPU])

    @run_after('init')
    def run_after_init(self):
We'll have to duplicate the new run-after-init stuff we do in eessi-mixin here (or port this test to EESSI_mixin, finally?)
Add initial functionality to schedule tests for all EESSI environments in a single run, by configuring ReFrame environments and using ReFrame's own find_modules function.
For now, these work:
Still need to work on the other tests.