fix: robust column name handling when indicator names contain underscores#67
Open
adamnadolny-wizipisi wants to merge 5 commits into
Conversation
Splitting normalized column names on '_' and taking the element at index 1
breaks when the original indicator name contains underscores. For example,
'my_indicator_minmax_01'.split('_')[1] yields 'indicator' instead of 'minmax'.
Introduce _NORM_SUFFIX_TO_METHOD and _extract_norm_method() that match the
known suffixes (minmax_01, minmax_without_zero, target_01, …) at the end of
each column name, making the lookup robust regardless of underscores in the
original indicator names.
Fixes wetransform-os#66
The previous regex rf"{norm_method}" matched the method name anywhere in the
column string. An indicator named e.g. 'minmax_score' would cause columns for
other normalization methods to be incorrectly included in the minmax subset.
Use rf"_{norm_method}(_|$)" so that only columns whose suffix starts with
'_{norm_method}' are selected, regardless of what the indicator name contains.
Verify that indicator names containing underscores (e.g. 'my_indicator_one', 'ind_a_b_c') produce correctly-suffixed normalized columns for all normalization methods. These tests would have failed before the fix in _extract_norm_method().
Verify that both a single-method aggregation (weighted_sum + minmax) and the full sensitivity run (all normalization × aggregation combinations) produce correct output column names when indicator names contain underscores. These tests exercise the fixed regex filter introduced alongside the _extract_norm_method() change.
…ords
An indicator named 'minmax_score' or 'target_value' embeds a normalization
keyword. With the old unanchored regex this would cause the wrong column
subset to be selected during aggregation. Verify that the anchored filter
rf"_{norm_method}(_|$)" handles this correctly and produces exactly the
expected output columns without duplicates.
There was a problem hiding this comment.
Hello @adamnadolny-wizipisi, thank you for submitting a pull request. We will address it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #66 - indicator names containing underscores caused incorrect normalization method extraction and wrong column filtering during aggregation.
Root cause
In aggregate_indicators(), normalized column names were parsed with a positional split on underscore, taking the second element. For a plain indicator name like indicator1 the column indicator1_minmax_01 gives the correct result minmax. But for my_indicator_minmax_01 (indicator named my_indicator) it gives indicator, which matches nothing in NormalizationFunctions and silently skips aggregation for that method.
A second problem: the column filter regex was unanchored, so an indicator named minmax_score would cause its columns to be included in the wrong normalization subset.
Fix
Tests
Five new test cases in test_mcda_without_robustness.py:
Test plan
Contributors
Contributed by The Wroclaw Institute of Spatial Information and Artificial Intelligence (WIZIPISI - Wroclawski Instytut Zastosowan Informacji Przestrzennej i Sztucznej Inteligencji) - adam.nadolny@wizipisi.ai