fix: robust column name handling when indicator names contain underscores #67

Open
adamnadolny-wizipisi wants to merge 5 commits into wetransform-os:main from adamnadolny-wizipisi:fix/column-names-with-underscores-66
adamnadolny-wizipisi commented Jun 22, 2026

Summary

Fixes #66 - indicator names containing underscores caused incorrect normalization method extraction and wrong column filtering during aggregation.

Root cause

In aggregate_indicators(), the normalization method was extracted from each normalized column name by splitting on underscores and taking the second element. For a plain indicator name such as indicator1, the column indicator1_minmax_01 yields minmax, which is correct. For my_indicator_minmax_01 (indicator named my_indicator), the same split yields indicator, which matches nothing in NormalizationFunctions, so aggregation for that method is silently skipped.
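
A minimal reproduction of the parsing problem described above. The helper name old_extract is hypothetical and only mirrors the positional split; it is not the project's actual code.

```python
def old_extract(column: str) -> str:
    """Positional split as described above: take the element at index 1."""
    return column.split("_")[1]


# Plain indicator name: the second element happens to be the method.
print(old_extract("indicator1_minmax_01"))    # minmax

# Indicator name with an underscore: the second element is part of the name.
print(old_extract("my_indicator_minmax_01"))  # indicator
```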

Second, the column filter regex was unanchored, so it matched the method name anywhere in the column string. Every column of an indicator named minmax_score therefore landed in the minmax subset, including columns produced by other normalization methods.
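
The over-matching can be illustrated with the standard re module. This is a sketch under assumptions: the column list and the helper select_unanchored are made up for illustration, and the real code may filter DataFrame columns rather than a plain list.

```python
import re

# Hypothetical normalized columns for two indicators.
columns = [
    "minmax_score_minmax_01",  # indicator "minmax_score", minmax-normalized
    "minmax_score_target_01",  # same indicator, target-normalized
    "cost_target_01",          # indicator "cost", target-normalized
]


def select_unanchored(cols, norm_method):
    """Old behaviour: the method name may match anywhere in the column."""
    pattern = re.compile(rf"{norm_method}")
    return [c for c in cols if pattern.search(c)]


# The target-normalized column of "minmax_score" is picked up as well.
print(select_unanchored(columns, "minmax"))
```

Only the first column belongs in the minmax subset; the second is selected purely because the indicator name itself contains the keyword.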

Fix

  • Introduce _NORM_SUFFIX_TO_METHOD and _extract_norm_method() that match the known suffix at the end of each column name, robust regardless of underscores in the indicator name.
  • Anchor the column filter regex to suffix position.

Tests

Five new test cases in test_mcda_without_robustness.py:

  • normalization with underscored indicator names
  • normalization with multiple consecutive underscores
  • aggregation (single method) with underscored indicator names
  • full sensitivity run with underscored indicator names
  • edge case: indicator names containing normalization keywords (e.g. minmax_score)

Test plan

  • python3 -m pytest -s tests/unit_tests/test_mcda_without_robustness.py -vv
  • Full test suite: python3 -m pytest -s tests/unit_tests -vv
  • Manual: CSV with underscore column names, run full MCDA analysis

Contributors

Contributed by The Wroclaw Institute of Spatial Information and Artificial Intelligence (WIZIPISI - Wroclawski Instytut Zastosowan Informacji Przestrzennej i Sztucznej Inteligencji) - adam.nadolny@wizipisi.ai

Splitting normalized column names on '_' and taking the element at index 1
breaks when the original indicator name contains underscores. For example,
'my_indicator_minmax_01'.split('_')[1] yields 'indicator' instead of 'minmax'.

Introduce _NORM_SUFFIX_TO_METHOD and _extract_norm_method() that match the
known suffixes (minmax_01, minmax_without_zero, target_01, …) at the end of
each column name, making the lookup robust regardless of underscores in the
original indicator names.

Fixes wetransform-os#66

The previous regex rf"{norm_method}" matched the method name anywhere in the
column string. An indicator named e.g. 'minmax_score' would cause columns for
other normalization methods to be incorrectly included in the minmax subset.

Use rf"_{norm_method}(_|$)" so that only columns whose suffix starts with
'_{norm_method}' are selected, regardless of what the indicator name contains.

Verify that indicator names containing underscores (e.g. 'my_indicator_one',
'ind_a_b_c') produce correctly-suffixed normalized columns for all
normalization methods. These tests would have failed before the fix in
_extract_norm_method().

Verify that both a single-method aggregation (weighted_sum + minmax) and the
full sensitivity run (all normalization × aggregation combinations) produce
correct output column names when indicator names contain underscores.
These tests exercise the fixed regex filter introduced alongside the
_extract_norm_method() change.

…ords

An indicator named 'minmax_score' or 'target_value' embeds a normalization
keyword. With the old unanchored regex this would cause the wrong column
subset to be selected during aggregation. Verify that the anchored filter
rf"_{norm_method}(_|$)" handles this correctly and produces exactly the
expected output columns without duplicates.
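
The edge case can be sketched with the anchored pattern quoted above. The column names and the helper select_anchored are hypothetical, and the sketch works on a plain list, which may differ from how the tests in this PR select columns.

```python
import re


def select_anchored(cols, norm_method):
    """Keep columns where the method name follows an underscore and is
    itself followed by an underscore or the end of the string."""
    pattern = re.compile(rf"_{norm_method}(_|$)")
    return [c for c in cols if pattern.search(c)]


# Indicators whose names embed a normalization keyword.
columns = [
    "minmax_score_minmax_01",
    "minmax_score_target_01",
    "target_value_minmax_01",
    "target_value_target_01",
]

minmax_cols = select_anchored(columns, "minmax")
target_cols = select_anchored(columns, "target")

# Each column lands in exactly one subset, with no duplicates.
assert not set(minmax_cols) & set(target_cols)
assert sorted(minmax_cols + target_cols) == sorted(columns)
```

For these names, the leading keyword in minmax_score or target_value is not preceded by an underscore, so only the suffix decides which subset a column joins.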

github-actions bot left a comment

Hello @adamnadolny-wizipisi, thank you for submitting a pull request. We will address it.

Development

Successfully merging this pull request may close these issues.

Bug with column names containing underscores during internal renaming
