fix(glows): write L2 hist labels as strings and normalize global attr…#3196
Open
davidt0x wants to merge 4 commits into
Open
fix(glows): write L2 hist labels as strings and normalize global attr…#3196davidt0x wants to merge 4 commits into
davidt0x wants to merge 4 commits into
Conversation
… typing Clean up the remaining GLOWS L2 histogram metadata issues around label variables and global attribute typing. The current L2 hist write path was creating `bins_label` and `flags_label` as scalar `-1` placeholders with no explicit dimensions. That caused the emitted CDF variables to be written as integer types even though the metadata declares them as string labels. This matches the bug described in issue IMAP-Science-Operations-Center#2807. Fix this by writing both label variables as real string arrays: - `bins_label` now uses the stringified bin coordinate values with its own `bins_label` dimension, matching the existing L1A/L1B pattern - `flags_label` now uses the ordered bad-time flag names with its own `flags_label` dimension instead of a scalar integer placeholder To keep the label ordering consistent across the GLOWS pipeline, extract the bad-time flag names into a shared `BAD_TIME_FLAG_NAMES` constant and reuse it in both: - `PipelineSettings` bad-time flag extraction - GLOWS L2 `flags_label` construction Update the L2 attrs to match the written string labels: - keep `bins_label FORMAT` as `A4` - widen `flags_label FORMAT` from `A4` to `A42` so the longest shared bad-time flag name fits without truncation Also normalize `flight_software_version` to a string before assigning it as an L2 global attribute. The value content is preserved, but this avoids writing it as a numeric global attribute (`CDF_INT8`), which the validator flags as problematic for this metadata. As part of this work, rechecked `bad_time_flag_occurrences` on current `dev`. It is already being written as `CDF_UINT2`, so no product code change is made there in this commit. Add written-CDF regression coverage for GLOWS L2 hist to verify: - `bins_label` is emitted as `CDF_CHAR` with string bin labels - `flags_label` is emitted as `CDF_CHAR` with the shared bad-time flag names in the expected order - `flags_label FORMAT` remains wide enough for the longest current flag name - `bad_time_flag_occurrences` remains an integer CDF variable - `flight_software_version` is written as a string global attribute This change is intentionally limited to the GLOWS `hist` metadata issues covered by issue IMAP-Science-Operations-Center#2807 and the remaining related global-attribute typing warning. It does not include the separate GLOWS fill-value fixes for `histogram_flag_array` and `number_of_bins`.
laspsandoval
approved these changes
May 14, 2026
Contributor
laspsandoval
left a comment
There was a problem hiding this comment.
Thanks David. It looks good.
Improve coverage for the patch.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR resolves remaining GLOWS L2 hist CDF metadata issues by ensuring label coordinates are written as string arrays (not integer placeholders) and by normalizing a problematic global attribute to a string type for ISTP/SPDF validation compliance.
Changes:
- Write
bins_labelandflags_labelas real string arrays with explicit dimensions, and updateflags_labelFORMAT width toA42. - Centralize bad-time flag ordering via a shared
BAD_TIME_FLAG_NAMESconstant and deriveFLAG_LENGTHfrom it. - Normalize
flight_software_versionto a scalar string before writing L2 global attributes; add regression tests that validate emitted CDF types/values.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| imap_processing/tests/glows/test_glows_l2.py | Adds regression test that writes a CDF and verifies label/global-attr typing and label ordering. |
| imap_processing/glows/l2/glows_l2.py | Implements string label arrays, shared flag-name usage, and global-attr normalization for L2 hist output. |
| imap_processing/glows/l1b/glows_l1b_data.py | Reuses shared bad-time flag ordering when extracting pipeline bad-time flags. |
| imap_processing/glows/init.py | Introduces BAD_TIME_FLAG_NAMES and derives FLAG_LENGTH from it to keep ordering/length consistent. |
| imap_processing/cdf/config/imap_glows_l2_variable_attrs.yaml | Updates flags_label FORMAT to A42 to accommodate longest flag name. |
Comments suppressed due to low confidence (1)
imap_processing/cdf/config/imap_glows_l2_variable_attrs.yaml:15
- Typo in CATDESC: "daily-occurence" should be "daily-occurrence".
CATDESC: Flag names for daily-occurence counters of L1B flags
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+229
to
+255
| bins_label_info = cdf_file.varinq("bins_label") | ||
| bins_label_attrs = cdf_file.varattsget("bins_label") | ||
| bins_label_values = cdf_file.varget("bins_label") | ||
| flags_label_info = cdf_file.varinq("flags_label") | ||
| flags_label_attrs = cdf_file.varattsget("flags_label") | ||
| flags_label_values = cdf_file.varget("flags_label") | ||
| bad_time_info = cdf_file.varinq("bad_time_flag_occurrences") | ||
| bad_time_attrs = cdf_file.varattsget("bad_time_flag_occurrences") | ||
| global_attrs = cdf_file.globalattsget() | ||
|
|
||
| assert bins_label_info.Data_Type_Description == "CDF_CHAR" | ||
| assert bins_label_attrs["FORMAT"] == "A4" | ||
| assert list(bins_label_values[:5]) == ["0", "1", "2", "3", "4"] | ||
|
|
||
| assert flags_label_info.Data_Type_Description == "CDF_CHAR" | ||
| assert flags_label_attrs["FORMAT"] == "A42" | ||
| assert list(flags_label_values) == list(BAD_TIME_FLAG_NAMES) | ||
| assert max(len(name) for name in BAD_TIME_FLAG_NAMES) <= int( | ||
| flags_label_attrs["FORMAT"][1:] | ||
| ), ( | ||
| "Update flags_label FORMAT in imap_glows_l2_variable_attrs.yaml " | ||
| "if a flag name exceeds A42." | ||
| ) | ||
|
|
||
| assert bad_time_info.Data_Type_Description == "CDF_UINT2" | ||
| assert bad_time_attrs["FORMAT"] == "I5" | ||
| assert global_attrs["flight_software_version"] == ["131329"] |
Contributor
|
once that minor type is fixed, we can merge this! |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes the remaining GLOWS
histmetadata issues around label serialization and global attribute typing.It is scoped to
imap_glows_l2_hist, consistent with the June validation focus under parent issue #2577 and the GLOWS validation parent issue #2751.What Changed
Fix
bins_labelandflags_labelserializationbins_labelandflags_labelwere previously being created in GLOWS L2 as scalar-1placeholders with no explicit dimensions. That caused the written CDF variables to be emitted as integer types even though their metadata describes them as string labels.This PR fixes that by writing them as real string arrays:
bins_labelis now built from the stringified bin coordinate valuesflags_labelis now built from the ordered bad-time flag namesbins_labelflags_labelThis follows the bug report and expected fix direction in #2807.
Share the bad-time flag ordering
The bad-time flag names are now defined once as a shared
BAD_TIME_FLAG_NAMESconstant and reused in:PipelineSettingsbad-time flag extractionflags_labelconstructionThat keeps the label ordering locked to the same processing order used by the pipeline.
Update
flags_labelmetadata widthThe L2 attrs now use:
bins_label FORMAT = A4flags_label FORMAT = A42A42was chosen because the longest current bad-time flag name is 42 characters. A regression test now verifies that the shared flag names still fit within the declared format width.Normalize
flight_software_versionglobal attrflight_software_versionwas being written as a numeric global attribute, which the validator flagged as problematic.This PR preserves the value content but normalizes it to a scalar string before assigning it to the L2 output attrs, so it is written as a character global attribute instead of
CDF_INT8.Rechecked
bad_time_flag_occurrencesAs part of this work, I revalidated the current
devwrite path forbad_time_flag_occurrences.Result:
CDF_UINT2Testing
Added written-CDF regression coverage in
test_glows_l2.pyto verify:bins_labelis emitted asCDF_CHARflags_labelis emitted asCDF_CHARflags_labelcontains the shared bad-time flag names in pipeline orderA42widthbad_time_flag_occurrencesis still written as an integer CDF variableflight_software_versionis written as a string global attribute