Changes from all commits (15 commits)
@@ -84,7 +84,7 @@ data_quality:
DEPEND_1: priority
DICT_KEY: SPASE>Support>SupportQuantity:DataQuality
FIELDNAM: Data Quality
FILLVAL: *uint8_fillval
FILLVAL: *uint16_fillval
FORMAT: I3
LABLAXIS: Data Quality
LABL_PTR_1: priority_label
@@ -82,7 +82,7 @@ data_quality:
DEPEND_1: priority
DICT_KEY: SPASE>Support>SupportQuantity:DataQuality
FIELDNAM: Data Quality
FILLVAL: *uint8_fillval
FILLVAL: *uint16_fillval
Contributor

VALIDMIN/MAX for this variable are for a uint8 data type, so are you sure that it has been changed to a uint16? If so, update VALIDMIN/MAX

Collaborator Author

I think so: the written data_quality variable is currently CDF_UINT2, so the FILLVAL has to use the uint16 sentinel 65535. I left VALIDMIN/MAX as 0/255 intentionally because those describe the valid flag values, not the full storage range. I thought the fill value should remain outside that valid range.
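A small sketch of that convention (the FILLVAL and VALIDMIN/MAX values are from the YAML; the array itself is illustrative):

```python
import numpy as np

# The uint16 fill sentinel is representable in the storage type but sits
# outside the VALIDMIN/VALIDMAX range describing the actual 0-255 flag values.
uint16_fillval = np.uint16(65535)
validmin, validmax = 0, 255

data_quality = np.array([0, 1, uint16_fillval], dtype=np.uint16)
is_fill = data_quality == uint16_fillval
valid = (data_quality >= validmin) & (data_quality <= validmax)

assert uint16_fillval > validmax          # fill stays outside the valid range
assert list(is_fill) == [False, False, True]
assert not valid[is_fill].any()           # fill values never count as valid
```

So consumers that mask on VALIDMIN/MAX will exclude the fill automatically, which is the intent of keeping 0/255 unchanged.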

Collaborator Author

I looked into the direct-events data_quality metadata mismatch as a possible follow-up, but I’m not including it in this PR as it is already getting a bit complex.

What is happening now:

  • in the direct-events L1A builder, data_quality is initialized together with num_events as a (epoch, priority) uint16 array filled with 65535
  • later, the actual data_quality values assigned into that array are just the unpacked suspect bit, so the real data values are only 0 or 1
  • because the backing array is uint16, the written CDF type becomes CDF_UINT2, and the matching L2 metadata/tests on this branch were updated to use FILLVAL = 65535

Why this looks odd:

  • in the rest of CoDICE, data_quality is treated as a simple 0/1 flag with uint8-style metadata (FILLVAL = 255, VALIDMAX = 1)
  • so direct-events are currently an outlier, most likely because data_quality was grouped into the same shared 2D initialization path as num_events, which actually does need a larger integer range

What the cleaner fix would be:

  • keep num_events as uint16
  • split data_quality out so it is initialized as uint8 with fill 255
  • update the direct-events L2 metadata/tests so data_quality writes as CDF_UINT1 again

I verified that the source suspect values are already uint8, so this would be a small structural cleanup rather than a science-algorithm change. I'm leaving it for a separate PR to keep the current branch focused. Also, maybe it's better left as uint16 for something downstream? I think I will leave these YAML config changes as is, though, since we can be confident the VALIDMAX won't exceed 255.
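The proposed split could be sketched roughly like this (shapes and names are hypothetical, not the actual builder code):

```python
import numpy as np

# Hypothetical sketch of the cleaner fix described above -- not the current
# branch behavior. Shapes are illustrative.
n_epochs, n_priorities = 4, 8

# num_events genuinely needs the wider integer range, so it stays uint16
# with the uint16 fill sentinel (65535 -> CDF_UINT2).
num_events = np.full((n_epochs, n_priorities), 65535, dtype=np.uint16)

# data_quality is only ever the unpacked 0/1 suspect bit, so it would be
# initialized separately as uint8 with fill 255 and written as CDF_UINT1.
data_quality = np.full((n_epochs, n_priorities), 255, dtype=np.uint8)
data_quality[0, :] = 0  # example: first epoch decoded as "not suspect"

assert num_events.dtype == np.uint16
assert data_quality.dtype == np.uint8
```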

FORMAT: I3
LABL_PTR_1: priority_label
SCALETYP: linear
@@ -261,7 +261,7 @@ spin_sector:
DICT_KEY: SPASE>Support>SupportQuantity:Positional
FIELDNAM: Spin Sector Index
FILLVAL: *uint8_fillval
FORMAT: I2
FORMAT: I3
LABLAXIS: Spin Sector
LABL_PTR_1: priority_label
LABL_PTR_2: event_num_label
4 changes: 3 additions & 1 deletion imap_processing/cdf/config/imap_glows_l2_variable_attrs.yaml
@@ -483,6 +483,7 @@ flux_uncertainties:
histogram_flag_array:
<<: *lightcurve_defaults
CATDESC: Bad-angle flags for histogram bins
CDF_DATA_TYPE: CDF_UINT1
Contributor

Should this be uint8? I don't know enough about this CDF_DATA_TYPE key, though; it's a new one to me. Similar comment for the one below.

Collaborator Author

@davidt0x davidt0x May 14, 2026

Yes, CDF_UINT1 is the CDF-side spelling for an unsigned 8-bit integer, so this is the explicit uint8 form for histogram_flag_array. The same idea applies to number_of_bins below: that one is CDF_UINT2 because it needs a uint16 range and the ISTP-recommended fill sentinel is 65535.

Based on the code, I think uint8 is the appropriate data type as well:

  • histogram_flag_array is already produced as np.uint8 in L1B
  • L2 preserves that and explicitly casts the OR-reduced result back to np.uint8
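The correspondence assumed in this thread can be sketched as follows (the mapping table is illustrative shorthand, not a cdflib API; fill sentinels are the usual ISTP max-value choices):

```python
import numpy as np

# Rough numpy <-> CDF correspondence for unsigned integer types.
numpy_to_cdf = {
    np.dtype(np.uint8): ("CDF_UINT1", 255),
    np.dtype(np.uint16): ("CDF_UINT2", 65535),
    np.dtype(np.uint32): ("CDF_UINT4", 4294967295),
}

# The OR-reduce over flag arrays keeps the result at uint8, matching
# CDF_UINT1 (shape here is illustrative for a bad-angle flag array).
flags = np.zeros((3, 3600), dtype=np.uint8)
combined = np.bitwise_or.reduce(flags, axis=0).astype(np.uint8)

cdf_type, fillval = numpy_to_cdf[combined.dtype]
assert cdf_type == "CDF_UINT1" and fillval == 255
```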

Contributor

So does adding CDF_DATA_TYPE cause cdflib to cast the variable to this specified type when writing the CDF? I believe that typically, the variable data type informs cdflib what to set the CDF_DATA_TYPE attribute to.

Collaborator Author

Yes, cdflib.xarray_to_cdf treats CDF_DATA_TYPE as a write-time override when building the CDF variable spec. In this case the data is also preserved as uint8, so we are not relying only on the attr. The regression test opens the written CDF and verifies histogram_flag_array is actually written as CDF_UINT1 with a uint8(255) fill.

DICT_KEY: SPASE>Support>SupportQuantity:DataQuality
FIELDNAM: Bad-angle flags for histogram
FILLVAL: 255
@@ -522,9 +523,10 @@ ecliptic_lat:
number_of_bins:
<<: *support_data_defaults
CATDESC: Number of bins in histogram
CDF_DATA_TYPE: CDF_UINT2
DICT_KEY: SPASE>Support>SupportQuantity:Other
FIELDNAM: Number of bins in histogram
FILLVAL: -9223372036854775808
FILLVAL: *max_uint16
FORMAT: I4
LABLAXIS: No. of bins
UNITS: ' '
3 changes: 1 addition & 2 deletions imap_processing/cdf/config/imap_hit_l2_variable_attrs.yaml
@@ -331,7 +331,7 @@ dynamic_threshold_state:
DICT_KEY: SPASE>Support>SupportQuantity:InstrumentMode
DISPLAY_TYPE: time_series
FIELDNAM: Dynamic threshold state
FILLVAL: -128
FILLVAL: 255
FORMAT: I1
LABLAXIS: State
SCALEMAX: 1000
@@ -1997,4 +1997,3 @@ fe_total_uncert_minus_macropixel:




5 changes: 2 additions & 3 deletions imap_processing/cdf/config/imap_idex_l1b_variable_attrs.yaml
@@ -483,8 +483,8 @@ spin_phase:
CATDESC: IMAP Spin Phase
DICT_KEY: SPASE>Support>SupportQuantity:SpinPhase
FIELDNAM: Spin Phase
FILLVAL: *int_fillval
FORMAT: I3
FILLVAL: *double_fillval
FORMAT: F8.3
LABLAXIS: Spin Phase
UNITS: Degrees
VALIDMAX: 360
@@ -499,4 +499,3 @@ solar_longitude:
UNITS: Degrees
VALIDMAX: 180
VALIDMIN: -180

2 changes: 1 addition & 1 deletion imap_processing/cdf/config/imap_mag_l2_variable_attrs.yaml
@@ -49,7 +49,7 @@ vector_attrs: &vectors_default
DEPEND_1: direction
DICT_KEY: "SPASE>Field>FieldQuantity:Magnetic,Qualifier:Vector,CoordinateSystemName:DSRF,CoordinateRepresentation:Cartesian"
FIELDNAM: Magnetic Field Vector
FILLVAL: 9223372036854775807
FILLVAL: -1.0e31
FORMAT: F12.5
LABL_PTR_1: direction_label

8 changes: 4 additions & 4 deletions imap_processing/cdf/config/imap_swapi_variable_attrs.yaml
@@ -107,14 +107,14 @@ esa_energy:
DEPEND_1: esa_step
DICT_KEY: SPASE>Particle>ParticleType:Ion,ParticleQuantity:EnergyPerCharge
FIELDNAM: ESA Energy
FILLVAL: -9223372036854775808
FORMAT: I5
FILLVAL: -1.0000000E+31
FORMAT: F8.1
LABLAXIS: Energy(eV)
LABL_PTR_1: esa_step_label
SCALETYP: linear
UNITS: eV / q
VALIDMAX: 65535
VALIDMIN: 0
VALIDMAX: 21000.0
Collaborator Author

@tmplummer, I think you had some concerns about the VALIDMAX on this one being 65535 (uint16 max). I checked in with @mmshaw and she said 21000 would be a better value.

VALIDMIN: 0.0
VAR_TYPE: support_data

metadata_default: &metadata_default
2 changes: 1 addition & 1 deletion imap_processing/cdf/config/imap_swe_l2_variable_attrs.yaml
@@ -291,7 +291,7 @@ acq_duration:
DICT_KEY: SPASE>Support>SupportQuantity:Temporal,Qualifier:Array
DISPLAY_TYPE: spectrogram
FIELDNAM: Acquisition Duration
FILLVAL: -9223372036854775808
FILLVAL: 4294967295
FORMAT: I10
LABL_PTR_1: esa_step_label
LABL_PTR_2: spin_sector_label
29 changes: 25 additions & 4 deletions imap_processing/codice/codice_l2.py
@@ -563,6 +563,20 @@ def process_lo_species_intensity(
for species in species_list:
dataset[species].data[half_spin_boundary] = np.nan

for var in ["nso_esa_step", "nso_spin_sector"]:
Contributor

@lacoak21 , can you check this one since CoDICE was picking up using specific fillval for their data.

Collaborator Author

For context, with these two variables, the intent is to preserve them as integer support variables in the final L2 product. They are indices / enumerated support values, not continuous measured quantities.

What was happening before is that older packet versions do not include these fields, so missing values were represented with NaN upstream. Once NaN is introduced, NumPy promotes the array to float, and that widened dtype propagated all the way to the written CDF. That is what led to the non-standard FILLVAL / FORMAT issues here.

So the change in this PR is not redefining the variable type. It is restoring the intended type before write-out:

  • replace non-finite values with the integer fill sentinel
  • cast back to uint8
  • write them out as integer support vars again

In other words, this fix is meant to undo float promotion caused by missing-data handling, not to introduce a new convention for these CoDICE variables.
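The promotion and the restore step can be demonstrated in isolation (illustrative values, not the actual pipeline arrays):

```python
import numpy as np

# Start from an integer support variable, as the older packet handling did.
steps = np.array([0, 1, 2], dtype=np.uint8)

# Introducing NaN for a missing value silently widens the dtype to float64.
with_missing = np.where([False, True, False], np.nan, steps)
assert with_missing.dtype == np.float64

# Restore the intended integer type before write-out, as in this PR:
# non-finite -> fill sentinel, then round, clip, and cast back to uint8.
fillval = 255
restored = np.nan_to_num(with_missing, nan=fillval, posinf=fillval, neginf=fillval)
restored = np.clip(np.rint(restored), 0, 255).astype(np.uint8)

assert restored.dtype == np.uint8
assert restored.tolist() == [0, 255, 2]
```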

if var in dataset:
fillval = dataset[var].attrs["FILLVAL"]
restored_values = dataset[var].data.astype(np.float64, copy=True)
restored_values = np.nan_to_num(
restored_values, nan=fillval, posinf=fillval, neginf=fillval
)
restored_values = np.clip(np.rint(restored_values), 0, 255).astype(np.uint8)
dataset[var] = xr.DataArray(
restored_values,
dims=dataset[var].dims,
attrs=dataset[var].attrs,
)

return dataset


@@ -1093,6 +1107,10 @@ def process_lo_direct_events(dependencies: ProcessingInputCollection) -> xr.Data
l2_dataset["position"].dims,
elevation_angle.astype(np.float32),
)
spin_sector_attrs = cdf_attrs.get_variable_attributes(
"spin_sector", check_schema=False
)
spin_sector_fillval = np.uint8(spin_sector_attrs["FILLVAL"])
# Convert spin_sector to spin_angle in degrees
# Use equation from section 11.2.2 of algorithm document
# Shift all spin sectors for all positions 13 - 24 adding 12 and mod 24
@@ -1104,13 +1122,16 @@
)
l2_dataset["spin_angle"] = l2_dataset["spin_sector"].astype(np.float32) * 15.0 + 7.5

# Set spin angle and sector to NaN for invalid positions (>23)
# Preserve spin_sector as an integer index while marking invalid sectors.
invalid_spin_sector = ~np.isfinite(original_spin_sector) | (
original_spin_sector > 23
)
l2_dataset["spin_angle"] = xr.where(
(original_spin_sector > 23), np.nan, l2_dataset["spin_angle"]
invalid_spin_sector, np.nan, l2_dataset["spin_angle"]
)
l2_dataset["spin_sector"] = xr.where(
(original_spin_sector > 23), np.nan, l2_dataset["spin_sector"]
)
invalid_spin_sector, spin_sector_fillval, l2_dataset["spin_sector"]
).astype(np.uint8)
# convert apd energy to physical units
# Set the gain labels based on gain values
gains = l2_dataset["gain"].values.ravel()
45 changes: 39 additions & 6 deletions imap_processing/glows/l2/glows_l2.py
@@ -23,6 +23,40 @@
logger = logging.getLogger(__name__)


def _pad_daily_lightcurve_bins(value: object, fillval: object) -> np.ndarray:
"""
Pad chopped daily-lightcurve bin data back to the standard bin count.

Parameters
----------
value : object
Chopped daily-lightcurve bin data.
fillval : object
CDF fill value used for padded bins.

Returns
-------
numpy.ndarray
Bin data padded to the standard bin count.
"""
value_array = np.asarray(value)
padded_dtype = value_array.dtype
fillval_dtype = np.asarray(fillval).dtype

if np.issubdtype(padded_dtype, np.integer):
try:
cast_fillval = np.array(fillval, dtype=padded_dtype).item()
except (OverflowError, TypeError, ValueError):
padded_dtype = np.result_type(padded_dtype, fillval_dtype)
else:
if cast_fillval != fillval:
padded_dtype = np.result_type(padded_dtype, fillval_dtype)

padded = np.full(GlowsConstants.STANDARD_BIN_COUNT, fillval, dtype=padded_dtype)
padded[: len(value_array)] = value_array
return padded


def glows_l2(
input_dataset: xr.Dataset,
pipeline_settings_dataset: xr.Dataset,
@@ -193,7 +227,9 @@ def create_l2_dataset(
# Convert time to UTC
utc_string = [met_to_utc(ttj2000ns_to_met(value))]
output[key] = xr.DataArray(
utc_string, dims=["epoch"], attrs=attrs.get_variable_attributes(key)
utc_string,
dims=["epoch"],
attrs=attrs.get_variable_attributes(key),
)
elif key != "daily_lightcurve":
val = value
@@ -205,12 +241,11 @@
attrs=attrs.get_variable_attributes(key),
)

n_bins = histogram_l2.daily_lightcurve.number_of_bins
for key, value in dataclasses.asdict(histogram_l2.daily_lightcurve).items():
if key == "number_of_bins":
# number_of_bins does not have a bins dimension.
output[key] = xr.DataArray(
np.array([value]),
np.array([value], dtype=np.uint16),
dims=["epoch"],
attrs=attrs.get_variable_attributes(key),
)
@@ -219,9 +254,7 @@
# avoid operating on FILLVAL data. Re-expand to STANDARD_BIN_COUNT
# here, filling unused bins with the variable's CDF FILLVAL.
var_attrs = attrs.get_variable_attributes(key)
fillval = var_attrs["FILLVAL"]
padded = np.full(GlowsConstants.STANDARD_BIN_COUNT, fillval)
padded[:n_bins] = value
padded = _pad_daily_lightcurve_bins(value, var_attrs["FILLVAL"])
output[key] = xr.DataArray(
np.array([padded]),
dims=["epoch", "bins"],
47 changes: 46 additions & 1 deletion imap_processing/tests/codice/test_codice_l2.py
@@ -3,6 +3,7 @@
from unittest import mock
from unittest.mock import MagicMock, patch

import cdflib
import numpy as np
import pandas as pd
import pytest
@@ -379,7 +380,13 @@ def test_codice_l2_sw_species_intensity(mock_get_file_paths, codice_lut_path):
)
processed_2_ds.attrs["Data_version"] = "001"
assert processed_2_ds.attrs["Logical_source"] == "imap_codice_l2_lo-sw-species"
write_cdf(processed_2_ds)
cdf_file_path = write_cdf(processed_2_ds)
cdf_file = cdflib.CDF(cdf_file_path)
for var in ["nso_esa_step", "nso_spin_sector"]:
var_info = cdf_file.varinq(var)
var_attrs = cdf_file.varattsget(var)
assert var_info.Data_Type_Description == "CDF_UINT1"
assert var_attrs["FILLVAL"] == np.uint8(255)


@patch("imap_data_access.processing_input.ProcessingInputCollection.get_file_paths")
@@ -402,6 +409,30 @@ def test_codice_l2_lo_de(mock_get_file_paths, codice_lut_path):
]

processed_l2_ds = process_codice_l2("lo-direct-events", ProcessingInputCollection())
l1a_input_ds = load_cdf(processed_l1a_file)
original_spin_sector = l1a_input_ds["spin_sector"].values
# Mirror the LO direct-event spin-sector remapping so this test catches any
# unintended changes to valid sector values while still checking that only
# invalid sectors are replaced with the uint8 fill value.
expected_spin_sector = np.where(
(l1a_input_ds["position"].values >= 13)
& (l1a_input_ds["position"].values <= 24),
(original_spin_sector + 12) % 24,
original_spin_sector,
)
invalid_spin_sector = ~np.isfinite(original_spin_sector) | (
original_spin_sector > 23
)
expected_spin_sector = np.where(
invalid_spin_sector, np.uint8(255), expected_spin_sector
).astype(np.uint8)
assert processed_l2_ds["spin_sector"].dtype == np.uint8
np.testing.assert_array_equal(
processed_l2_ds["spin_sector"].values,
expected_spin_sector,
err_msg="LO direct-event spin_sector values changed unexpectedly",
)

l2_val_data = (
imap_module_directory
/ "tests"
@@ -441,6 +472,15 @@
file = write_cdf(processed_l2_ds)
errors = CDFValidator().validate(file)
assert not errors
cdf_file = cdflib.CDF(file)
spin_sector_info = cdf_file.varinq("spin_sector")
spin_sector_attrs = cdf_file.varattsget("spin_sector")
data_quality_info = cdf_file.varinq("data_quality")
data_quality_attrs = cdf_file.varattsget("data_quality")
assert spin_sector_info.Data_Type_Description == "CDF_UINT1"
assert spin_sector_attrs["FILLVAL"] == np.uint8(255)
assert data_quality_info.Data_Type_Description == "CDF_UINT2"
assert data_quality_attrs["FILLVAL"] == np.uint16(65535)
load_cdf(file)


@@ -494,4 +534,9 @@ def test_codice_l2_hi_de(mock_get_file_paths, codice_lut_path):
file = write_cdf(processed_l2_ds)
errors = CDFValidator().validate(file)
assert not errors
cdf_file = cdflib.CDF(file)
data_quality_info = cdf_file.varinq("data_quality")
data_quality_attrs = cdf_file.varattsget("data_quality")
assert data_quality_info.Data_Type_Description == "CDF_UINT2"
assert data_quality_attrs["FILLVAL"] == np.uint16(65535)
load_cdf(file)