Do not crash on arrays of strings #223
Sorry @bouweandela, I've been tied up elsewhere and forgot to respond here.
Thanks for your MRE, but I think it shows that further things are going wrong here, and that more effort is needed to fix them. There is already a "way around" the problem that datasets like this won't load into ncdata. But when we apply that, we see other problems associated with the Dask representation of the data: the result when loaded has the wrong datatype, and realising the dask array doesn't deliver the expected dtype either! It is nevertheless possible this way to load the file, get the variable content and re-present it in a form acceptable to ncdata, and to Iris. For instance, as in the example. To be clear, which parts of this are important to you?
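A minimal sketch of what "re-presenting" string content could look like, assuming the variable arrives as an object-dtype NumPy array of Python strings. The helper name `strings_to_char_array` and the sample `member_id` values are hypothetical and are not taken from ncdata, Iris, or the reproducer; the sketch only illustrates turning variable-length strings into a fixed-width character array of the kind used for ordinary netCDF character data.

```python
import numpy as np


def strings_to_char_array(values, encoding="utf-8"):
    """Convert a 1-D sequence of strings to a 2-D fixed-width 'S1' array.

    Hypothetical helper for illustration; not part of ncdata or Iris.
    """
    # Encode each string to bytes (assumes the inputs are Python str objects).
    encoded = [str(v).encode(encoding) for v in np.asarray(values, dtype=object).ravel()]
    # Width of the character dimension: the longest encoded string (at least 1).
    width = max((len(b) for b in encoded), default=1) or 1
    # A fixed-width bytes array pads shorter entries with trailing null bytes.
    fixed = np.array(encoded, dtype=f"S{width}")
    # Reinterpret each fixed-width entry as `width` single-byte characters.
    return fixed.view("S1").reshape(len(encoded), width)


# Made-up ensemble member labels, stored the way a string coordinate often arrives.
member_id = np.array(["r1i1p1f1", "r2i1p1f1", "r10i1p1f1"], dtype=object)
chars = strings_to_char_array(member_id)
print(chars.shape, chars.dtype)
```

The result has one row per string and one column per character, so it has a definite dtype rather than `object`. Whether this is the right representation for a given loader is a separate question.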
BTW it is still work in progress to properly handle the ordinary character arrays in iris data,
Ping @bouweandela: I've done nothing more on this yet, but I have thought about it.
What we're trying to achieve is to add a way to load data from any intake-esm catalogue into ESMValTool in ESMValGroup/ESMValCore#2690. So the data comes in as an Xarray dataset and I need to convert it to Iris cubes. It would be rather convenient if that just 'worked' (provided the datasets follow the CF conventions), with no need to manipulate the Xarray datasets before conversion, because otherwise we need to write special-case code, for example ESMValGroup/ESMValCore#2690 (comment), for every new catalogue we add. Our analyses do not even use the "member_id" coordinate, so even if it gets loaded as an AuxCoord instead of a DimCoord, they would run through just fine with the current behaviour of Iris if the code didn't stumble on the conversion step. Even models with many ensemble members in CMIP6 had fewer than 100 of these, so if I would concatenate this data along the
I'm trying to use `ncdata` to load Pangeo data into iris, but encountering issues with a dimension that contains string variables. Complete example code:
A similar issue was reported in #111. Here is a minimal reproducer:
The code changes here at least make it possible to load the cube, even though iris still does not accept string-type DimCoords.
By the way, are you aware that xarray does not support using the default fill-value at all? pydata/xarray#2742
@pp-mo Do you think this pull request could be a step in the right direction to making `ncdata` work with string arrays? I would be happy to continue working on this if you think it's worth it, or close it if you think a different approach is needed.