Description
Describe the issue:
Description of the Error
I encountered an error when working with a `MaskedArray` whose `dtype` is set to `StringDType` and trying to set a `fill_value` other than `None` (or when calling the `filled(fill_value)` method).
The error seems to come from a data type check on the `fill_value` that runs before it is assigned to the `MaskedArray`, at line 501 of numpy/ma/core.py (commit 8f68377):
elif isinstance(fill_value, str) and (ndtype.char not in 'OSVU'):
Possible fix
Currently the check accepts only the characters in `'OSVU'` as valid `numpy.dtype.char` codes, whereas according to https://numpy.org/doc/stable/reference/generated/numpy.dtype.kind.html the `T` character code for `StringDType` could also be admissible.
A rudimentary test suggests that including `T` in `'OSVU'` for the check works just fine.
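To illustrate the proposed change in isolation, here is a minimal sketch of the character test on its own. The helper `string_fill_allowed` and the two constants are hypothetical names made up for this illustration, not part of NumPy, and the sketch only mirrors the `ndtype.char not in 'OSVU'` condition rather than the whole of `_check_fill_value()`.

```python
# Hypothetical helper mirroring the dtype-character test; not NumPy code.
CURRENT_CODES = "OSVU"    # codes accepted by the check quoted above
PROPOSED_CODES = "OSVUT"  # same codes plus 'T', the code for StringDType


def string_fill_allowed(dtype_char, allowed=PROPOSED_CODES):
    """Return True if a str fill_value would pass the character check."""
    return dtype_char in allowed


# With NumPy available, dtype_char would be something like
# some_array.dtype.char, expected to be 'T' for a StringDType array.
print(string_fill_allowed("T", CURRENT_CODES))   # False: rejected today
print(string_fill_allowed("T", PROPOSED_CODES))  # True: accepted with 'T' added
print(string_fill_allowed("d", PROPOSED_CODES))  # False: float64 still rejected
```

The sketch suggests the change is narrow: only arrays whose dtype character is `T` are newly accepted, while numeric dtypes are still rejected for string fill values.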
Reproduce the code example:
# Here the masked array is not created because `fill_value=''` (or any other string) is not accepted.
import numpy as np
from numpy.dtypes import StringDType
strdt_ma = np.ma.MaskedArray(['zero', 'one', 'two', '', 'four'], mask=[False, False, False, True, False], fill_value='', dtype=StringDType(na_object='', coerce=True))
# Here the masked array is created successfully because `fill_value=None`, but setting a string value afterwards fails.
strdt_ma = np.ma.MaskedArray(['zero', 'one', 'two', '', 'four'], mask=[False, False, False, True, False], fill_value=None, dtype=StringDType(na_object='', coerce=True))
print(strdt_ma.dtype,strdt_ma.dtype.char)
strdt_ma.fill_value = '' # fails for any string
print(strdt_ma.filled('N/A')) # fails for any string
print(strdt_ma.filled()) # the missing value is cast as '?'
strdt_ma.filled()[3] == None # False
strdt_ma.filled()[3] =='?' # True
Error message:
Traceback (most recent call last)
Cell In[242], line 1
----> 1 strdt_ma = np.ma.MaskedArray(['zero', 'one', 'two', '', 'four'], mask=[False, False, False, True, False], fill_value='', dtype=StringDType(na_object='', coerce=True))
File /opt/anaconda3/envs/py313/lib/python3.13/site-packages/numpy/ma/core.py:3017, in MaskedArray.__new__(cls, data, mask, dtype, copy, subok, ndmin, fill_value, keep_mask, hard_mask, shrink, order)
3015 # But don't run the check unless we have something to check.
3016 if fill_value is not None:
-> 3017 _data._fill_value = _check_fill_value(fill_value, _data.dtype)
3018 # Process extra options ..
3019 if hard_mask is None:
File /opt/anaconda3/envs/py313/lib/python3.13/site-packages/numpy/ma/core.py:504, in _check_fill_value(fill_value, ndtype)
501 elif isinstance(fill_value, str) and (ndtype.char not in 'OSVU'):
502 # Note this check doesn't work if fill_value is not a scalar
503 err_msg = "Cannot set fill value of string with array of dtype %s"
--> 504 raise TypeError(err_msg % ndtype)
505 else:
506 # In case we want to convert 1e20 to int.
507 # Also in case of converting string arrays.
508 try:
TypeError: Cannot set fill value of string with array of dtype StringDType(na_object='')
Python and NumPy Versions:
import sys, numpy; print(numpy.__version__); print(sys.version)
2.3.1
3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 11:23:37) [Clang 14.0.6 ]
Runtime Environment:
No response
Context for the issue:
Recently I started working with numpy>=2 to use variable-width string arrays and so avoid unwanted string truncation during manipulation. I ran into this issue when trying to use `MaskedArray` with `StringDType` and noticed the restriction on the `fill_value` definition.
In conclusion, the issue should be solved by adding the `T` dtype character code to the accepted list in the `_check_fill_value()` function.
PS: I'm a long-time user and first-time contributor. I don't mind submitting the change myself following the contributor guidelines at https://numpy.org/devdocs/dev/index.html#development-process-summary, but I'm not sure whether there is anything else I need to do to be allowed to contribute. Thanks.