Skip to content

BUG: allow MaskedArray.fill_value be a string when dtype=StringDType #29423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

phdparedes
Copy link

This pull request includes a minor fix in the _check_fill_value function in numpy/ma/core.py. The change adjusts the condition to include the character 'T' in the type check for ndtype.char.

This fix adds the StringDType data type character code 'T' to the list of accepted data types that can use a 'fill_value' as string. Currently just accepts None. "See numpy#29421" Issue.
@phdparedes phdparedes changed the title BUG: allow ma.MaskedArray.fill_value be a string when dtype=StringDType BUG: allow MaskedArray.fill_value be a string when dtype=StringDType Jul 23, 2025
@melissawm melissawm moved this to Awaiting a code review in NumPy first-time contributor PRs Jul 23, 2025
@ngoldbaum
Copy link
Member

Sorry - it's not totally clear to me what this fixes.

Can you add a test? Maybe there's an existing test for this in the masked array tests that check the other dtypes you can extend.

@ngoldbaum
Copy link
Member

It looks like if I just extend the existing tests I can trigger the bug this PR is fixing. See ngoldbaum@4343115. You can pull that commit and push or I can just push directly to your PR branch, whichever you feel most comfortable with. Generally I don't push to people's PR branches without explicit permission.

Once there are tests I'm happy to merge this.

@phdparedes
Copy link
Author

Oh I see now, my apologies Nathan, as a first time contributor I totally miss the testing inclusion with numpy development standards. I think I'll do the pull and push, but let me double check this just to make sure I'm understanding this correctly. Thank again.

ma2 = masked_array(["cde", "b", "a"], mask=[0, 1, 0], fill_value=fill, dtype=dt)
assert_equal(op(ma1, ma2)._data, op(ma1._data, ma2._data))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

to appease the linter

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding test functions for test_core.py now.

@phdparedes
Copy link
Author

Not ready to merge yet, I'm double checking something I just noticed.

@ngoldbaum
Copy link
Member

Not ready to merge yet, I'm double checking something I just noticed.

Still double-checking?

@phdparedes
Copy link
Author

I was checking the behavior of MaskedArray when is fill_value is not set. For the other dtypes it uses a default fill_value defined in default_filler dictionary

numpy/numpy/ma/core.py

Lines 176 to 185 in fda7b4c

default_filler = {'b': True,
'c': 1.e20 + 0.0j,
'f': 1.e20,
'i': 999999,
'O': '?',
'S': b'N/A',
'u': 999999,
'V': b'???',
'U': 'N/A'
}

I was wondering about adding one for StringDType like N/A as for string. As it is now still works if MaskedArray initializes without fill_value set, a fill_value is given regardless, just like for dtype=object O.
do you think I should add 'T' : 'N/A' or 'T' : b'N/A' to the default_filler dict?

@ngoldbaum
Copy link
Member

ngoldbaum commented Jul 28, 2025

I think using "N/A" is fine. It might make sense eventually to also do something sensible if someone wants to use a fill value that isn't a string, but we can handle that then.

You can use StringDType with non-string fill values, so you can do stuff like this:

In [10]: dt = np.dtypes.StringDType(na_object=None)

In [11]: arr = np.array(['hello', 'world', None], dtype=dt)

In [12]: arr
Out[12]: array(['hello', 'world', None], dtype=StringDType(na_object=None))

In [13]: arr[2] is None
Out[13]: True

but "N/A" is probably a fine default to use. If you wanted to make it so non-string fill values work, you'll need to set na_object on the StringDType instance on the MaskedArray.

@ngoldbaum
Copy link
Member

OK, awesome. It’s borderline but IMO this deserves a release note too.

See https://github.com/numpy/numpy/blob/main/doc/release/upcoming_changes/README.rst and the other release note fragments in that directory for instructions and examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Awaiting a code review
Development

Successfully merging this pull request may close these issues.

2 participants