WIP Add function to convert array namespace and device to reference array #31829

lucyleeow · 2025-07-24T05:54:11Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds a function that converts arrays to the namespace and device of the reference array.

Tries DLPack first, and if either array does not support it, tries to convert manually.

Any other comments?

This is an initial attempt, showing what it would look like in a simple metric. Feedback welcome. (Tests to come)

I thought about also outputting the namespace and device of the reference array, to avoid the second call to get_namespace_and_device, but I thought it would make the outputs too messy.

cc @ogrisel @betatim @StefanieSenger @virchan @lesteve

github-actions · 2025-07-24T05:55:10Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e99209e. Link to the linter CI: here

jeremiedbb · 2025-07-24T09:34:08Z

sklearn/metrics/_classification.py

    xp, _, device = get_namespace_and_device(y_true, y_pred, sample_weight)
+    y_true, sample_weight = _convert_to_reference(
+        reference=y_pred, arrays=(y_true, sample_weight)
+    )


Shouldn't get_namespace_and_device raise if they're not already on the same device?
If so, the second line is a no-op.

It was like this in the past (that it raised), but since #29476 it only returns an empty list in this case.

Shouldn't get_namespace_and_device raise if they're not already on the same device?

Ah yes, I'd change to:

    xp, _, device = get_namespace_and_device(y_pred)
    y_true, sample_weight = _convert_to_reference(
        reference=y_pred, arrays=(y_true, sample_weight)
    )

It was like this in the past (that it raised), but since #29476 it only returns an empty list in this case.

I thought #29476 filtered out sparse arrays in _remove_non_arrays? I think get_namespace_and_device will error with different namespaces/devices.

It does raise. get_namespace_and_device calls sklearn.utils._array_api.device with the list of arrays, which raises when they're not all on the same device:

scikit-learn/sklearn/utils/_array_api.py

Lines 198 to 209 in aa680bc

    device_ = _single_array_device(array_list[0])

    # Note: here we cannot simply use a Python `set` as it requires
    # hashable members which is not guaranteed for Array API device
    # objects. In particular, CuPy devices are not hashable at the
    # time of writing.
    for array in array_list[1:]:
        device_other = _single_array_device(array)
        if device_ != device_other:
            raise ValueError(
                f"Input arrays use different devices: {device_}, {device_other}"
            )
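The note about `set` in that snippet can be reproduced without any array library. This is only a minimal sketch: `ToyDevice`, `check_same_device` and `set_is_usable` are hypothetical names made up for illustration, with `ToyDevice` standing in for a device object that supports `==` but is not hashable.

```python
class ToyDevice:
    """Hypothetical device object: comparable but not hashable."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ToyDevice) and self.name == other.name

    # Defining __eq__ without __hash__ already sets __hash__ to None;
    # it is spelled out here to make the point explicit.
    __hash__ = None

    def __repr__(self):
        return f"ToyDevice({self.name!r})"


def check_same_device(devices):
    """Return the common device, or raise ValueError on the first mismatch."""
    first = devices[0]
    for other in devices[1:]:
        if first != other:
            raise ValueError(
                f"Input arrays use different devices: {first}, {other}"
            )
    return first


def set_is_usable(devices):
    """Report whether `set(devices)` works for these device objects."""
    try:
        set(devices)
    except TypeError:
        # Unhashable members cannot be stored in a set.
        return False
    return True
```

With such objects, `set_is_usable` returns `False`, while the pairwise loop still detects a mismatch, which is why the comparison is written as a loop rather than `len(set(...)) == 1`.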

lucyleeow · 2025-07-24T12:04:09Z

sklearn/utils/_array_api.py

+            try:
+                # Note will copy if required
+                array_converted = xp_ref.from_dlpack(array, device=device_ref)
+            except AttributeError:


I decided to catch only AttributeError, which I think occurs if the input or output namespace does not support DLPack.

from_dlpack can raise two (or more) other errors:

BufferError - The __dlpack__ and __dlpack_device__ methods on the input array may raise BufferError when the data cannot be exported as DLPack (e.g., incompatible dtype, strides, or device). It may also raise other errors when export fails for other reasons (e.g., not enough memory available to materialize the data). from_dlpack must propagate such exceptions.

I thought that if dlpack fails to convert due to one of the above errors, it would not make sense to try ourselves manually.

ValueError - If data exchange is possible via an explicit copy but copy is set to False.

I've left copy=None, allowing it to copy if need be, so this error is not relevant. I am not sure about the copy=None setting though, as copying could use a lot of memory.
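To make the intended error handling concrete, here is a rough sketch using stand-in objects rather than real array libraries. Everything in it is hypothetical (`NoDLPackNamespace`, `DLPackNamespace`, `GoodExporter`, `BadExporter`, `convert`); it only shows the control flow: fall back when DLPack is missing, but let a BufferError from the producer propagate.

```python
class NoDLPackNamespace:
    """Hypothetical namespace without `from_dlpack`."""

    @staticmethod
    def asarray(obj):
        return list(obj)


class DLPackNamespace(NoDLPackNamespace):
    """Hypothetical namespace whose `from_dlpack` asks the producer to export."""

    @staticmethod
    def from_dlpack(obj):
        # A real consumer would read a capsule; the export call is only
        # used here to show where a producer-side error surfaces.
        return list(obj.__dlpack__())


class GoodExporter:
    """Hypothetical array that can be exported."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

    def __dlpack__(self):
        return self.data


class BadExporter(GoodExporter):
    """Hypothetical array whose export fails."""

    def __dlpack__(self):
        raise BufferError("data cannot be exported as DLPack")


def convert(array, xp_ref):
    """Try DLPack; fall back manually only when DLPack is unsupported."""
    try:
        return xp_ref.from_dlpack(array), "dlpack"
    except AttributeError:
        # `xp_ref` has no `from_dlpack` (or `array` has no `__dlpack__`).
        return xp_ref.asarray(array), "manual"
```

One caveat of catching AttributeError this way: an AttributeError raised for an unrelated reason inside `from_dlpack` would also trigger the manual path.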

Talking to Evgeni about what could possibly cause a BufferError - here are some ideas, but I don't know if any of these will cause an error:

numpy supports negative strides but torch does not (https://discuss.pytorch.org/t/negative-strides-in-tensor-error/134287) (I don't think torch supports DLPack atm, so hard to figure out)

numpy structured array?

numpy memmapped array

virchan

I suspect we could simplify _convert_to_reference a bit when xp_ref is NumPy:

if _is_numpy_namespace(xp_ref):
    # get_namespace returns (namespace, is_array_api_compliant); only the namespace is needed
    return tuple([_convert_to_numpy(array, get_namespace(array)[0]) for array in arrays])

However, I'm not sure this offers much benefit beyond readability, since in most cases there are only two arrays to convert: y_true and sample_weight.

I'll have to give it some more thought.

sklearn/utils/_array_api.py

lucyleeow · 2025-07-25T09:48:08Z

I suspect we could simplify _convert_to_reference a bit when xp_ref is NumPy:

At the moment that would be simpler, but I think _convert_to_numpy is considered a bit of a 'hack'. It's got numpy conversions for specific array namespaces hard coded (needs maintenance, though I do doubt any of the APIs will change) and it doesn't necessarily work for all array API arrays. I think DLPack is more future proof and it has a copy parameter (we could specify to avoid copying if we wanted to) - at least this was the thinking when I decided to just try DLPack first...

since in most cases there are only two arrays to convert: y_true and sample_weight.

For metrics I think this would mostly be it, but for estimators there could be other arrays to convert.

betatim · 2025-07-25T12:34:23Z

sklearn/utils/_array_api.py

@@ -466,6 +466,34 @@ def get_namespace_and_device(
        return xp, False, arrays_device


+def _convert_to_reference(*, reference, arrays):


This is a nit/pet peeve: sklearn.utils._array_api is a private module, we don't need to make functions private here. It feels like repeating ourselves. I've made this argument before and you can tell I've not convinced everyone because there are many functions here that start with a _ but I will keep trying :D

I also tried at some point. It's a lost battle 😄

betatim · 2025-07-25T12:39:07Z

sklearn/utils/_array_api.py

+def _convert_to_reference(*, reference, arrays):
+    """Convert `arrays` to `reference` array's namespace and device."""
+    xp_ref, _, device_ref = get_namespace_and_device(reference)
+    arrays_converted_list = []


How about calling it converted_arrays? arrays_converted_list feels backwards and we can use the plural "s" to let people know it is a sequence/collection of more than one thing

betatim · 2025-07-25T12:42:31Z

sklearn/metrics/_classification.py

+    xp, _, device = get_namespace_and_device(y_pred)
+    y_true, sample_weight = _convert_to_reference(
+        reference=y_pred, arrays=(y_true, sample_weight)
+    )


I agree that returning xp and device from convert_to_reference feels like too much. But it also feels weird to have this "duplication" of get_namespace_and_device and convert_to_reference. The only alternative I can think of, though I'm not sure I like it, is to have convert_to_reference(arrays, namespace=xp, device=device) (but then maybe rename it to move_to or some such).
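For what it's worth, the call site with that alternative signature might look roughly like the sketch below. It uses toy stand-ins only: `ToyArray`, `ToyNamespace` and this `move_to` are hypothetical and just show the shape of the API, where the namespace and device are obtained once and then passed in explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical array that only records its values, library and device."""

    values: tuple
    library: str
    device: str


class ToyNamespace:
    """Hypothetical namespace exposing an `asarray(obj, device=...)` entry point."""

    def __init__(self, library):
        self.library = library

    def asarray(self, obj, device):
        return ToyArray(tuple(obj.values), self.library, device)


def move_to(arrays, namespace, device):
    """Return `arrays` converted to `namespace` on `device`."""
    return tuple(namespace.asarray(array, device=device) for array in arrays)


# In a metric, xp and device would come from a single namespace/device lookup on y_pred.
xp, device = ToyNamespace("torch-like"), "cuda:0"
y_true = ToyArray((0, 1, 1), "numpy-like", "cpu")
sample_weight = ToyArray((0.5, 1.0, 1.0), "numpy-like", "cpu")
y_true, sample_weight = move_to((y_true, sample_weight), namespace=xp, device=device)
```

The trade-off is that the function no longer needs a reference array at all, at the cost of two extra arguments at every call site.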

betatim · 2025-07-25T12:43:56Z

sklearn/utils/_array_api.py

@@ -466,6 +466,34 @@ def get_namespace_and_device(
        return xp, False, arrays_device


+def _convert_to_reference(*, reference, arrays):


I'm not a great fan of the *. Let people use or not use keyword arguments. You can foo(bar=42) with def foo(bar):

People are grownups :D
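A small illustration of the difference, with hypothetical function names (`positional_or_keyword`, `keyword_only`, `accepts_positional`) and dummy arguments:

```python
def positional_or_keyword(reference, arrays):
    """Both call styles are accepted."""
    return reference, arrays


def keyword_only(*, reference, arrays):
    """The bare `*` forces callers to name both arguments."""
    return reference, arrays


def accepts_positional(func):
    """Report whether `func` can be called with two positional arguments."""
    try:
        func("ref", ("a", "b"))
    except TypeError:
        return False
    return True
```

Without the `*`, callers can still write `positional_or_keyword(reference=..., arrays=...)`; with it, the positional form raises TypeError.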

betatim · 2025-07-25T12:46:34Z

sklearn/utils/_array_api.py

+                # Note will copy if required
+                array_converted = xp_ref.from_dlpack(array, device=device_ref)
+            except AttributeError:
+                # Convert to numpy


Suggested change

-                # Convert to numpy
+                # Converting to numpy is tricky, handle this via a dedicated function

betatim · 2025-07-25T12:47:02Z

sklearn/utils/_array_api.py

+                # Convert to numpy
+                if _is_numpy_namespace(xp_ref):
+                    array_converted = _convert_to_numpy(array, xp_array)
+                # Convert from numpy


Suggested change

-                # Convert from numpy
+                # Convert from numpy, all array libraries know how to use a Numpy array

betatim · 2025-07-25T12:48:54Z

sklearn/utils/_array_api.py

+                elif _is_numpy_namespace(xp_array):
+                    array_converted = xp_ref.asarray(array, device=device_ref)
+                else:
+                    # Convert to numpy then to reference


Suggested change

-                    # Convert to numpy then to reference
+                    # There is no generic way to convert from namespace A to B
+                    # So we first convert from A to numpy and then from numpy to B
+                    # The way to avoid this round trip is to lobby for DLpack support
+                    # in libraries A and B

jeremiedbb · 2025-07-25T12:54:14Z

What's the difference with sklearn.utils._array_api.ensure_common_namespace_device? Looks like both are trying to do the same thing.

betatim · 2025-07-25T15:07:25Z

What's the difference with sklearn.utils._array_api.ensure_common_namespace_device? Looks like both are trying to do the same thing.

I think the goal of both is the same. I also think that ensure_common_namespace_device is broken right now, at least for the application that Lucy has in mind. Maybe the thing to do is to replace the code of ensure_common_namespace_device with that of the new function?

wip

573b9f4

github-actions bot added module:metrics module:utils labels Jul 24, 2025


fix

11e5205

review

e99209e
