ENH: Add Array API support to hamming_loss #30838

Merged
merged 12 commits into scikit-learn:main from lithomas1:hamming-loss-array-api on Mar 18, 2025

Conversation

lithomas1
Contributor

Reference Issues/PRs

xref #26024

What does this implement/fix? Explain your changes.

This makes hamming_loss array API compatible.
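
For illustration, this is the kind of call that should now work end to end (a minimal sketch, not part of the diff; the example labels and the torch setup are assumptions):

# Minimal sketch: hamming_loss with array API dispatch on PyTorch tensors.
# Assumes torch and array-api-compat are installed; the data below is made up.
import torch

from sklearn import config_context
from sklearn.metrics import hamming_loss

y_true = torch.tensor([[1, 0, 1], [0, 1, 0]], dtype=torch.float32)
y_pred = torch.tensor([[1, 1, 1], [0, 0, 0]], dtype=torch.float32)

with config_context(array_api_dispatch=True):
    # The metric is computed in the torch namespace instead of converting to NumPy.
    loss = hamming_loss(y_true, y_pred)

print(float(loss))  # 2 differing labels out of 6 -> ~0.333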

Any other comments?

github-actions bot commented Feb 15, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 595b2bd. Link to the linter CI: here

@lithomas1 lithomas1 marked this pull request as ready for review February 15, 2025 18:24
Member

@virchan virchan left a comment

Thanks for the PR @lithomas1!

Appreciate your comment about the weight_average!

@lithomas1 lithomas1 requested a review from virchan February 22, 2025 13:57
@github-actions github-actions bot removed the CUDA CI label Feb 25, 2025
Member

@virchan virchan left a comment

CUDA CI passes on GitHub Actions, but I kept encountering the following DeprecationWarning when testing on my local machine:

DeprecationWarning: __array__ implementation doesn't accept a copy keyword, so passing copy=False failed. __array__ must implement 'dtype' and 'copy' keyword arguments. To learn more, see the migration guide https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

The warning seems to originate from the count_nonzero function while testing PyTorch on CPU:

sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cpu-float64] FAILED                                           
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cpu-float32] FAILED

It could be a false positive on my end, but I also noticed that casting sample_weight back to the xp array type:

if sample_weight is not None:
    sample_weight = xp.asarray(sample_weight, device=device)
    weight_average = _average(sample_weight, xp=xp)

would suppress the warning message.

This "fix" makes sense to me because moving sample_weight to the same namespace as y_true and y_pred ensures consistency across backends and prevents NumPy array methods from being called on a PyTorch tensor.

However, I’m not entirely sure if this is the correct fix or if it’s simply avoiding the warning, so I’d like to ping @OmarManzoor to take a closer look. Also, I don’t have the hardware to test on MPS, so @OmarManzoor, SOS! 😅

@OmarManzoor
Contributor

It could be a false positive on my end, but I also noticed that casting sample_weight back to the xp array type:

if sample_weight is not None:
    sample_weight = xp.asarray(sample_weight, device=device)
    weight_average = _average(sample_weight, xp=xp)

would suppress the warning message.

This "fix" makes sense to me because moving sample_weight to the same namespace as y_true and y_pred ensures consistency across backends and prevents NumPy array methods from being called on a PyTorch tensor.

I think this makes sense, as sample_weight should be in the same namespace and on the same device in the case where it isn't already, so this additional code to ensure that seems correct. @virchan feel free to add that as a suggestion.
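
In other words, the handling would roughly boil down to something like this (just a sketch with a made-up wrapper name; the helpers are the ones referenced in this thread, and the exact integration into hamming_loss may differ):

# Sketch only: a hypothetical helper illustrating the intended sample_weight
# handling, not the actual diff in the metric.
from sklearn.utils._array_api import _average, get_namespace_and_device

def _weighted_average_sketch(y_true, y_pred, sample_weight=None):
    # Resolve namespace and device from the input arrays.
    xp, _, device = get_namespace_and_device(y_true, y_pred, sample_weight)
    if sample_weight is not None:
        # Move sample_weight into the same namespace/device as y_true/y_pred,
        # so NumPy-only array methods never get called on e.g. a torch tensor.
        sample_weight = xp.asarray(sample_weight, device=device)
        weight_average = _average(sample_weight, xp=xp)
    else:
        weight_average = 1.0
    return xp, device, sample_weight, weight_average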

@ogrisel Would it be possible for you to run this on a Mac with mps?

Member

@virchan virchan left a comment

To be safe, let's ensure that sample_weight is in the same namespace and on the same device as y_true and y_pred.

Co-authored-by: Virgil Chan <virchan.math@gmail.com>
Contributor

@OmarManzoor OmarManzoor left a comment

Thanks for the PR @lithomas1

Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
@lithomas1
Contributor Author

Thanks for the review.

re: the count_nonzero change

I think I copied the original code from here (from accuracy_score):

if _is_numpy_namespace(xp):
    differing_labels = count_nonzero(y_true - y_pred, axis=1)
else:
    differing_labels = _count_nonzero(
        y_true - y_pred, xp=xp, device=device, axis=1
    )
score = xp.asarray(differing_labels == 0, device=device)

Do you think it's fine for me to update that section here, or should I open a followup?

@OmarManzoor
Contributor

Do you think it's fine for me to update that section here, or should I open a followup?

Yes, sure. I think it would be nice to refactor that as well, since this applies there too. We can run the CUDA CI after you push your change to ensure everything is working as expected.

Contributor

@OmarManzoor OmarManzoor left a comment

LGTM. Thank you @lithomas1

Member

@virchan virchan left a comment

LGTM! Thanks @lithomas1!

@OmarManzoor
Contributor

@ogrisel @adrinjalali Would it be possible for you to run this on a Mac with MPS so that we can merge this PR?

@adrinjalali
Member

Unfortunately I don't have a Mac machine. Maybe @adam2392 does?

@adam2392
Member

adam2392 commented Mar 7, 2025

Sure, I can try running this tonight. I have an M1.

@lithomas1
Contributor Author

@adam2392
Were you able to run this on MPS?

@adam2392
Member

Sorry, I'm trying to get this to work on my Mac M1, but for some reason when I run:

xp, _, device = get_namespace_and_device(y_true, y_pred, sample_weight)
print(device)
> None

the device is None, which is weird to me. I've installed scikit-learn with miniforge too.

@adam2392
Member

adam2392 commented Mar 16, 2025

So I think this function works for MPS, but I am not that familiar with the testing, and I cannot get the MPS device to get picked up, so I can't say for sure… I'm running the following to test and then adding a breakpoint to check whether the device is mps:0, which it is not.

PYTORCH_ENABLE_MPS_FALLBACK=1 pytest ./sklearn/metrics/tests/test_common.py::test_array_api_compliance

However, if I run this file manually, the output seems correct. Mind educating me, @OmarManzoor @adrinjalali?

import torch
import numpy as np
from sklearn import config_context
from sklearn.metrics import hamming_loss
from sklearn.datasets import make_multilabel_classification

# Check if MPS is available
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("MPS backend is available.")
else:
    device = torch.device("cpu")
    print("MPS backend is not available. Running on CPU.")

# Generate sample multi-label classification data
X, Y = make_multilabel_classification(n_samples=100, n_classes=5, random_state=42)

# Convert to PyTorch tensors and move to MPS device
Y_true = torch.tensor(Y, dtype=torch.float32).to(device)
Y_pred = (torch.rand_like(Y_true) > 0.5).float().to(device)  # Random binary predictions

try:
    # Compute Hamming loss
    with config_context(array_api_dispatch=True):
        loss = hamming_loss(Y_true, Y_pred)
    print(f"Hamming loss: {loss:.4f}")
except Exception as e:
    print(f"Error while computing Hamming loss: {e}")

> MPS backend is available.
> Hamming loss: 0.5000

@lithomas1
Contributor Author

Thanks for giving this a run.
I tried checking on my MacBook Pro (note: I have a pre-M1 Mac), and the tests seem to pass:

sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-numpy-None-None] PASSED                                                 [ 34%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-array_api_strict-None-None] PASSED                                      [ 34%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-cupy-None-None] SKIPPED (cupy is not installed: not checking array_...) [ 34%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-torch-cpu-float64] PASSED                                               [ 34%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-torch-cpu-float32] PASSED                                               [ 34%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is n...) [ 35%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is n...) [ 35%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_binary_classification_metric-torch-mps-float32] PASSED                                               [ 35%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-numpy-None-None] PASSED                                             [ 35%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-array_api_strict-None-None] PASSED                                  [ 36%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-cupy-None-None] SKIPPED (cupy is not installed: not checking ar...) [ 36%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-torch-cpu-float64] PASSED                                           [ 36%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-torch-cpu-float32] PASSED                                           [ 36%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which ...) [ 36%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which ...) [ 37%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multiclass_classification_metric-torch-mps-float32] PASSED                                           [ 37%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-numpy-None-None] PASSED                                             [ 37%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-array_api_strict-None-None] PASSED                                  [ 37%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-cupy-None-None] SKIPPED (cupy is not installed: not checking ar...) [ 37%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cpu-float64] PASSED                                           [ 38%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cpu-float32] PASSED                                           [ 38%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which ...) [ 38%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which ...) [ 38%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[hamming_loss-check_array_api_multilabel_classification_metric-torch-mps-float32] PASSED

The command I ran was

PYTORCH_ENABLE_MPS_FALLBACK=1 SCIPY_ARRAY_API=1 pytest sklearn/metrics/tests/test_common.py -k "array_api" -v

I'm also on an older torch version (2.2), which might make a difference.

@adam2392
Member

adam2392 commented Mar 17, 2025

I'm curious: Will the tests pass even if MPS is not used (and CPU is used), or does passing imply MPS was used correctly? Since you're not on an M1+ Mac, it seems the tests are passing even though they're not testing MPS, which seems like a bug to me.

I think the code works based on my snippet showing the hamming loss is computed using an MPS tensor from PyTorch. I'm just not sure if the tests are working as intended, though.

@lithomas1
Contributor Author

I think MPS was used correctly (the Intel Mac I have has a discrete AMD GPU that has some MPS capabilities - I don't know if it is falling back for the hamming loss stuff though).

While running the metrics tests, I also did see GPU usage and some Metal processes (MTLCompilerService) when running on MPS, so at least something in there is using my GPU.

The device thing is very strange, though.

@virchan virchan closed this Mar 17, 2025
@virchan virchan reopened this Mar 17, 2025
@virchan
Member

virchan commented Mar 17, 2025

I'm curious: Will the tests pass even if MPS is not used (and CPU is used), or does passing imply MPS was used correctly?

If MPS is not used, pytest will skip the associated tests with one of the following messages:

Skipping MPS device test because PYTORCH_ENABLE_MPS_FALLBACK is not set.

or

MPS is not available because the current PyTorch install was not built with MPS enabled.

This is typically what I got when running the Array API tests on my local Windows machine.

@virchan
Member

virchan commented Mar 17, 2025

xp, _, device = get_namespace_and_device(y_true, y_pred, sample_weight)
print(device)
None

This might be related to #30454, as we changed the _single_array_device function to return device as None instead of cpu when array_api_dispatch=False.

@lithomas1
Contributor Author

xp, _, device = get_namespace_and_device(y_true, y_pred, sample_weight)
print(device)
None

What does xp show as here?
(I wonder if this is maybe because this wasn't run in a config_context(array_api_dispatch=True) block?)

I initially got back None after forgetting to wrap stuff in config_context, but after wrapping with config_context I get

>>> from sklearn import config_context
>>> with config_context(array_api_dispatch=True):
...      get_namespace_and_device(torch.tensor([1,2,3]).to("mps:0"))
... 
(<module 'array_api_compat.torch' from '/Users/thomasli/opt/miniconda3/envs/scikit-learn/lib/python3.12/site-packages/array_api_compat/torch/__init__.py'>, True, device(type='mps', index=0))

@adam2392
Member

Ah okay, it was my fault. The metric in the unit test is always called once with a NumPy array and once with the relevant PyTorch tensor.

The second call indeed picks up mps:0 as expected, so this LGTM. Sorry for the confusion. But TIL :)
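
For anyone else tripped up by the same thing, the check roughly boils down to the following (a simplified sketch, not the actual helper in sklearn/metrics/tests/test_common.py; the function name here is made up):

# Simplified sketch: compute the metric once with NumPy inputs (dispatch off)
# and once with torch tensors under array_api_dispatch=True, then compare.
import numpy as np
import torch

from sklearn import config_context
from sklearn.metrics import hamming_loss

def check_metric_matches_numpy(metric, y_true_np, y_pred_np, device="cpu"):
    # 1) Reference result with plain NumPy arrays.
    expected = metric(y_true_np, y_pred_np)

    # 2) Same call with tensors on the requested device, under dispatch.
    y_true_xp = torch.as_tensor(y_true_np, dtype=torch.float32, device=device)
    y_pred_xp = torch.as_tensor(y_pred_np, dtype=torch.float32, device=device)
    with config_context(array_api_dispatch=True):
        result = metric(y_true_xp, y_pred_xp)

    # 3) Both results must agree up to floating point tolerance.
    assert np.isclose(float(result), float(expected))

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 1, 1], [0, 0, 0]])
check_metric_matches_numpy(hamming_loss, y_true, y_pred)  # use device="mps" on Apple silicon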

Member

@adam2392 adam2392 left a comment

Thanks for the PR @lithomas1!

@adam2392 adam2392 merged commit 8f167d2 into scikit-learn:main Mar 18, 2025
45 checks passed
@lithomas1 lithomas1 deleted the hamming-loss-array-api branch March 18, 2025 02:02
@lithomas1
Contributor Author

Thanks for the reviews!
