ENH Add pos_label parameter to TargetEncoder #31796

Open: wants to merge 1 commit into base: main
JawadAliAI
  • Add a pos_label parameter to TargetEncoder for binary classification targets
  • When pos_label is specified, use LabelBinarizer instead of LabelEncoder
  • Let users choose which label is treated as the positive class
  • Add tests for the new parameter
  • Keep backward compatibility: pos_label=None (the default) keeps the existing LabelEncoder behavior

Fixes #27342
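The description says pos_label selects which label is the positive class. The sketch below shows, under the assumption that the chosen pos_label should be encoded as 1 and the other class as 0, what that mapping looks like next to the current default. It also shows that scikit-learn's LabelBinarizer documents its own pos_label as the integer value written for positive samples (default 1), not as the class to treat as positive; the positive column is always classes_[1], which may matter for the LabelBinarizer-based approach described above. The helper encode_binary_target is hypothetical and is not part of scikit-learn or this PR.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

y = np.array(["yes", "no", "yes", "no", "yes", "no"])

# Current default: LabelEncoder sorts the classes, so classes_[1] ("yes") becomes 1.
le = LabelEncoder()
print(le.fit_transform(y))  # [1 0 1 0 1 0]
print(le.classes_)  # ['no' 'yes']

# LabelBinarizer's pos_label is the integer value written for the positive class;
# the positive class itself is still classes_[1].
lb = LabelBinarizer(neg_label=0, pos_label=2)
print(lb.fit_transform(y).ravel())  # [2 0 2 0 2 0]


def encode_binary_target(y, pos_label=None):
    """Hypothetical helper: map a binary target to {0, 1} with pos_label -> 1."""
    y = np.asarray(y)
    classes = np.unique(y)  # sorted unique labels
    if classes.shape[0] != 2:
        raise ValueError(f"Expected 2 classes, got {classes.shape[0]}")
    if pos_label is None:
        pos_label = classes[1]  # mirror LabelEncoder: last sorted class is positive
    if pos_label not in classes.tolist():
        raise ValueError(f"pos_label={pos_label!r} is not one of {classes.tolist()}")
    return (y == pos_label).astype(np.int64)


print(encode_binary_target(y, pos_label="no"))  # [0 1 0 1 0 1]
```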

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here.


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/preprocessing/_target_encoder.py:389:81: W291 [*] Trailing whitespace
    |
387 |                 label_binarizer = LabelBinarizer(pos_label=self.pos_label)
388 |                 y = label_binarizer.fit_transform(y)
389 |                 # For binary classification, LabelBinarizer may return 1D array, 
    |                                                                                 ^ W291
390 |                 # convert to 2D for consistency
391 |                 if y.ndim == 1:
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:733:59: W291 [*] Trailing whitespace
    |
731 |     X_trans_yes = encoder_pos_yes.transform(X)
732 |
733 |     # Test with pos_label='no' - should use LabelBinarizer  
    |                                                           ^^ W291
734 |     encoder_pos_no = TargetEncoder(target_type="binary", pos_label="no")
735 |     encoder_pos_no.fit(X, y)
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:740:82: W291 [*] Trailing whitespace
    |
738 |     # Verify classes are set correctly
739 |     assert_array_equal(encoder_default.classes_, ["no", "yes"])  # LabelEncoder sorts
740 |     assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer 
    |                                                                                  ^ W291
741 |     assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])   # LabelBinarizer
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:761:1: W293 [*] Blank line contains whitespace
    |
759 |     # Test that different parameter types are accepted
760 |     X = np.array([["a"], ["b"]])
761 |     
    | ^^^^ W293
762 |     if pos_label == 1:
763 |         y = np.array([0, 1])
    |
    = help: Remove whitespace from blank line

sklearn/preprocessing/tests/test_target_encoder.py:765:36: W291 [*] Trailing whitespace
    |
763 |         y = np.array([0, 1])
764 |     elif pos_label == "yes":
765 |         y = np.array(["no", "yes"])  
    |                                    ^^ W291
766 |     else:  # boolean True
767 |         y = np.array([False, True])
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:768:1: W293 [*] Blank line contains whitespace
    |
766 |     else:  # boolean True
767 |         y = np.array([False, True])
768 |     
    | ^^^^ W293
769 |     encoder = TargetEncoder(target_type="binary", pos_label=pos_label)
770 |     encoder.fit(X, y)
    |
    = help: Remove whitespace from blank line

Found 6 errors.
[*] 6 fixable with the `--fix` option.
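All six findings are trailing whitespace: W291 on a line that has code or a comment, W293 on a line containing only whitespace, with the reported column pointing at the first trailing whitespace character (for example 733:59 above). The sketch below roughly approximates those two checks and the `--fix` behavior in plain Python; it is not ruff's implementation, and both function names are hypothetical.

```python
def find_trailing_whitespace(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code) for each line ending in whitespace.

    Rough approximation of ruff's W291/W293, not ruff's implementation.
    Line and column are 1-based; the column is the first trailing whitespace char.
    """
    issues = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.rstrip()
        if stripped == line:
            continue  # nothing trailing on this line
        if stripped:
            issues.append((lineno, len(stripped) + 1, "W291"))  # text, then spaces
        else:
            issues.append((lineno, 1, "W293"))  # whitespace-only "blank" line
    return issues


def strip_trailing_whitespace(source: str) -> str:
    """Remove trailing whitespace from every line, keeping a final newline."""
    fixed = "\n".join(line.rstrip() for line in source.splitlines())
    return fixed + "\n" if source.endswith("\n") else fixed


snippet = "x = 1  \n    \ny = 2\n"
print(find_trailing_whitespace(snippet))  # [(1, 6, 'W291'), (2, 1, 'W293')]
print(repr(strip_trailing_whitespace(snippet)))  # 'x = 1\n\ny = 2\n'
```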

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/preprocessing/_target_encoder.py
+++ sklearn/preprocessing/_target_encoder.py
@@ -386,7 +386,7 @@
             if self.pos_label is not None:
                 label_binarizer = LabelBinarizer(pos_label=self.pos_label)
                 y = label_binarizer.fit_transform(y)
-                # For binary classification, LabelBinarizer may return 1D array, 
+                # For binary classification, LabelBinarizer may return 1D array,
                 # convert to 2D for consistency
                 if y.ndim == 1:
                     y = y.reshape(-1, 1)

--- sklearn/preprocessing/tests/test_target_encoder.py
+++ sklearn/preprocessing/tests/test_target_encoder.py
@@ -730,15 +730,15 @@
     encoder_pos_yes.fit(X, y)
     X_trans_yes = encoder_pos_yes.transform(X)
 
-    # Test with pos_label='no' - should use LabelBinarizer  
+    # Test with pos_label='no' - should use LabelBinarizer
     encoder_pos_no = TargetEncoder(target_type="binary", pos_label="no")
     encoder_pos_no.fit(X, y)
     X_trans_no = encoder_pos_no.transform(X)
 
     # Verify classes are set correctly
     assert_array_equal(encoder_default.classes_, ["no", "yes"])  # LabelEncoder sorts
-    assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer 
-    assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])   # LabelBinarizer
+    assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer
+    assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])  # LabelBinarizer
 
     # Test with numeric binary labels
     y_numeric = np.array([1, 0, 1, 0, 1, 0])
@@ -758,14 +758,14 @@
     """Test that TargetEncoder accepts different types for pos_label."""
     # Test that different parameter types are accepted
     X = np.array([["a"], ["b"]])
-    
+
     if pos_label == 1:
         y = np.array([0, 1])
     elif pos_label == "yes":
-        y = np.array(["no", "yes"])  
+        y = np.array(["no", "yes"])
     else:  # boolean True
         y = np.array([False, True])
-    
+
     encoder = TargetEncoder(target_type="binary", pos_label=pos_label)
     encoder.fit(X, y)
     # Should not raise any errors

2 files would be reformatted, 924 files already formatted

Generated for commit: 1f5dec2. Link to the linter CI: here
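The comment in the `_target_encoder.py` hunk says LabelBinarizer "may return 1D array" for binary classification. A quick check against scikit-learn's public API suggests the opposite: for a two-class target, fit_transform returns a single column of shape (n_samples, 1), so the `y.ndim == 1` branch would not be taken. The existing LabelEncoder path produces a 1D vector, so, assuming downstream code expects that shape, `ravel()` may be the relevant conversion. This is a sketch of LabelBinarizer's output only, not of the PR's code.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array(["yes", "no", "yes", "no"])

lb = LabelBinarizer()
y_bin = lb.fit_transform(y)

# For a two-class target, LabelBinarizer returns one column that is already 2D.
print(y_bin.shape)  # (4, 1)
print(lb.classes_)  # ['no' 'yes']

# The column marks classes_[1] ("yes"); ravel() gives a 1D 0/1 vector
# with the same shape as LabelEncoder's output.
print(y_bin.ravel())  # [1 0 1 0]
```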

Abhijais4896 left a comment

def test_target_encoder_pos_label():
    """Test TargetEncoder pos_label parameter for binary classification."""
    # Create simple binary classification data
    X = np.array([["cat"], ["dog"], ["cat"], ["dog"], ["cat"], ["dog"]])
    y = np.array(["yes", "no", "yes", "no", "yes", "no"])

    # Test default behavior (pos_label=None) - should use LabelEncoder
    encoder_default = TargetEncoder(target_type="binary", pos_label=None)
    encoder_default.fit(X, y)
    X_trans_default = encoder_default.transform(X)

Successfully merging this pull request may close these issues.

ENH Add pos_label parameter to TargetEncoder