ENH Add pos_label parameter to TargetEncoder #31796

JawadAliAI · 2025-07-20T14:53:59Z

- Add pos_label parameter to TargetEncoder for binary classification
- When pos_label is specified, use LabelBinarizer instead of LabelEncoder
- Allows users to specify which label should be the positive class
- Add comprehensive tests for the new parameter
- Maintains backward compatibility (pos_label=None uses old behavior)

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Fixes scikit-learn#27342
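To illustrate why the choice of positive class matters for a binary target, here is a minimal pure-Python sketch. It is not the code in this PR: `binarize` and `category_means` are hypothetical helpers, and the encoding shown is a plain per-category mean with no smoothing or cross fitting, which TargetEncoder applies on top. Note also that in released scikit-learn, the `pos_label` argument of `LabelBinarizer` is the integer value written for the positive class rather than a selector for which class is positive, so the sketch selects the positive class by direct comparison instead.

```python
from collections import defaultdict


def binarize(y, pos_label):
    """Hypothetical helper: 1 where the label equals pos_label, else 0."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError(f"expected exactly 2 classes, got {classes}")
    if pos_label not in classes:
        raise ValueError(f"pos_label={pos_label!r} is not one of {classes}")
    return [1 if label == pos_label else 0 for label in y]


def category_means(x, y_bin):
    """Unsmoothed per-category mean of a 0/1 target (no cross fitting)."""
    sums, counts = defaultdict(int), defaultdict(int)
    for category, target in zip(x, y_bin):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in counts}


# Toy data: every "cat" row is labelled "yes", every "dog" row "no".
x = ["cat", "dog", "cat", "dog", "cat", "dog"]
y = ["yes", "no", "yes", "no", "yes", "no"]

means_yes = category_means(x, binarize(y, pos_label="yes"))
means_no = category_means(x, binarize(y, pos_label="no"))
print(means_yes)  # {'cat': 1.0, 'dog': 0.0}
print(means_no)   # {'cat': 0.0, 'dog': 1.0}
```

Switching the positive class mirrors each encoded value around 0.5, so any downstream check on the proposed parameter would likely want to assert that relationship rather than only that fitting succeeds.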

github-actions · 2025-07-20T14:54:55Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`ruff check`

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/preprocessing/_target_encoder.py:389:81: W291 [*] Trailing whitespace
    |
387 |                 label_binarizer = LabelBinarizer(pos_label=self.pos_label)
388 |                 y = label_binarizer.fit_transform(y)
389 |                 # For binary classification, LabelBinarizer may return 1D array, 
    |                                                                                 ^ W291
390 |                 # convert to 2D for consistency
391 |                 if y.ndim == 1:
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:733:59: W291 [*] Trailing whitespace
    |
731 |     X_trans_yes = encoder_pos_yes.transform(X)
732 |
733 |     # Test with pos_label='no' - should use LabelBinarizer  
    |                                                           ^^ W291
734 |     encoder_pos_no = TargetEncoder(target_type="binary", pos_label="no")
735 |     encoder_pos_no.fit(X, y)
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:740:82: W291 [*] Trailing whitespace
    |
738 |     # Verify classes are set correctly
739 |     assert_array_equal(encoder_default.classes_, ["no", "yes"])  # LabelEncoder sorts
740 |     assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer 
    |                                                                                  ^ W291
741 |     assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])   # LabelBinarizer
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:761:1: W293 [*] Blank line contains whitespace
    |
759 |     # Test that different parameter types are accepted
760 |     X = np.array([["a"], ["b"]])
761 |     
    | ^^^^ W293
762 |     if pos_label == 1:
763 |         y = np.array([0, 1])
    |
    = help: Remove whitespace from blank line

sklearn/preprocessing/tests/test_target_encoder.py:765:36: W291 [*] Trailing whitespace
    |
763 |         y = np.array([0, 1])
764 |     elif pos_label == "yes":
765 |         y = np.array(["no", "yes"])  
    |                                    ^^ W291
766 |     else:  # boolean True
767 |         y = np.array([False, True])
    |
    = help: Remove trailing whitespace

sklearn/preprocessing/tests/test_target_encoder.py:768:1: W293 [*] Blank line contains whitespace
    |
766 |     else:  # boolean True
767 |         y = np.array([False, True])
768 |     
    | ^^^^ W293
769 |     encoder = TargetEncoder(target_type="binary", pos_label=pos_label)
770 |     encoder.fit(X, y)
    |
    = help: Remove whitespace from blank line

Found 6 errors.
[*] 6 fixable with the `--fix` option.

`ruff format`

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/preprocessing/_target_encoder.py
+++ sklearn/preprocessing/_target_encoder.py
@@ -386,7 +386,7 @@
             if self.pos_label is not None:
                 label_binarizer = LabelBinarizer(pos_label=self.pos_label)
                 y = label_binarizer.fit_transform(y)
-                # For binary classification, LabelBinarizer may return 1D array, 
+                # For binary classification, LabelBinarizer may return 1D array,
                 # convert to 2D for consistency
                 if y.ndim == 1:
                     y = y.reshape(-1, 1)

--- sklearn/preprocessing/tests/test_target_encoder.py
+++ sklearn/preprocessing/tests/test_target_encoder.py
@@ -730,15 +730,15 @@
     encoder_pos_yes.fit(X, y)
     X_trans_yes = encoder_pos_yes.transform(X)
 
-    # Test with pos_label='no' - should use LabelBinarizer  
+    # Test with pos_label='no' - should use LabelBinarizer
     encoder_pos_no = TargetEncoder(target_type="binary", pos_label="no")
     encoder_pos_no.fit(X, y)
     X_trans_no = encoder_pos_no.transform(X)
 
     # Verify classes are set correctly
     assert_array_equal(encoder_default.classes_, ["no", "yes"])  # LabelEncoder sorts
-    assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer 
-    assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])   # LabelBinarizer
+    assert_array_equal(encoder_pos_yes.classes_, ["no", "yes"])  # LabelBinarizer
+    assert_array_equal(encoder_pos_no.classes_, ["no", "yes"])  # LabelBinarizer
 
     # Test with numeric binary labels
     y_numeric = np.array([1, 0, 1, 0, 1, 0])
@@ -758,14 +758,14 @@
     """Test that TargetEncoder accepts different types for pos_label."""
     # Test that different parameter types are accepted
     X = np.array([["a"], ["b"]])
-    
+
     if pos_label == 1:
         y = np.array([0, 1])
     elif pos_label == "yes":
-        y = np.array(["no", "yes"])  
+        y = np.array(["no", "yes"])
     else:  # boolean True
         y = np.array([False, True])
-    
+
     encoder = TargetEncoder(target_type="binary", pos_label=pos_label)
     encoder.fit(X, y)
     # Should not raise any errors

2 files would be reformatted, 924 files already formatted

Generated for commit: 1f5dec2. Link to the linter CI: here

Abhijais4896

def test_target_encoder_pos_label():
    """Test TargetEncoder pos_label parameter for binary classification."""
    # Create simple binary classification data
    X = np.array([["cat"], ["dog"], ["cat"], ["dog"], ["cat"], ["dog"]])
    y = np.array(["yes", "no", "yes", "no", "yes", "no"])

    # Test default behavior (pos_label=None) - should use LabelEncoder
    encoder_default = TargetEncoder(target_type="binary", pos_label=None)
    encoder_default.fit(X, y)
    X_trans_default = encoder_default.transform(X)

github-actions bot added the module:preprocessing label Jul 20, 2025

Abhijais4896 reviewed Jul 26, 2025
