FEA add temperature scaling to CalibratedClassifierCV #31068

Open · wants to merge 91 commits into main

Conversation

@virchan (Member) commented Mar 25, 2025

Reference Issues/PRs

Closes #28574

What does this implement/fix? Explain your changes.

This PR adds temperature scaling to scikit-learn's CalibratedClassifierCV:

Temperature scaling can be enabled by setting `method="temperature"` in `CalibratedClassifierCV`:

from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.model_selection import train_test_split
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=42)

X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=42)

clf = LinearSVC(random_state=42)
clf.fit(X_train, y_train)
cal_clf = CalibratedClassifierCV(
    FrozenEstimator(clf), method="temperature"
).fit(X_calib, y_calib)

This method supports both binary and multi-class classification.
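Conceptually, temperature scaling divides the model's logits by a single learned temperature `T > 0` before the softmax. The following is a standalone sketch of the idea (Guo et al., ICML 2017) using only NumPy/SciPy — it is not this PR's implementation, and the data is synthetic: labels are drawn from `softmax(z)` while the calibrator is handed deliberately overconfident logits `5 * z`, so it should recover a temperature near 5.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax

# Synthetic multi-class data: labels drawn from softmax(z_true), but the
# "model" reports overconfident scores 5 * z_true.
rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 3))
labels = np.array([rng.choice(3, p=softmax(row)) for row in z_true])
logits = 5.0 * z_true  # overconfident scores

def nll(log_T):
    # Mean negative log-likelihood of softmax(logits / T); optimising over
    # log(T) keeps the temperature strictly positive.
    scaled = logits / np.exp(log_T)
    return -log_softmax(scaled, axis=1)[np.arange(len(labels)), labels].mean()

T = np.exp(minimize_scalar(nll).x)
proba = softmax(logits / T, axis=1)  # calibrated probabilities
```

Because the logits were inflated by a factor of 5, the fitted `T` lands in that neighbourhood, and the rescaled probabilities are far less extreme than `softmax(logits)`.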

Any other comments?

Cc @adrinjalali, @lorentzenchr in advance.

github-actions bot commented Mar 25, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7312b00.

@virchan (Member Author) left a comment

A follow-up to my comment on the Array API: I don't think we can support the Array API here, as scipy.optimize.minimize does not appear to support it.

If I missed anything, please let me know—I'd be happy to investigate further.

@virchan virchan marked this pull request as ready for review March 25, 2025 10:55
@ogrisel (Member) left a comment

Thanks for the PR. Here is a first pass of feedback:

virchan added 4 commits March 27, 2025 18:14
…fier`.

Updated constructor of `_TemperatureScaling` class.
Updated `test_temperature_scaling` in `test_calibration.py`.
Added `__sklearn_tags__` to `_TemperatureScaling` class.
@virchan (Member Author) left a comment

I'm still working on addressing the feedback, but I also wanted to share some findings related to it and provide an update.

@lorentzenchr (Member) left a comment

A few computational things seem off.

virchan added 2 commits April 25, 2025 22:16
Update `minimize` in `_temperature_scaling` to `minimize_scalar`.
Update `test_calibration.py` to check that the optimised inverse temperature is between 0.1 and 10.
@virchan (Member Author) left a comment

There are some CI failures; I'll fix those shortly.

I'm also considering adding a `verbose` parameter to `CalibratedClassifierCV` to optionally display convergence info when optimising the inverse temperature `beta`.

@virchan (Member Author) left a comment

The CI fails when checking that the ROC AUCs are equal up to 7 decimal places. I'll fix it later.

@virchan (Member Author) left a comment

CI passed!

@lorentzenchr (Member) left a comment

Close to the finish line.

@virchan (Member Author) left a comment

I updated the user guide and the docstrings in calibration.py. I also modified the test to check that the temperature parameter is close to 1 when the temperature scaler is fitted on the training set of the LogisticRegression classifier.

There are still some comments that need to be addressed, and I'll work on them later.
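The test described above can be illustrated outside the PR: logistic regression minimises log-loss directly, so rescaling its own decision values should gain almost nothing, and the fitted temperature should sit near 1. The sketch below hand-rolls a binary version of the fit (it is a hypothetical helper, not the PR's `_TemperatureScaling` class).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Logistic regression already minimises log-loss, so the temperature fitted
# on its own training data should be close to 1.
X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
z = clf.decision_function(X)
sign = 2 * y - 1  # +1 for the positive class, -1 otherwise

def nll(log_beta):
    # Binary log-loss of sigmoid(beta * z), in a numerically stable form
    return np.mean(np.logaddexp(0.0, -np.exp(log_beta) * z * sign))

beta = np.exp(minimize_scalar(nll).x)  # inverse temperature
T = 1.0 / beta
```

With default regularisation the weights are only slightly shrunk, so `T` stays close to 1 rather than exactly equal to it.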

@virchan (Member Author) left a comment

I've refactored the part that checks `response_method_name`:

if len(classes) == 2 and predictions.shape[-1] == 1:
    response_method_name = _check_response_method(
        clf,
        ["decision_function", "predict_proba"],
    ).__name__
    if response_method_name == "predict_proba":
        predictions = np.hstack([1 - predictions, predictions])

I think this only needs to be applied in two places: `_fit_calibrator` and `_CalibratedClassifier.predict_proba`. But please let me know if there's a better way to handle this.
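The conversion in the snippet above can be checked in isolation: a binary classifier whose `predict_proba`-style output carries only the positive-class column of shape `(n_samples, 1)` is expanded to the usual two-column layout by stacking the complementary column in front.

```python
import numpy as np

# Positive-class probabilities with shape (n_samples, 1), as a binary
# classifier might return them
p_pos = np.array([[0.2], [0.9], [0.5]])

# Prepend the negative-class column so each row is a full distribution
p_both = np.hstack([1 - p_pos, p_pos])
# p_both has shape (3, 2) and each row sums to 1
```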

I've also moved _temperature_scaling inside _TemperatureScaling.fit.

CI has passed, so it's ready for review!

Comment on lines +1085 to +1119
def _temperature_scaling(predictions, labels, sample_weight=None):
    """Calibrate the temperature of temperature scaling.

    Parameters
    ----------
    predictions : ndarray of shape (n_samples,) or (n_samples, n_classes)
        The output of `decision_function` or `predict_proba`. If the input
        appears to be probabilities (i.e., values between 0 and 1 that sum to 1
        across classes), it will be converted to logits using `np.log(p + eps)`.

        Binary decision function outputs (1D) will be converted to two-class
        logits of the form (-x, x). For shapes of the form (n_samples, 1), the
        same process applies.

    labels : ndarray of shape (n_samples,)
        True labels for the samples.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights. If None, then samples are equally weighted.

    Returns
    -------
    beta : float
        The optimised inverse temperature parameter for probability calibration,
        with a value in the range (0, infinity).

    References
    ----------
    On Calibration of Modern Neural Networks,
    C. Guo, G. Pleiss, Y. Sun, & K. Q. Weinberger, ICML 2017.
    """
    check_consistent_length(predictions, labels)
    logits = _convert_to_logits(predictions)  # guarantees np.float64 or np.float32
Member

Suggested change

X, y = indexable(X, y)  # Is this really needed?
predictions, labels = X, y
check_consistent_length(predictions, labels)
logits = _convert_to_logits(predictions)  # guarantees np.float64 or np.float32

and so on.
So remove the function `_temperature_scaling` altogether and integrate it into `fit`.

It will end with

self.beta = np.exp(log_beta_minimizer.x)
return self
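The `np.exp` in that final line reflects an optimisation carried out over `log(beta)` rather than `beta` itself, which keeps the fitted inverse temperature strictly positive without bound constraints. A minimal sketch of that parameterisation on a toy binary problem (hypothetical data and loss, not the PR's code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy binary problem: mixed-sign margins so the optimum is finite
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1, 0, 1, 1])
sign = 2 * labels - 1

def nll(log_beta):
    # Binary log-loss of sigmoid(beta * logits); optimised over log(beta)
    return np.mean(np.logaddexp(0.0, -np.exp(log_beta) * logits * sign))

log_beta_minimizer = minimize_scalar(nll)
beta = np.exp(log_beta_minimizer.x)  # strictly positive by construction
```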

Comment on lines +724 to +730
if len(classes) == 2 and predictions.shape[-1] == 1:
    response_method_name = _check_response_method(
        clf,
        ["decision_function", "predict_proba"],
    ).__name__
    if response_method_name == "predict_proba":
        predictions = np.hstack([1 - predictions, predictions])
Member

Why not put it inside _TemperatureScaling.fit?

Comment on lines +822 to +828
if n_classes == 2 and predictions.shape[-1] == 1:
    response_method_name = _check_response_method(
        self.estimator,
        ["decision_function", "predict_proba"],
    ).__name__
    if response_method_name == "predict_proba":
        predictions = np.hstack([1 - predictions, predictions])
Member

Why not put it into _TemperatureScaling.predict?

Successfully merging this pull request may close these issues.

Implement temperature scaling for (multi-class) calibration
5 participants