MNT/DOC Raise for `**fit_params` in `fit_transform` if not expected by `fit` (#31830)
Conversation
sklearn/base.py (outdated)
```python
for key, _ in fit_params.items():
    if (
        key not in inspect.signature(self.fit).parameters
        and "fit_params" not in inspect.signature(self.fit).parameters
    ):
        message = (
            f"`{self.__class__.__name__}.fit` got an unexpected keyword "
            f"argument `{key}`. Only pass `{key}` into `fit_transform` if "
            "`fit` expects it or can handle `**fit_params`."
        )
        raise TypeError(message)
```
I think instead of doing this manually, you can rely on Python's machinery: put a try/except around the call to `fit`, catch the `TypeError`, and then do a `raise ... from e` with a `TypeError` and a nicer error message.
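Something along these lines, as a minimal sketch of that suggestion (not code from this PR; the string check on the exception text is an assumption, added to avoid masking unrelated `TypeError`s raised inside `fit`):

```python
class TransformerMixinSketch:
    """Sketch only: a fit_transform that re-raises Python's own TypeError."""

    def fit_transform(self, X, y=None, **fit_params):
        try:
            # Let Python's argument checking detect unexpected keywords.
            if y is None:
                return self.fit(X, **fit_params).transform(X)
            return self.fit(X, y, **fit_params).transform(X)
        except TypeError as e:
            # Only translate argument errors; anything else propagates as-is.
            if "unexpected keyword argument" not in str(e):
                raise
            raise TypeError(
                f"`{type(self).__name__}.fit` got an unexpected keyword "
                "argument. Only pass `**fit_params` into `fit_transform` "
                "if `fit` expects them or can handle `**fit_params`."
            ) from e
```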
I'm unsure about this PR. It improves the error message but doesn't fix the root of the problem, which is a doc issue. And I'm not even convinced that there's a doc issue.
The doc says `**fit_params` are parameters passed to `fit`. We have several choices on the way we pass them:

1. pass them directly, and `fit` raises if they're not in the signature.
2. check the signature of `fit` before passing them, then pass only the ones that are in the signature and warn that the others are ignored.
3. check the signature of `fit` before passing them, then pass only the ones that are in the signature, silently.

Option 2 is sketched right after this list.
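For concreteness, option 2 could look roughly like this (a hypothetical helper, not scikit-learn code; the function name is invented):

```python
import inspect
import warnings


def filter_fit_params(fit, fit_params):
    """Hypothetical helper for option 2: keep only params that fit accepts."""
    params = inspect.signature(fit).parameters
    # If fit declares **kwargs, it can take anything; pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fit_params
    kept = {k: v for k, v in fit_params.items() if k in params}
    ignored = sorted(set(fit_params) - set(kept))
    if ignored:
        warnings.warn(f"Ignoring fit params not accepted by fit: {ignored}")
    return kept


# Usage inside fit_transform would then be, e.g.:
#     self.fit(X, y, **filter_fit_params(self.fit, fit_params))
```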
So far in scikit-learn we do the first one. And it's not present only with `fit_transform`; it already happens with `fit` on meta-estimators. Take for example:
```python
from sklearn.datasets import make_regression
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression()
TransformedTargetRegressor(regressor=Ridge()).fit(X, y, not_a_fit_param=1)
```
```
TypeError: Ridge.fit() got an unexpected keyword argument 'not_a_fit_param'
```
I don't find that surprising, and if I have a doubt I go check the signature of `Ridge.fit`.

This PR adds a 1b) option, that is, raise but check the signature first to improve the error message. I don't think it's necessary. However, if we choose to do this (which should first be done in an RFC issue imo), I think it should be done consistently across the code base.
sklearn/base.py (outdated)
```python
if y is None:
    # fit method of arity 1 (unsupervised transformation)
```
Why is it only done in the `y is None` branch? It should be in both:
```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = [[4, 1, 2, 2],
     [1, 3, 9, 3],
     [5, 7, 5, 1]]
Normalizer().fit_transform(X, y=[1, 2, 3], sample_weight=np.array([[1, 1, 1]]))
```
```
TypeError: Normalizer.fit() got an unexpected keyword argument 'sample_weight'
```
Oh true. I pushed a fix for now. This is irrespective of whether we want to keep this PR or not.
Thank you, @jeremiedbb. I feel that I didn't state well the intention and scope of this PR, which is supposed to be much smaller. The issue that I had tried to solve was actually only a documentation issue, newly introduced with metadata routing and only in transformers that inherit from `TransformerMixin`. For instance, the docs for `Normalizer` display `**fit_params` as accepted by `fit_transform`, even though its `fit` takes no fit params.
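A quick way to see the mismatch in a Python session (output shown as comments; signatures as rendered at the time of writing, so they may differ across versions):

```python
import inspect
from sklearn.preprocessing import Normalizer

# fit_transform is inherited from TransformerMixin, so its signature
# advertises **fit_params ...
print(inspect.signature(Normalizer.fit_transform))
# (self, X, y=None, **fit_params)

# ... while Normalizer.fit itself accepts nothing beyond X (and y=None).
print(inspect.signature(Normalizer.fit))
# (self, X, y=None)
```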
The PR tries to even out the fact that we display wrong documentation by giving the users a service: re-raising with a clearer error message. I don't think that's necessary in general, because the Python `TypeError` already points at the problem.

I agree it's not an ideal solution. Another way of dealing with this would be to remove the metadata routing logic from `TransformerMixin.fit_transform`. Or, the added line to the docs could be enough.
The doc is not technically wrong.
It's really not that different from the example I showed. Yes, here there are no meta-estimators in between, but the idea is the same: we pass additional params to `fit`.
If you really don't want to see `fit_params` in the docs, …
That's true. Let's only keep the documentation change, if that's alright.
I think it might not be worth it for now.
For transformers that inherit from `TransformerMixin`, the docs always display `fit_params` as passable into `fit_transform`, even if their `fit` method doesn't accept anything other than `X` (and `y=None`). That is for instance the case in `Normalizer`, `LinearDiscriminantAnalysis` and `FeatureHasher`.

A plain Python `TypeError` is raised in this case:

```
TypeError: Normalizer.fit() got an unexpected keyword argument 'sample_weight'
```
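For reference, a minimal snippet that triggers this (the `sample_weight` value is arbitrary; it is rejected before ever being used):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Normalizer.fit() accepts no fit params, so forwarding one raises the
# TypeError shown above.
Normalizer().fit_transform([[4.0, 1.0, 2.0, 2.0]], sample_weight=np.array([1.0]))
```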
This PR documents this on the method and adds a more intuitive error message that scikit-learn can raise directly.
This doesn't prevent `fit_params` from being displayed in the wrong places in the docs, but it deals with the problems that users can encounter in such a case.

Another solution to this problem, one that would remove `fit_params` from the docs in the wrong places, would be to override more of the `fit_transform` methods in the router classes (as is already the case in many places), for instance in `Stacking*` and `RFE`, which still use `fit_transform` from `TransformerMixin`, and thus totally de-couple the metadata-routing logic from `TransformerMixin`'s `fit_transform`. Or, alternatively, add an additional `RoutingTransformerMixin` for the routing logic.

But both breakages of the inheritance structure seem a bit too much for handling this little thing.
CC: @adrinjalali