Skip to content

MNT refactoring in routing _MetadataRequester #31534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

adrinjalali
Copy link
Member

The goal of this refactoring is to have the actual instance as the owner in MetadataRequest object, which is needed for the work in visualising the routing (PR coming).

As a consequence, the repr of the owners is used now in error messages instead, so the tests are fixed.

Copy link

github-actions bot commented Jun 12, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0880b95. Link to the linter CI: here

Copy link
Member Author

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 149 to 150
def __repr__(self):
return "..."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since repr is now used in error messages, putting ... here for simplicity. These lists are otherwise pretty ugly and large to print.

Comment on lines -485 to -488
with pytest.raises(
AttributeError, match="'MetadataRequest' object has no attribute 'other_method'"
):
TestDefaultsBadMethodName().get_metadata_routing()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking whether __metadata_request__method_name actually has a valid method_name doesn't exist anymore now. I don't think it's crucial to have. As a result, the test for the check is also removed.

@@ -1385,28 +1385,74 @@ def __init_subclass__(cls, **kwargs):
.. [1] https://www.python.org/dev/peps/pep-0487
"""
try:
requests = cls._get_default_requests()
for method in SIMPLE_METHODS:
method_metadata_args = cls._get_metadata_args_and_aliases(method)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of building the whole default request object, we now only check for parameters which are to be included here in __init_subclass__, which is what we actually need. The actual MetadataRequest object is now only created when needed.

@adrinjalali adrinjalali added No Changelog Needed Metadata Routing all issues related to metadata routing, slep006, sample props labels Jun 12, 2025
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for walking me through these changes, @adrinjalali!

I have now looked it all through and have some comments that I hope will help with internal documentation of the new functionality.

My thought on design is: Could the value for owner become a string early on when owner=self gets passed to the MetadataRequest and MethodMetadataRequest object? We could check right there if we deal with a scorer and in this case manually create a string that serves the purpose of the visualisation in #31535. This would be simpler, I believe.

This PR is adding a lot of complexity to allow owner to be either a string or an instance in order to access the instances' _routing_repr and if there is another way, than this would be pretty cool.

super().__init_subclass__(**kwargs)

@classmethod
def _build_request_for_signature(cls, router, method):
def _get_metadata_args_and_aliases(cls, method: str):
"""Get the metadata arguments for a method."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Get the metadata arguments for a method."""
"""Get the metadata request values for a method.
Returns a dict with the names of the metadata to be requested for `method`
and the corresponding request values (which can be among {True, False,
None} or a str for an alias). This method first populates the dict with a
`None` request value for every metadata present in `method`s signature and
may then overwrite these request values with request value from a
`__metadata_request__{method}` class attribute.
"""

Maybe like this?

(Clarify what "metadata arguments" is and explain hierarchy of request values for the signatures versus the __metadata_request__{method}.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to clarify the docstring, how does it look now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much clearer. I like the new method name and the docstring. Only leaving a suggestion for some minor improvements in the other comment.

super().__init_subclass__(**kwargs)

@classmethod
def _build_request_for_signature(cls, router, method):
def _get_metadata_args_and_aliases(cls, method: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somehow unclear, what a "metadata arg" is. Would it make sense to rename this method into something like _get_metadata_request_values_and_aliases() or, since the alias can be considered as a request value as well, _get_metadata_request_values() maybe? In the docs for MethodMetadataRequest, we call that same information "request info", which would also be an option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a classmethod, it doesn't really take into account instance level requests. So I've renamed it to _get_class_level_metadata_request_values.

Comment on lines 1485 to 1486
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                    # Here we add request values specified via those class attributes
                    # to the `MetadataRequest` object.

I don't think we're really doing that... since we're only building a dict with information here. I am thinking, if anything, we're preparing to add something to a MethodMetadataRequest, but not here directly, only after this method has returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😁 yep, outdated, used to do that. Fixing the comment.

@@ -1409,28 +1430,74 @@ def __init_subclass__(cls, **kwargs):
.. [1] https://www.python.org/dev/peps/pep-0487
"""
try:
requests = cls._get_default_requests()
for method in SIMPLE_METHODS:
method_metadata_args = cls._get_metadata_args_and_aliases(method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method_metadata_args = cls._get_metadata_args_and_aliases(method)

Here too, I would suggest to rename method_metadata_args into maybe method_metadata_request_values or method_metadata_request_info to give anyone reading this a hand in understanding what this is about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming the variable to requests to keep it short, and the function name should now be clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you!

Comment on lines 1478 to 1494
for base_class in reversed(inspect.getmro(cls)):
for attr, value in vars(base_class).items():
# we don't check for equivalence since python prefixes attrs
# starting with __ with the `_ClassName`.
if substr not in attr:
continue
for prop, alias in value.items():
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
# exists will override the previous one. Since we go through the
# MRO in reverse order, the one specified by the lowest most classes
# in the inheritance tree are the ones which take effect.
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)
Copy link
Contributor

@StefanieSenger StefanieSenger Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I wonder if it is required to loop over the whole mro, because as I understand we only need to check if the attribute is existing on the class that is the closest to the current cls.

So, getattr(cls, f"__metadata_request__{method}") (or similar) and then compare if substr is a part of it isn't enough? Isn't getattr() doing all the work for us by going through the mro while prioritising child classes?

Edit: I have tried this out in this comment.

Comment on lines 1473 to 1496
# need to go through the MRO since this is a class attribute and
# ``vars`` doesn't report the parent class attributes. We go through
# the reverse of the MRO so that child classes have precedence over
# their parents.
substr = f"__metadata_request__{method}"
for base_class in reversed(inspect.getmro(cls)):
for attr, value in vars(base_class).items():
# we don't check for equivalence since python prefixes attrs
# starting with __ with the `_ClassName`.
if substr not in attr:
continue
for prop, alias in value.items():
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
# exists will override the previous one. Since we go through the
# MRO in reverse order, the one specified by the lowest most classes
# in the inheritance tree are the ones which take effect.
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)

params[prop] = alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried this below and it fails in test_splitter_get_metadata_routing, and in test_metadata_routed_to_group_splitter (both for the same reason) because while getattr() can access the parent's attributes, the name mangeling makes it a bit hard to know what the attributes' name is and then it is not found. That is not unsolvable though, please let me know what you think, @adrinjalali:

Suggested change
# need to go through the MRO since this is a class attribute and
# ``vars`` doesn't report the parent class attributes. We go through
# the reverse of the MRO so that child classes have precedence over
# their parents.
substr = f"__metadata_request__{method}"
for base_class in reversed(inspect.getmro(cls)):
for attr, value in vars(base_class).items():
# we don't check for equivalence since python prefixes attrs
# starting with __ with the `_ClassName`.
if substr not in attr:
continue
for prop, alias in value.items():
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
# exists will override the previous one. Since we go through the
# MRO in reverse order, the one specified by the lowest most classes
# in the inheritance tree are the ones which take effect.
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)
params[prop] = alias
# update dict of metadata request values with info from
# `__metadata_request__{method}` class attributes, if these exist
try:
values = getattr(cls, f"_{cls.__name__}__metadata_request__{method}")
for prop, alias in values.items():
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)
params[prop] = alias
except AttributeError:
pass

I am thinking of two possible solutions:

  1. Here in this method we could have a good guess of what the attribute's name is by using the mro without walking down it.
  2. Or, since we don't have so many splitters that accept groups, and we could also define the class attribute ( __metadata_request__split = {"groups": True}) on them directly instead of via GroupsConsumerMixin.

That would (hopefully) make all the tests pass and also simplify the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to go through the MRO, or else we'd require the __metadata_request__{method} attributes to include everything from their parents as well. Consider the followng:

class A(BaseEstimator):
	__metadata_request__fit = {"sample_weight": True}

	def fit(self, X, y, sample_weight=None):
		pass

class B(A):
	__metadata_request__fit = {"sensitive_attribute": True}

	def fit(self, X, y, sample_weight=None, sensitive_attribute=None):
		pass

If we want to avoid going through the MRO, then we'd have to force the user to have this code:

class A(BaseEstimator):
	__metadata_request__fit = {"sample_weight": True}

	def fit(self, X, y, sample_weight=None):
		pass

class B(A):
	__metadata_request__fit = {"sample_weight": True, "sensitive_attribute": True}

	def fit(self, X, y, sample_weight=None, sensitive_attribute=None):
		pass

which is not ideal.

I think going through the MRO is quite okay here. It's certainly an advanced python feature, but I rather have a more complicated implementation on our side, rather than complicating user / developer side of things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I can see that and I agree it is more important to be simple for users and third party developers than here.

I guess to avoid going through the mro, we could also do some acrobatics with __setattr__ (or maybe not, because for this to work, the code needs to run... 🤔 and it couldn't be done at class level? ) but I am not sure this would result in simpler code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a high level / philosophical question, that is not well thought through and also cannot be easily answered:

This module loops so often over similar things, that I wonder if there could be a does-it-all helper function that catches all kinds of information - needed or not - and can be run when needed (or even eagerly?). This is all fast and in my basic understanding performance is not something we need to worry about here.

I wonder if that would simplify maintaining the code in this module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought on a possible refactoring:

__metadata_request__{method} could be a method instead of a set of attributes and then we could use inheritance and super() on it and would not rely on going through the mro for the routing.

It could maybe even be part of the __sklearn_tags__()?

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjustments, @adrinjalali!

I have left a few minor suggestions and a few comments.

On the higher level, my impression is that if owner had to always be an instance (obj), and we would retrieve its string with instance.__class__.__name__ early on where needed, that would simplify the code. Do you think that's a good idea?

Comment on lines +839 to +840
owner : object
The object to which these requests belong.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, then this PR doesn't change the passable type of the owner param for MetadataRouter.

I think it is a good idea to make that possible, because it is more consistent to the other classes now. Would that be a different PR?

This is most usable for Scorers which can give a nice representation of what they
are. This is done by implementing a `_routing_repr` method on the object.

Since the `owner` object could potentially be the type name (str), we return that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since the `owner` object could potentially be the type name (str), we return that
Since the `owner` object could be the type name (str), we return that

Nit.

Comment on lines +1453 to +1455
This method first checks the `method`'s signature to check which metadata are
available, and then it checks the `__metadata_request__*` class attributes to
get the metadata request values set at class level.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This method first checks the `method`'s signature to check which metadata are
available, and then it checks the `__metadata_request__*` class attributes to
get the metadata request values set at class level.
This method first checks the `method`'s signature for passable metadata
and then updates these with the metadata request values set at class level
via the `__metadata_request__{method}` class attributes.

That docstring is pretty useful. Only some subtle suggestions, because "checks" sounds a bit unclear to me.

Comment on lines +1479 to +1481
# __metadata_request__* attributes. Defaults set in
# __metadata_request__* attributes take precedence over signature
# sniffing.
Copy link
Contributor

@StefanieSenger StefanieSenger Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# __metadata_request__* attributes. Defaults set in
# __metadata_request__* attributes take precedence over signature
# sniffing.
# `__metadata_request__{method}` class attributes,
# which take precedence over signature sniffing.

Turning __metadata_request__* attributes into __metadata_request__{method} class attributes because I find it easier at first glance to distinguish from the _metadata_request instance attribute.

(I think I did the rest in #31695.)

super().__init_subclass__(**kwargs)

@classmethod
def _build_request_for_signature(cls, router, method):
def _get_metadata_args_and_aliases(cls, method: str):
"""Get the metadata arguments for a method."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much clearer. I like the new method name and the docstring. Only leaving a suggestion for some minor improvements in the other comment.

@@ -1409,28 +1430,74 @@ def __init_subclass__(cls, **kwargs):
.. [1] https://www.python.org/dev/peps/pep-0487
"""
try:
requests = cls._get_default_requests()
for method in SIMPLE_METHODS:
method_metadata_args = cls._get_metadata_args_and_aliases(method)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you!

Comment on lines 1473 to 1496
# need to go through the MRO since this is a class attribute and
# ``vars`` doesn't report the parent class attributes. We go through
# the reverse of the MRO so that child classes have precedence over
# their parents.
substr = f"__metadata_request__{method}"
for base_class in reversed(inspect.getmro(cls)):
for attr, value in vars(base_class).items():
# we don't check for equivalence since python prefixes attrs
# starting with __ with the `_ClassName`.
if substr not in attr:
continue
for prop, alias in value.items():
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
# exists will override the previous one. Since we go through the
# MRO in reverse order, the one specified by the lowest most classes
# in the inheritance tree are the ones which take effect.
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)

params[prop] = alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I can see that and I agree it is more important to be simple for users and third party developers than here.

I guess to avoid going through the mro, we could also do some acrobatics with __setattr__ (or maybe not, because for this to work, the code needs to run... 🤔 and it couldn't be done at class level? ) but I am not sure this would result in simpler code.

Comment on lines 1473 to 1496
# need to go through the MRO since this is a class attribute and
# ``vars`` doesn't report the parent class attributes. We go through
# the reverse of the MRO so that child classes have precedence over
# their parents.
substr = f"__metadata_request__{method}"
for base_class in reversed(inspect.getmro(cls)):
for attr, value in vars(base_class).items():
# we don't check for equivalence since python prefixes attrs
# starting with __ with the `_ClassName`.
if substr not in attr:
continue
for prop, alias in value.items():
# Here we add request values specified via those class attributes
# to the `MetadataRequest` object. Adding a request which already
# exists will override the previous one. Since we go through the
# MRO in reverse order, the one specified by the lowest most classes
# in the inheritance tree are the ones which take effect.
if prop not in params and alias == UNUSED:
raise ValueError(
f"Trying to remove parameter {prop} with UNUSED which"
" doesn't exist."
)

params[prop] = alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a high level / philosophical question, that is not well thought through and also cannot be easily answered:

This module loops so often over similar things, that I wonder if there could be a does-it-all helper function that catches all kinds of information - needed or not - and can be run when needed (or even eagerly?). This is all fast and in my basic understanding performance is not something we need to worry about here.

I wonder if that would simplify maintaining the code in this module.

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Jul 8, 2025

Concerning the CI failure, see #31722.

@virchan
Copy link
Member

virchan commented Jul 8, 2025

Concerning the CI failure, see #31722.

I think this is fixed in main by #31690. So updating the branch should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metadata Routing all issues related to metadata routing, slep006, sample props No Changelog Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants