Skip to content

MNT Refactor get_metadata_routing method in MetadataRequest to directly use _get_metadata_request instead #31695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

StefanieSenger
Copy link
Contributor

@StefanieSenger StefanieSenger commented Jul 3, 2025

What does this implement/fix? Explain your changes.

This PR de-couples the double-handling of MetadataRouter and MetadataRequest objects in metadata routing ( a bit), by not using the get_metadata_routing method in MetadataRequest anymore, but instead to directly use _get_metadata_request.

This change has affect in several parts of our metadata routing implementation and the tests and there simplifies the code and makes it more intuitive to read.

The goal of this PR is to increase maintainability.

While at it, I also:

  • re-named __metadata_request__* class attribute into __metadata_request__{method} and made clear to mention its a class attribute in our internal documentation to make it easier to distinguish it from the _metadata_request instance attribute present in consumers at first sight
  • made some other small improvements for the external or internal documentation in metadata routing

Copy link

github-actions bot commented Jul 3, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 20c5dcd. Link to the linter CI: here

@StefanieSenger StefanieSenger changed the title Re-factor MNT Refactor double-handling of MetadataRouter and MetadataRequest objects Jul 3, 2025
@StefanieSenger StefanieSenger added Metadata Routing all issues related to metadata routing, slep006, sample props No Changelog Needed and removed module:metrics module:model_selection labels Jul 3, 2025
Comment on lines +996 to +1002
with pytest.warns(UserWarning, match="`transform` method which consumes metadata"):
rct = RoutingCustomTransformer(
estimator=CustomTransformer()
.set_fit_request(metadata=True)
.set_transform_request(metadata=True)
)
rct.fit_transform(X=[[1]], metadata="metadata")
Copy link
Contributor Author

@StefanieSenger StefanieSenger Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was added to make sure the warning is also shown when a transformer is not a pure consumer (but also a router). It refers to the change in sklearn.base.TransformerMixin.fit_transform.

def test_default_requests():
class OddEstimator(BaseEstimator):
def test_class_level_requests():
class StubEstimator(BaseEstimator):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm renaming OddEstimator because the old naming implies we are intentionally demonstrating wrong usage, but that's not our purpose here.

@@ -210,19 +210,19 @@ def test_request_type_is_valid(val, res):


@config_context(enable_metadata_routing=True)
def test_default_requests():
class OddEstimator(BaseEstimator):
def test_class_level_requests():
Copy link
Contributor Author

@StefanieSenger StefanieSenger Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm renaming test_default_requests because the term "default" is not referring to default / auto routing here and could be mis-understood. What is meant are the class level requests.

@StefanieSenger
Copy link
Contributor Author

StefanieSenger commented Jul 5, 2025

Log of my though process (that hopefully helps me move forwards):

The de-coupeling of _get_metadata_request for consumers and get_metadata_routing that was previously used in both consumers and routers and is now only used in routers is a separate issue than removing the add_self_request in routers by using their _get_metadata_request method, that is still present in them.
I understand I should do it in the same PR nevertheless.

I wonder if we want to fully automate the setting of self requests, or if it should be integrated in MetadataRouter.add(). But if the latter, I wonder if it really makes things simpler, because we would force estimator developers to define a method_mapping in .add, that is not easily available for self requests (or only in a hacky way).
So I tend to fully automate, which means that I'd have to manipulate MetadataRouter.__init__(). I have hardly seen us add any functionality to any init method in scikit-learn, and I think for a good reason (predictability or so?). Still, I will make a suggestion to automate the setting of self requests for router, because otherwise, we force us and developers of custom meta-estimators to pass a complicated method mapping into MetadataRouter.add()...

Since then I need to pass an object into MetadataRouter.__init__(), so I can create a self request, I wonder if I should make the existing owner become an object type, instead of a string, or if I should add a new init param that would be the object. Maybe better to pass a new param, because then the owner string can still get a user-defined name?

But potentially, those could both be merged into the same param, since passing MetadataRouter(owner=self.__class__.__name__, owning_obj=self) seems a bit unnecessary.

@StefanieSenger
Copy link
Contributor Author

StefanieSenger commented Jul 5, 2025

I understood that this PR cannot do much about the double-handling of MetadataRouter and MetadataRequest objects (which I had thought it was intending in the beginning), because that depends on several other places and choices taken. But it does part of the job by taking care of certain methods. I will update the PR title to better fit that.

@StefanieSenger StefanieSenger changed the title MNT Refactor double-handling of MetadataRouter and MetadataRequest objects MNT Refactor get_metadata_routing method in MetadataRequest to directly use _get_metadata_request instead Jul 5, 2025
@StefanieSenger
Copy link
Contributor Author

I've experimented with my above idea to add a self-request automatically to routers, but it fails for routers that are pure routers but not also consumers. Given that, I no longer see a benefit in removing or refining the add_self_request method from MetadataRouter. We would still require some input from estimator developers on whether a self request should be added or not.

If that input is done by calling MetadataRouter().add_self_request(self) in get_metadata_routing() or through - in the simplest case, boolean paramater (like add_self_request=True) feels like an implementation detail.

Also, the original change of this PR, the de-tangling of _get_metadata_request and get_metadata_routing methods for consumers doesn't provide a new possibility here, because MetadataRouter().add_self_request had already internally used obj._get_metadata_request().

At this point, I am tempted to think this refactoring is complete, unless there is some specific thing that we want to do with add_self_request?

This PR makes the code more readable, because we can clearly see if we deal with a consumer based on the method called on an object. It also adds some improvements to the internal and the API docs and personally, it has given me more insights and more ideas for refactorings (which I will open separate PRs for).

What is left that I will go through our docs and examples to check whether the right method is used everywhere. I will do that and then un-draft this PR. Do you consider it complete, @adrinjalali?

@@ -46,15 +45,20 @@
MetadataRouter
~~~~~~~~~~~~~~

The ``MetadataRouter`` objects are constructed via a `get_metadata_routing` method,
Copy link
Contributor Author

@StefanieSenger StefanieSenger Jul 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetadataRouter objects are constructed via a get_metadata_routing method

@adrinjalali, Do you think it makes sense to rename get_metadata_routing into get_metadata_router?
That would be more precise and we spare people to wonder about whether there is a difference between "metadata router" and "metadata routing". In fact, this method returns a MetadataRouter object.

That would be for a new PR and could be done without deprecation (because we are still in experimental stage), I believe?

@StefanieSenger StefanieSenger marked this pull request as ready for review July 5, 2025 17:45
Comment on lines -548 to -550
Consumer-only classes such as simple estimators return a serialized
version of this class as the output of `get_metadata_routing()`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true (anymore).

get_routing_for_object(), _get_metadata_request() and get_metadata_routing() all return unserialised output.

@@ -1529,20 +1532,6 @@ def _get_metadata_request(self):

return requests

def get_metadata_routing(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side effect of this change is that this method won't be displayed in the docs for the consumers anymore. I don't think it's too bad, since we want users to use get_routing_for_object anyways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metadata Routing all issues related to metadata routing, slep006, sample props module:utils No Changelog Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant