gh-137146: Validate IPv6 ZoneID characters against RFC 6874 in urllib.parse #137148

mauricelambert · 2025-07-27T14:57:30Z

This PR tightens the validation of IPv6 Zone Identifiers (ZoneIDs) in bracketed hostnames handled by urllib.parse (#137146).

Problem

Currently, urllib.parse accepts any non-null string as a ZoneID, because it delegates IPv6 parsing to the ipaddress module, which follows RFC 4007. However, RFC 6874 §2.1 defines a stricter character set for ZoneIDs when used in URLs:

Characters allowed in ZoneIDs (after percent-decoding):
ALPHA / DIGIT / "-" / "." / "_" / "~"

In a URI, the % that separates the address from the ZoneID must itself be percent-encoded as %25. The RFC also suggests that parsers may accept a bare % that is not followed by two hex digits.
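
For illustration only, the character set above could be checked roughly as follows. This is a minimal sketch, not the code in this PR: `zone_id_is_allowed` and `_UNRESERVED_RE` are hypothetical names, and it assumes the ZoneID has already been split off from the address and its delimiter.

```python
import re
from urllib.parse import unquote

# Hypothetical helper, not the implementation from this PR.
# ASCII-only character class: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED_RE = re.compile(r"[A-Za-z0-9._~-]+")


def zone_id_is_allowed(raw_zone_id: str) -> bool:
    """Return True if the percent-decoded ZoneID uses only allowed characters."""
    # Percent-decode first, since the rule applies to the decoded ZoneID.
    decoded = unquote(raw_zone_id)
    # fullmatch() rejects empty strings and anything outside the class.
    return _UNRESERVED_RE.fullmatch(decoded) is not None
```

Explicit ASCII ranges are used instead of `\w`, which would also match non-ASCII letters and digits in Python.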

Fix

This patch adds an explicit validation step to check that any ZoneID in a URL conforms to the allowed character set.

Before the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[fe80::1%zone|bad]/")
ParseResult(scheme='http', netloc='[fe80::1%zone|bad]', path='/', ...)

After the fix:

>>> urllib.parse.urlparse("http://[fe80::1%zone|bad]/")
ValueError: IPv6 ZoneID is invalid

Notes

This does not affect parsing of valid IPv6 addresses or ZoneIDs that comply with RFC 6874.
The new check is only triggered if a % is present in the hostname (i.e., it's a ZoneID).
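
The two notes above can be sketched as follows. This is a hedged illustration under stated assumptions, not the patch itself: `check_bracketed_host` and `_ZONE_ID_RE` are hypothetical names, the input is assumed to be the text between "[" and "]", and the ZoneID is matched in its still-encoded form (allowed characters or "%" followed by two hex digits).

```python
import ipaddress
import re

# Hypothetical sketch; names and structure are illustrative, not the PR's code.
# Allowed: the six character groups listed above, or "%" HEXDIG HEXDIG.
_ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})+")


def check_bracketed_host(host: str) -> None:
    """Validate the text found between "[" and "]" in a URL netloc."""
    address, percent, zone_id = host.partition("%")
    # Raises ipaddress.AddressValueError (a ValueError subclass) on a bad address.
    ipaddress.IPv6Address(address)
    # Only hosts that contain "%" carry a ZoneID, so only those are checked.
    if percent and not _ZONE_ID_RE.fullmatch(zone_id):
        raise ValueError("IPv6 ZoneID is invalid")
```

With this sketch, `fe80::1` and `fe80::1%eth0` pass silently, while `fe80::1%zone|bad` raises `ValueError`.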

This improves RFC compliance, reduces risk of incorrect or insecure behavior, and ensures more predictable URL parsing.

Issue: urllib.parse accepts invalid characters in IPv6 ZoneIDs and IPvFuture addresses #137146

…pliant set

The current parsing logic for IPv6 addresses with Zone Identifiers (ZoneIDs) uses the `ipaddress` module, which validates ZoneIDs according to RFC 4007, allowing any non-null string. However, when used in URLs, ZoneIDs must follow the percent-encoded format defined in RFC 6874. This patch adds a check to restrict ZoneIDs to the allowed characters:

ALPHA / DIGIT / "-" / "." / "_" / "~" / "% HEXDIG HEXDIG"

RFC 6874 §2.1 specifies the format of an IPv6 address with a ZoneID in a URI as:

`IPv6addrz = IPv6address "%25" ZoneID`

Additionally, RFC 6874 recommends accepting a bare `%` without hex digits as a liberal extension, but that flexibility still requires ZoneID content to conform to a safe character set. This patch enforces that ZoneIDs do not include characters outside the permitted range.

### Before the fix:

```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
ParseResult(scheme='http', netloc='[::1%2|test]', path='/path', ...)
```

Invalid characters such as `|` were incorrectly accepted in ZoneIDs.

### After the fix:

```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
Traceback (most recent call last):
  ...
ValueError: IPv6 ZoneID is invalid
```

This patch ensures `urllib.parse` properly rejects ZoneIDs with invalid characters, improving compliance with the URI standards and helping prevent subtle bugs or security vulnerabilities.

StanFromIreland · 2025-07-27T15:04:19Z

In the future, please use the title format I have edited your title to, so that our automation can recognise it.

StanFromIreland

This needs a blurb entry.

ZeroIntensity

Please add a test case.

ZeroIntensity · 2025-07-27T16:31:49Z

Misc/NEWS.d/next/Library/2025-07-27-15-23-32.gh-issue-137146.BE_ylT.rst

@@ -0,0 +1 @@
+Validate IPv6 ZoneID characters in bracketed hostnames to match RFC 6874. `urllib.parse` now rejects ZoneIDs containing invalid or unsafe characters.


This is reStructuredText, not Markdown, so references look like this:

Suggested change

-Validate IPv6 ZoneID characters in bracketed hostnames to match RFC 6874. `urllib.parse` now rejects ZoneIDs containing invalid or unsafe characters.
+Validate IPv6 ZoneID characters in bracketed hostnames to match RFC 6874. :mod:`urllib.parse` now rejects ZoneIDs containing invalid or unsafe characters.

urllib.parse is a module; if you want to refer to the function, it's urllib.parse.urlparse. I've edited Zero's suggestion to change the role, as I didn't check which functions are affected (if it's the entire module, it's fine to reference only the module).

Oops, thanks.

I corrected the blurb and added the tests, thank you for your help.

bedevere-app bot added the awaiting review label Jul 27, 2025

StanFromIreland changed the title ~~#137146: Validate IPv6 ZoneID characters against RFC 6874 in urllib.parse~~ gh-137146: Validate IPv6 ZoneID characters against RFC 6874 in urllib.parse Jul 27, 2025

bedevere-app bot mentioned this pull request Jul 27, 2025

urllib.parse accepts invalid characters in IPv6 ZoneIDs and IPvFuture addresses #137146

Open

StanFromIreland reviewed Jul 27, 2025

Add blurb entries for IPv6 ZoneID validation fixes

e33f050

ZeroIntensity reviewed Jul 27, 2025

mauricelambert added 4 commits July 28, 2025 20:39

pythongh-137146: Fix unicode characters in IPv6 Zone ID

0e2684d

pythongh-137146: Add tests on IPv6 Zone ID checks

be71b37

Fix: reStructuredText language syntax

41feeee

pythongh-137146: Fix tests on IPv6 Zone ID checks

b93e903
