Skip to content

gh-87112: Ensure that only digits convertible to integers are accepted as section number in MIME header parameter #136877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

matthieucan
Copy link
Contributor

@matthieucan matthieucan commented Jul 20, 2025

With those changes, the MIME parameter parser discards parameters with an invalid section number that uses a digit not convertible to integer such as super-script "²" or "𐩃" (Kharosthi number).

For backwards compatibility, keep accepting non-ASCII digits that can be converted to integers, such as NKO digits.

Before:

>>> import email.headerregistry
>>> reg = email.headerregistry.HeaderRegistry()
>>> reg('Content-Disposition', 'inline; foo*0=bar')
'inline; foo="bar"'
>>> reg('Content-Disposition', 'inline; foo*X=bar')
'inline;'
>>> reg('Content-Disposition', 'inline; foo*0=bar; foo*߁=baz')
'inline; foo="barbaz"'
>>> reg('Content-Disposition', 'inline; foo*²=bar')
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    reg('Content-Disposition', 'inline; foo*²=bar')
    ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line 604, in __call__
    return self[name](name, value)
           ~~~~~~~~~~^^^^^^^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line 192, in __new__
    cls.parse(value, kwds)
    ~~~~~~~~~^^^^^^^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line 448, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
                                      ~~~~~~~~~~~~~~~~^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line 2736, in parse_content_disposition_header
    disp_header.append(parse_mime_parameters(value[1:]))
                       ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line 2601, in parse_mime_parameters
    token, value = get_parameter(value)
                   ~~~~~~~~~~~~~^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line 2464, in get_parameter
    token, value = get_section(value)
                   ~~~~~~~~~~~^^^^^^^
  File "/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line 2417, in get_section
    section.number = int(digits)
                     ~~~^^^^^^^^
ValueError: invalid literal for int() with base 10: '²'

After:

>>> import email.headerregistry
>>> reg = email.headerregistry.HeaderRegistry()
>>> reg('Content-Disposition', 'inline; foo*0=bar')
'inline; foo="bar"'
>>> reg('Content-Disposition', 'inline; foo*X=bar')
'inline;'
>>> reg('Content-Disposition', 'inline; foo*0=bar; foo*߁=baz')
'inline; foo="barbaz"'
>>> reg('Content-Disposition', 'inline; foo*²=bar')
'inline;'

@picnixz picnixz changed the title gh-87112: Do not fail when non 0-9 digit is used as section number in MIME header parameter gh-87112: Ensure that only ASCII digits are accepted as section number in MIME header parameter Jul 20, 2025
raise errors.HeaderParseError("Expected section number but "
"found {}".format(value))
digits = ''
while value and value[0].isdigit():
while value and '0' <= value[0] <= '9':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while value and '0' <= value[0] <= '9':
while value and ('0' <= value[0] <= '9'):

It will a bit clearer. Or you can still use a separate function to make it even cleareer. The bottleneck won't be the function call IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that, but not the separate function. It was my understanding that @StanFromIreland was leaning towards not having an inner function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, I was against the function to check if it is in a dictionary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just moved it to a separate function for extra-clarity

@encukou
Copy link
Member

encukou commented Jul 22, 2025

Could you also test with, for example, ߅ (NKO DIGIT FIVE), which int accepts?
Given that some values worked before, IMO the HeaderParseError should only be raised if int() fails; for other non-ASCII numbers this should append an InvalidHeaderDefect.

@matthieucan
Copy link
Contributor Author

Could you also test with, for example, ߅ (NKO DIGIT FIVE), which int accepts? Given that some values worked before, IMO the HeaderParseError should only be raised if int() fails; for other non-ASCII numbers this should append an InvalidHeaderDefect.

Thank you for looking into this.

In my understanding, those are the possible scenarios:

  • The section number is made of ASCII digits ('0' <= d <= '9')
    • Valid, remains unchanged with this PR
  • The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE)
    • This was accepted before, without a defect
    • This PR, in its current state, rejects those (they are ignored and trigger a defect)
    • ➡️ ❓ @encukou Do I understand correctly that they should be accepted, and not raise a defect (even if they are not RFC-valid)? Or should they be accepted and raise a defect?
  • The section number is made of digits (isdigit(d)) which cannot be converted to integers (not int(d)), e.g. ²
    • This used to fail with a raised exception
    • This PR, in its current state, rejects those (they are ignored and trigger a defect)
  • The section number is not made of digits (not isdigit(d))
    • Invalid, remains unchanged with this PR

@encukou
Copy link
Member

encukou commented Jul 23, 2025

The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE). This was accepted before, without a defect

IMO, they should be accepted and raise a defect.

@matthieucan matthieucan changed the title gh-87112: Ensure that only ASCII digits are accepted as section number in MIME header parameter gh-87112: Ensure that only digits convertible to integers are accepted as section number in MIME header parameter Jul 27, 2025
@matthieucan
Copy link
Contributor Author

The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE). This was accepted before, without a defect

IMO, they should be accepted and raise a defect.

Thank you, I agree. This is now implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants