-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Fix ArgumentOutOfRangeException in XmlReader when parsing malformed UTF-8 sequences #118081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: jeffhandley <1031940+jeffhandley@users.noreply.github.com>
src/libraries/System.Private.Xml/tests/XmlReader/Tests/ReaderEncodingTests.cs
Outdated
Show resolved
Hide resolved
@krwq What do you think of this fix approach. On the surface, it seems to be potentially a broader change than this specific scenario, which worries me. But at the same time, it does seem like a logical fix. |
@@ -3418,6 +3418,11 @@ private int ReadData() | |||
{ | |||
_ps.bytesUsed = 0; | |||
} | |||
else if (bytesLeft < 0) | |||
{ | |||
// This can happen when encoding switch causes bytePos to be calculated incorrectly for malformed data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to detect this situation earlier to avoid introducing invalid ParsingState in the first place?
It is hard to reason about what all else can misbehave due to invalid ParsingState.
This PR fixes an issue where
XmlReader.Create(stream)
throws an undocumentedArgumentOutOfRangeException
instead of the expectedXmlException
when parsing malformed XML containing invalid UTF-8 sequences in the XML declaration.Problem
When parsing XML like
<?xml version="1.0\xbf"?>
from aMemoryStream
, the following exception was thrown:This is problematic because:
ArgumentOutOfRangeException
is not documented forXmlReader
methodsXmlException
Root Cause
The issue occurs in
XmlTextReaderImpl.ReadData()
when:UnDecodeChars()
calculates_ps.bytePos
using_ps.encoding.GetByteCount()
for malformed UTF-8 sequences_ps.bytePos
becomes greater than_ps.bytesUsed
due to encoding issuesbytesLeft = _ps.bytesUsed - _ps.bytePos
becomes negative (-2)Buffer.BlockCopy()
, causing the exceptionSolution
Added bounds checking in
XmlTextReaderImpl.ReadData()
to detect whenbytesLeft
is negative and throw an appropriateXmlException
with the message "Invalid character in the given encoding" instead of allowing the negative value to reachBuffer.BlockCopy()
.The fix is minimal and surgical - it only adds validation where the problem occurs without changing broader parsing logic.
Testing
Added a regression test
ReadWithMalformedUtf8InXmlDeclaration()
that verifies:ArgumentOutOfRangeException
XmlException
is thrown insteadFixes #113061.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.