
Fix ArgumentOutOfRangeException in XmlReader when parsing malformed UTF-8 sequences #118081

Draft: wants to merge 3 commits into main

Copilot
Contributor

@Copilot Copilot AI commented Jul 26, 2025

This PR fixes an issue where XmlReader.Create(stream) throws an undocumented ArgumentOutOfRangeException instead of the expected XmlException when parsing malformed XML containing invalid UTF-8 sequences in the XML declaration.

Problem

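When parsing XML such as <?xml version="1.0\xbf"?> from a MemoryStream, where \xbf stands for a single raw 0xBF byte (a UTF-8 continuation byte with no lead byte, so invalid on its own), the following exception was thrown: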

System.ArgumentOutOfRangeException: count ('-2') must be a non-negative value. (Parameter 'count')
   at System.Buffer.BlockCopy(Array src, Int32 srcOffset, Array dst, Int32 dstOffset, Int32 count)
   at System.Xml.XmlTextReaderImpl.ReadData()

This is problematic because:

  1. ArgumentOutOfRangeException is not documented for XmlReader methods
  2. XML parsing errors should consistently throw XmlException
  3. Callers cannot catch all XML-related errors with a single catch (XmlException) block
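A minimal C# reproduction of the reported input, assuming it is the 22-byte sequence `<?xml version="1.0`, one raw `0xBF` byte, then `"?>`. The class `MalformedDeclarationRepro` and its `Classify` helper are hypothetical names for illustration. `Classify` reads the whole document and reports which exception, if any, escapes, which makes the single-catch problem in point 3 observable on whichever runtime runs it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

public static class MalformedDeclarationRepro
{
    // <?xml version="1.0 + lone UTF-8 continuation byte 0xBF + "?>
    public static byte[] MalformedInput() =>
        Encoding.ASCII.GetBytes("<?xml version=\"1.0")
            .Concat(new byte[] { 0xBF })
            .Concat(Encoding.ASCII.GetBytes("\"?>"))
            .ToArray();

    // Reads the whole document; returns "ok" or the name of the exception that escaped.
    public static string Classify(byte[] input)
    {
        try
        {
            using XmlReader reader = XmlReader.Create(new MemoryStream(input));
            while (reader.Read()) { }
            return "ok";
        }
        catch (XmlException)
        {
            // The documented failure mode for malformed XML.
            return nameof(XmlException);
        }
        catch (Exception ex)
        {
            // Anything else (e.g. ArgumentOutOfRangeException) would escape a
            // caller that only catches XmlException.
            return ex.GetType().Name;
        }
    }
}
```

Per the report, `Classify(MalformedDeclarationRepro.MalformedInput())` should return `ArgumentOutOfRangeException` on an affected runtime and `XmlException` once the fix is in place.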

Root Cause

The issue occurs in XmlTextReaderImpl.ReadData() when:

  1. UnDecodeChars() calculates _ps.bytePos using _ps.encoding.GetByteCount() for malformed UTF-8 sequences
  2. The calculated _ps.bytePos becomes greater than _ps.bytesUsed, because the decoded characters do not re-encode to the same number of bytes as the malformed input
  3. bytesLeft = _ps.bytesUsed - _ps.bytePos becomes negative (-2)
  4. This negative value is passed to Buffer.BlockCopy(), causing the exception
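The arithmetic in steps 2 to 4 can be reproduced outside the reader. This sketch assumes, without confirming it against XmlTextReaderImpl, that the decoder replaced the lone 0xBF with U+FFFD: re-encoding that one character with GetByteCount yields 3 bytes where the stream supplied 1, which gives exactly the -2 seen in the stack trace. The locals only mirror _ps.bytesUsed and _ps.bytePos; `BytePosArithmetic` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BytePosArithmetic
{
    public static int ComputeBytesLeft()
    {
        byte[] bytes = Encoding.ASCII.GetBytes("<?xml version=\"1.0")
            .Concat(new byte[] { 0xBF })
            .Concat(Encoding.ASCII.GetBytes("\"?>"))
            .ToArray();

        // Stand-in for _ps.bytesUsed: bytes actually read from the stream (22).
        int bytesUsed = bytes.Length;

        // Encoding.UTF8 uses a replacement fallback, so 0xBF decodes to U+FFFD.
        char[] chars = Encoding.UTF8.GetChars(bytes);

        // Stand-in for _ps.bytePos recomputed from the decoded chars:
        // U+FFFD re-encodes as 3 bytes (EF BF BD), so this is 24, not 22.
        int bytePos = Encoding.UTF8.GetByteCount(chars, 0, chars.Length);

        return bytesUsed - bytePos; // -2
    }

    // Buffer.BlockCopy rejects a negative count with ArgumentOutOfRangeException.
    public static bool BlockCopyRejects(int count)
    {
        var buffer = new byte[32];
        try
        {
            Buffer.BlockCopy(buffer, 0, buffer, 0, count);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```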

Solution

Added bounds checking in XmlTextReaderImpl.ReadData() to detect when bytesLeft is negative and throw an appropriate XmlException with the message "Invalid character in the given encoding" instead of allowing the negative value to reach Buffer.BlockCopy().

The fix is minimal and surgical: it only adds validation where the problem surfaces, without changing the broader parsing logic.
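A sketch of the kind of guard described above, using hypothetical names (`ByteWindow`, `Compact`) instead of the reader's internal ParsingState. It shifts unconsumed bytes to the front of the buffer, but turns a negative remainder into an XmlException before Buffer.BlockCopy can see it. The real patch lives inside ReadData(), and only part of its shape is visible in the diff hunk quoted in the review thread.

```csharp
using System;
using System.Xml;

// Hypothetical stand-in for the byte-buffer fields of the reader's parsing state.
public sealed class ByteWindow
{
    public byte[] Bytes = Array.Empty<byte>();
    public int BytePos;   // first unconsumed byte
    public int BytesUsed; // number of valid bytes in Bytes

    // Moves unconsumed bytes to the start of the buffer so more can be read after them.
    public void Compact()
    {
        int bytesLeft = BytesUsed - BytePos;
        if (bytesLeft < 0)
        {
            // The position was computed past the end of the data (e.g. after
            // re-encoding malformed input); report an XML error instead of
            // letting BlockCopy throw ArgumentOutOfRangeException.
            throw new XmlException("Invalid character in the given encoding.");
        }
        if (bytesLeft > 0)
        {
            // BlockCopy handles overlapping source and destination ranges.
            Buffer.BlockCopy(Bytes, BytePos, Bytes, 0, bytesLeft);
        }
        BytesUsed = bytesLeft;
        BytePos = 0;
    }
}
```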

Testing

Added a regression test ReadWithMalformedUtf8InXmlDeclaration() that verifies:

  • The specific issue case no longer throws ArgumentOutOfRangeException
  • An appropriate XmlException is thrown instead
  • The exception message is correct

Fixes #113061.


Co-authored-by: jeffhandley <1031940+jeffhandley@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Undocumented ArgumentOutOfRangeException in XmlReader created from MemoryStream" to "Fix ArgumentOutOfRangeException in XmlReader when parsing malformed UTF-8 sequences" Jul 26, 2025
@Copilot Copilot AI requested a review from jeffhandley July 26, 2025 09:45
Copilot finished work on behalf of jeffhandley July 26, 2025 09:45
@jeffhandley
Member

@krwq What do you think of this fix approach? On the surface, it seems to be potentially a broader change than this specific scenario, which worries me. But at the same time, it does seem like a logical fix.

@@ -3418,6 +3418,11 @@ private int ReadData()
{
_ps.bytesUsed = 0;
}
else if (bytesLeft < 0)
{
// This can happen when encoding switch causes bytePos to be calculated incorrectly for malformed data
Member

Would it be better to detect this situation earlier to avoid introducing invalid ParsingState in the first place?

It is hard to reason about what all else can misbehave due to invalid ParsingState.
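One way to read this suggestion, sketched with a hypothetical helper: validate the recomputed byte position at the point where it is derived from GetByteCount (the role the PR description attributes to UnDecodeChars()) and fail before storing it, so no inconsistent ParsingState is ever created. Whether the real UnDecodeChars() can be restructured this way is not established here, and an overshoot check only catches cases where re-encoding produces more bytes than were read.

```csharp
using System;
using System.Text;
using System.Xml;

public static class BytePosValidation
{
    // Hypothetical helper: returns the byte offset corresponding to charCount decoded
    // chars, or throws if re-encoding overshoots the bytes that were actually read.
    public static int RecomputeBytePos(Encoding encoding, char[] chars, int charCount,
                                       int startBytePos, int bytesUsed)
    {
        int candidate = startBytePos + encoding.GetByteCount(chars, 0, charCount);
        if (candidate > bytesUsed)
        {
            // Re-encoding produced more bytes than the stream supplied, which can
            // happen when decoded chars do not round-trip (e.g. U+FFFD substitutions).
            throw new XmlException("Invalid character in the given encoding.");
        }
        return candidate;
    }
}
```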

Merging this pull request may close: Undocumented ArgumentOutOfRangeException in XmlReader created from MemoryStream
3 participants