Skip to content

gh-81719: Add private members to zipfileZipFile to make it easier to subclass #137101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

adiroiban
Copy link

@adiroiban adiroiban commented Jul 25, 2025

Fixes #81719

This is the bare minimum changes to allow subclassing ZipFile to implement a custom encryption method.

I am using this code to implement WinZIP AES.

This is my first contribution to CPyton , so I try to keep it minimum.


Maybe the private method should not be documented... but just leave a comment in the code.


I am using this code to implement WinZip AES on top of this patch

There are a few other changes that will make it easier to reuse code... but those are not criticial.

For example, the current code validates the Zip password outside of the _ZipDecrypter code.
This means, that if you implement a differrent ZipDecrypter that has a different password validation method, you can't just rewrite ZipExtFile._init_decrypter() and reuse ZipExtFile.__init__

Let me know if you are happy for a bit more of refactoring

Thanks

@python-cla-bot
Copy link

python-cla-bot bot commented Jul 25, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@adiroiban adiroiban force-pushed the 81719-zipfile-refactor-for-subclass branch from dd4fbc8 to 54b7e31 Compare July 25, 2025 13:51
@bedevere-app
Copy link

bedevere-app bot commented Jul 25, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Author

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

I have updated the code.


I was thinking that we can keep these attributes private, at least as a start, while downstream projects try to extend the code.

Once we have more projects using the code, we can look into making this public and defining a public interface.

@adiroiban adiroiban requested a review from ZeroIntensity July 25, 2025 15:24
@adiroiban
Copy link
Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jul 25, 2025

Thanks for making the requested changes!

@ZeroIntensity: please review the changes made to this pull request.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now please add a test case. If you're confused about that in any way, feel free to ask me or take a look at the devguide. We should also add an entry to the "What's New in Python 3.15" page.

Once we have more projects using the code, we can look into making this public and defining a public interface.

Well, we don't really want people to be using a private interface. If we encourage people to use it (i.e., by documenting it), then we still have to follow all the usual backwards compatibility rules, even if it's "private". If we don't document it, then people won't really know to use it.

@adiroiban
Copy link
Author

Thanks. I will clean the code and add automated tests.

@adiroiban
Copy link
Author

adiroiban commented Jul 27, 2025

Thanks Peter for the review.

Docs was updated here https://cpython-previews--137101.org.readthedocs.build/en/137101/whatsnew/3.15.html#zipfile


I now see that we don't have auto-doc enabled for zipfile

https://cpython-previews--137101.org.readthedocs.build/en/137101/library/zipfile.html#zipfile.ZipFile

Should I update the documentation page at Doc/library/zipfile.rst?

The current changes are targeted to develpers that want to extend the ZipFile code,
and not to developers that want to use the code.

So, if you want to extend the code for ZipFile, you will have to read the source code.

So, maybe leave Doc/library/zipfile.rst as it is, since it is targeted to "end users"

Have zipinfo_class documentated only inside the source code.


My bad for forgetting about the tests.

I tried to write the minimum amount of test to cover all the new code.

./python -m pytest Lib/test/test_zipfile/ --cov zipfile --cov-report=xml
./python -m diff_cover.diff_cover_tool coverage.xml 
-------------
Diff Coverage
Diff: origin/main...HEAD, staged and unstaged changes
-------------
Lib/zipfile/__init__.py (100%)
-------------
Total:   12 lines
Missing: 0 lines
Coverage: 100%
-------------

Let me know if you prefer to split the current tests into separate test for each use case.
I tried to split them based on the main functionality: read, write and extract


I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jul 27, 2025

Thanks for making the requested changes!

@ZeroIntensity: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from ZeroIntensity July 27, 2025 12:22
Comment on lines 489 to 491
"""
Test in which the files inside the archive are not compressed.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring isn't needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I have removed it. No problem.

I had to read the other tests to decide where I should add the new tests and I thought that this comment can help.

@@ -676,6 +680,104 @@ def test_add_file_after_2107(self):
self.assertEqual(zinfo.date_time, (2107, 12, 31, 23, 59, 59))


class CustomZipInfo(zipfile.ZipInfo):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here; please just avoid adding docstrings. This is all private code and keeping them up to date tends to be extra maintenance that we don't want.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to document why this class was use and decide if you can reuse it in another test.

I have removed the docstring.

destination = io.BytesIO()
with zipfile.ZipFile(
destination, 'w', zipinfo_class=self.CustomZipInfo) as zipfp:
# It can write using the specific custom classe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

Suggested change
# It can write using the specific custom classe.
# It can write using the specific custom class.

# When creating a new member using just the name,
# the custom ZipInfo is used internally.
with zipfp.open('other-member.txt', mode='w') as memberfp:
self.assertIsInstance(memberfp._zinfo, self.CustomZipInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using private members in tests. If we change how they work, it adds additional maintenance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Make sense.

# When writing from an external file, the file is created using
# the custom ZipInfo
with temp_dir() as source_dir:
source_file = pathlib.Path(source_dir).joinpath('source.txt')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to just use Path(source_dir) / 'source.txt' here.

Comment on lines 776 to 778
with temp_dir() as extract_dir:
zipfp.extract(dir_info, path=extract_dir)
zipfp.extract('dir-as-text/', path=extract_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this part of the test is necessary. We aren't stressing anything on the custom classes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to test this change

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
-        if not isinstance(member, ZipInfo):
+        if not isinstance(member, self._ZipInfo):
            member = self.getinfo(member)

There is not much public API to check for this method.

I have updated the assertions as an end to end test.

@ZeroIntensity
Copy link
Member

Should I update the documentation page at Doc/library/zipfile.rst?

Sorry, forgot to mention this in the review. Yeah, please update the documentation. Docstrings are generally very broad and can go out of date quickly.

Copy link
Author

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

I will wait for the test to pass and then I will request a new review.


I tried to add some "narrative" documentation.

https://cpython-previews--137101.org.readthedocs.build/en/137101/library/zipfile.html#zipfile-objects

I find it hard, as there is a lot of private API

For example in ZipFile.writestr we have

            zinfo = self._ZipInfo(zinfo_or_arcname)._for_archive(self)

Maybe as a followup, we should refactor to use a public method and document the ZipInfo and ZipExtFile public API.


I tried to add tests to cover all changes... but the tests are kind of useless with class CustomZipInfo(zipfile.ZipInfo): as CustomZipInfo will return true for both isinstance(obj, ZipInfo) or isinstance(obj, self._ZipInfo)


I tried to write the tests using delegation for ZipInfo...but I was not to write sometime useful

I got something like this...but I a not sure if this will not make it hard to maintain the tests

class CustomZipInfo:
    """
    Support for testing extending and subclassing ZipFile.
    """
    def __init__(self, *args, **kwargs):
        self._delegated = zipfile.ZipInfo(*args, **kwargs)

    def _for_archive(self, archive):
        self._delegated._for_archive(archive)
        return self

    def __getattribute__(self, name):
        if name in ['_delegated', '_for_archive']:
            return object.__getattribute__(self, name)

        return getattr(self._delegated, name)

    def __setattribute__(self, name, value):
        if name == '_delegated':
            object.__setattribute__(self, name, value)
            return

        return setattr(self._delegated, name, value)


    @classmethod
    def from_file(cls, filename, arcname=None, *, strict_timestamps=True):
        if isinstance(filename, os.PathLike):
            filename = os.fspath(filename)
        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        if not strict_timestamps and date_time[0] < 1980:
            date_time = (1980, 1, 1, 0, 0, 0)
        elif not strict_timestamps and date_time[0] > 2107:
            date_time = (2107, 12, 31, 23, 59, 59)
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = cls(arcname, date_time)
        zinfo.external_attr = (st.st_mode & 0xFFFF) << 16  # Unix attributes
        if isdir:
            zinfo.file_size = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
        else:
            zinfo.file_size = st.st_size

        return zinfo

Comment on lines 776 to 778
with temp_dir() as extract_dir:
zipfp.extract(dir_info, path=extract_dir)
zipfp.extract('dir-as-text/', path=extract_dir)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to test this change

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
-        if not isinstance(member, ZipInfo):
+        if not isinstance(member, self._ZipInfo):
            member = self.getinfo(member)

There is not much public API to check for this method.

I have updated the assertions as an end to end test.

# When creating a new member using just the name,
# the custom ZipInfo is used internally.
with zipfp.open('other-member.txt', mode='w') as memberfp:
self.assertIsInstance(memberfp._zinfo, self.CustomZipInfo)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Make sense.

@@ -676,6 +680,104 @@ def test_add_file_after_2107(self):
self.assertEqual(zinfo.date_time, (2107, 12, 31, 23, 59, 59))


class CustomZipInfo(zipfile.ZipInfo):
"""
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to document why this class was use and decide if you can reuse it in another test.

I have removed the docstring.

Comment on lines 489 to 491
"""
Test in which the files inside the archive are not compressed.
"""
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I have removed it. No problem.

I had to read the other tests to decide where I should add the new tests and I thought that this comment can help.

@adiroiban
Copy link
Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jul 28, 2025

Thanks for making the requested changes!

@ZeroIntensity: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from ZeroIntensity July 28, 2025 14:14
@adiroiban
Copy link
Author

Feedback @serhiy-storchaka from from the issue discussion #136741 (comment)

This looks reasonable.

Is is fine to have class variable or methods purposed for overriding in a subclass. Constructor argument is not needed -- you cannot make ZipFile supporting AES encryption by just passing a custom ZipInfo subclass. This is not what users want to do. A ZipFile subclass which implements all necessary hooks would be more convenient. Of course, you cannot combine several ZipFile subclasses which provide different extended functions, but I don't think this is needed. And it will be easier to include support of AES encryption in the base class.

But I have several general notes:

  • The isinstance() checks should check against ZipInfo. Basic ZipInfo should still be accepted in these functions. Creation of new ZipInfo objects can be customized.
  • Instead of self._ZipInfo.from_file(), call a special ZipFile method. This is more flexible.
  • Instead of _ZipInfo, _ZipExtFile and _ZipWriteFile use methods with more meaningful name and maybe with better designed signature. For example, ZipExtFile() is always called with mode='rb' and close_fileobj=True. They should not be parameters of the method that creates ZipExtFile.

@adiroiban
Copy link
Author

@serhiy-storchaka thanks for your feedback.

The __init__ arguments were requested by Peter.

I think that for this PR the focus is whether we want public init arguments or private class variables

Regarding

. For example, ZipExtFile() is always called with mode='rb' and close_fileobj=True. They should not be parameters of the method that creates ZipExtFile.

This looks like a refactoring that should be done in a separate issue/PR.

I have also mentioned usage of private API in ZipFile.writestr

            zinfo = self._ZipInfo(zinfo_or_arcname)._for_archive(self)

@ZeroIntensity
Copy link
Member

I really don't want to encourage using private class or instance variables to expose use of ZipInfo or ZipExtFile. I think that more fine-grained methods are a good approach too, they're just more work for users here (compared to dropping in their own class in a constructor).

@brianschubert
Copy link
Member

(side note: it looks like there's a typo in the email address that was configured for your last few commits, which is preventing GitHub from associating them with your account. I think there's a missing n at the end?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor zipfile to ease subclassing and enhancement
3 participants