Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 9, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Certificate verification failures are now correctly propagated and reported (with contextual messages) so clients receive proper TLS alerts.
    • Deferred verification errors are surfaced after related I/O completes to ensure consistent error timing.
    • Improved detection and handling of closed-connection scenarios, treating certain socket errors as EOF-like.
  • Refactor

    • Unified and simplified SSL/TLS handshake flow and lock management to reduce deadlock risk and improve consistency.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

Simplifies TLS handshake lock handling, unifies client/server handshake flow, moves deferred certificate-verification checks to occur after I/O (read/write/handshake), and improves socket error detection to treat certain connection-closed errors as EOF.

Changes

Cohort / File(s) Summary
Handshake & I/O control flow
crates/stdlib/src/ssl.rs
Consolidates client/server handshake into a single path using an is_client flag, removes per-branch guard-drop patterns, drops/reacquires the connection lock earlier, reworks read/write loops and write-flush semantics, and moves post-handshake deferred-cert checks after I/O completion.
Certificate verification propagation
crates/stdlib/src/ssl/cert.rs
DeferredClientCertVerifier::verify_client_cert now records a formatted per-error message and returns the original verification error to rustls instead of returning success on failure.
Socket/exception mapping & EOF detection
crates/stdlib/src/ssl/compat.rs
Adds is_connection_closed_error(exc, vm) helper to classify connection-closed socket errors (ConnectionAborted/Reset or ECONNABORTED/ECONNRESET) as EOF-like, uses IntoPyException conversion for I/O errors, and checks queued TLS alerts before mapping socket errors to SslError::Eof.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant ClientApp as Client
    participant TLS as TLSConn (rustls)
    participant Socket as Socket
    participant PyLayer as Python VM

    Note over ClientApp,TLS: Handshake/read/write unified flow (is_client flag)
    ClientApp->>TLS: initiate handshake / read / write
    TLS->>Socket: perform socket I/O (send/recv)
    Socket-->>TLS: I/O result (data or error)
    alt data received
        TLS->>TLS: process rustls packets
        TLS->>PyLayer: check_deferred_cert_error (post-ssl_read or post-write or after handshake)
        PyLayer-->>TLS: deferred cert error (optional)
        TLS->>ClientApp: return decrypted/plain data
    else socket error
        TLS->>TLS: check queued TLS alerts
        TLS->>PyLayer: is_connection_closed_error?
        alt connection-closed
            TLS->>ClientApp: return EOF (SslError::Eof)
        else other error
            TLS->>ClientApp: map and raise Python I/O exception via IntoPyException
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

  • Areas needing careful review:
    • crates/stdlib/src/ssl.rs: lock/drop timing changes and unified handshake control flow (deadlock/race checks).
    • crates/stdlib/src/ssl/cert.rs: ensure propagated verification errors produce correct TLS alerts and server behavior.
    • crates/stdlib/src/ssl/compat.rs: correctness of is_connection_closed_error across platforms and errno handling; verify queued-alert checks before mapping EOF.

Suggested reviewers

  • coolreader18
  • ShaharNaveh
  • arihant2math

Poem

🐇 I hopped through locks with nimble paws,

Unified paths and cleaner laws.
Deferred certs now speak their name,
EOFs found true, not just a claim.
A little rabbit cheers the change—hip, hop, hooray! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix SSL deferred error' is concise and directly references the main change: addressing deferred certificate verification error handling in SSL operations.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin ssl

@youknowone youknowone force-pushed the ssl branch 3 times, most recently from 3abe096 to b5d0a8f Compare December 9, 2025 11:51
@youknowone youknowone marked this pull request as ready for review December 9, 2025 11:56
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/stdlib/src/ssl.rs (1)

2602-2604: The explicit drop on line 2604 has no effect.

The statement let _ = conn; does not release the lock early. Since conn is a reference (&TlsConnection), this line is a no-op. The MutexGuard is held by the if let expression and is automatically dropped at the end of the block at line 2617.

If the intent is to release the lock immediately after calling is_session_resumed(), you need to drop the guard itself, not the borrowed reference:

-if let Some(ref conn) = *self.connection.lock() {
+let was_resumed = {
+    let conn_guard = self.connection.lock();
+    if let Some(ref conn) = *conn_guard {
+        conn.is_session_resumed()
+    } else {
+        false
+    }
+};
+if was_resumed {
+    *self.session_was_reused.lock() = was_resumed;
+    
-    let was_resumed = conn.is_session_resumed();
-    let _ = conn; // Release lock ASAP
-    *self.session_was_reused.lock() = was_resumed;

However, since the remaining operations in this block (lines 2607-2616) don't hold other locks and execute quickly, the current approach may be acceptable. The comment is misleading though.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a99164f and b5d0a8f.

📒 Files selected for processing (3)
  • crates/stdlib/src/ssl.rs (5 hunks)
  • crates/stdlib/src/ssl/cert.rs (1 hunks)
  • crates/stdlib/src/ssl/compat.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl/cert.rs
🧬 Code graph analysis (2)
crates/stdlib/src/ssl.rs (1)
crates/stdlib/src/ssl/compat.rs (5)
  • ssl_do_handshake (1196-1340)
  • ssl_read (1348-1445)
  • data (1612-1614)
  • writer (319-324)
  • new (1668-1670)
crates/stdlib/src/ssl/compat.rs (1)
crates/stdlib/src/socket.rs (1)
  • errno (1836-1842)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
🔇 Additional comments (8)
crates/stdlib/src/ssl/cert.rs (1)

1059-1079: LGTM! Critical fix for proper TLS alert propagation.

The previous behavior (storing error but returning success) would cause the TLS handshake to appear successful even when certificate verification failed, preventing proper TLS alerts from being sent to clients. Now the error is both stored for Python-side reporting AND returned to rustls for proper protocol-level error handling.

The if let Err(ref e) pattern correctly borrows the error without consuming result, allowing the subsequent return result to work correctly.

crates/stdlib/src/ssl/compat.rs (4)

27-27: LGTM!

Adding the IntoPyException import enables idiomatic conversion of I/O errors to Python exceptions.


562-562: LGTM!

Using IntoPyException trait for I/O error conversion is the idiomatic approach in RustPython.


1531-1552: LGTM! Well-structured connection-closed detection.

The function correctly handles both:

  1. High-level Python exception types (ConnectionAbortedError, ConnectionResetError)
  2. Low-level OSError with specific errno values (ECONNABORTED, ECONNRESET)

This comprehensive check ensures connection-closed scenarios are detected regardless of how the underlying error is wrapped.


1589-1609: LGTM! Important fix for SSL error handling, especially on Windows.

The error handling order is correct:

  1. Check for queued TLS alerts first (Lines 1596-1598) - ensures SSL-level errors take precedence over transport errors
  2. Map connection-closed errors to Eof (Lines 1604-1606) - correctly treats unexpected TCP closure as SSL protocol violation
  3. Propagate other errors as Python exceptions (Line 1607)

This addresses the Windows-specific race condition where TCP RST can arrive before TLS alerts are read, which would otherwise cause confusing connection errors instead of proper SSL errors.

crates/stdlib/src/ssl.rs (3)

3212-3251: LGTM! Good refactoring to prevent deadlock.

The changes successfully reduce lock hold time and prevent potential deadlock during SNI callback execution. Key improvements:

  1. The lock is explicitly dropped after ssl_do_handshake returns (line 3215), before any Python callback invocation or recursive calls.
  2. The is_client flag is correctly derived from the connection type.
  3. The server-side SNI callback restart logic correctly clears the connection and recursively calls do_handshake after the lock is released.

This aligns with the PR objective of fixing deferred certificate errors by ensuring locks are managed properly during handshake.


3336-3347: LGTM! Correct placement of deferred certificate error check.

The refactoring correctly moves the check_deferred_cert_error() call to after ssl_read completes (line 3347), while the connection lock is released after line 3342. This ensures:

  1. The deferred error (which may be set during I/O in ssl_read) is checked after the I/O operation completes.
  2. The connection lock is not held during error checking/raising, preventing potential deadlock.
  3. The error is checked before processing the read result, ensuring errors are surfaced to the caller.

This aligns with the PR's goal of fixing deferred SSL errors by checking them at the appropriate time.


3432-3498: LGTM! Correct placement of deferred certificate error check in write path.

The refactoring correctly moves the check_deferred_cert_error() call to after all write operations complete (line 3498), while the connection lock is released after line 3494. This ensures:

  1. The deferred certificate verification error (which may be set during TLS I/O operations) is checked after all write operations complete.
  2. The connection lock is not held during error checking/raising.
  3. The write is atomic - the lock is held for the entire write operation, preventing interleaving with other operations.

The chunked write implementation (lines 3441-3493) correctly:

  • Writes data in 16KB chunks to avoid filling the TLS buffer
  • Flushes TLS data to the socket/BIO after each chunk
  • Handles both BIO and socket modes appropriately

This aligns with the PR's goal of fixing deferred SSL errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/stdlib/src/ssl/cert.rs (1)

1065-1079: Client-cert verification behavior change is good; update the “always succeeds” comment.

Returning result while also recording the message fixes the TLS alert behavior and keeps the error available for Python-side checks. However, the comment on DeferredClientCertVerifier still claims this verifier “always succeeds during handshake but stores verification errors for later retrieval,” which is no longer accurate after this change. Please update that comment (and optionally the type name) so future readers don’t assume the handshake always succeeds for TLS 1.3 client-auth.

crates/stdlib/src/ssl.rs (1)

3435-3503: Chunked write/flush loop is reasonable; deferred error check placement is appropriate.

Writing in 16 KiB chunks through conn.writer(), flushing pending TLS data via BIO or socket until wants_write() is false, and only then calling check_deferred_cert_error provides clearer, more robust write semantics while still honoring TLS buffering limits. The extra allocations for the temporary Vec<u8> in socket mode are acceptable; if this ever shows up in profiles, you can optimize by reusing a buffer, but it isn’t urgent.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5d0a8f and fdb4309.

📒 Files selected for processing (3)
  • crates/stdlib/src/ssl.rs (5 hunks)
  • crates/stdlib/src/ssl/cert.rs (1 hunks)
  • crates/stdlib/src/ssl/compat.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/cert.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (1)
crates/stdlib/src/ssl/compat.rs (4)
  • ssl_do_handshake (1196-1340)
  • ssl_read (1348-1445)
  • data (1612-1614)
  • new (1668-1670)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (3)
crates/stdlib/src/ssl.rs (3)

2598-2620: Session-resume tracking and stats update look correct.

Using a short-lived connection.lock() to query is_session_resumed() and then updating session_was_reused and the server-side counters outside that critical section keeps locking tight while correctly reflecting resume state and accept/hit stats.


3215-3254: Unified client/server handshake flow simplifies control and preserves SNI restart semantics.

The new pattern—lock connection once, compute is_client, call ssl_do_handshake, then drop the guard and branch on client vs server—removes duplicated paths without changing observable behavior. The SNI SniCallbackRestart case still runs the Python callback with no connection lock held, clears self.connection, and re-enters do_handshake, which matches the documented OpenSSL-style restart behavior.


3339-3351: Read path now checks deferred cert errors at the right time.

Acquiring the connection lock only for the ssl_read call and then invoking check_deferred_cert_error on the successful Ok(n) path cleanly separates TLS I/O from Python exception raising while ensuring any deferred client-cert failure is surfaced immediately after plaintext is delivered.

@youknowone youknowone merged commit db95946 into RustPython:main Dec 9, 2025
13 checks passed
@youknowone youknowone deleted the ssl branch December 9, 2025 12:59
@coderabbitai coderabbitai bot mentioned this pull request Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant