
f-string debug expressions containing hash '#' are malformed #137182

@kcdodd


Bug report

Bug description:

There is a bug in the f-string implementation, starting around version 3.12, where a "#" inside a self-documenting debug expression (one ending in "=") causes the expression text that precedes the value to be truncated. For example:

f"{'#'=}"

gives

"''#'"

but should give

"'#'='#'".

Note: The following explanation was found by asking https://chatgpt.com/codex to locate the problem. This appears to me to be a correct explanation, but please use with caution.

The bug comes from the change that started stripping text after a “#” when capturing the expression text for an f-string debug expression (f'{expr=}'). This logic was introduced in commit d59feb5 (“gh-112243: Don’t include comments in f-string debug expressions”) dated 2023‑11‑20.

Inside Parser/lexer/lexer.c, set_fstring_expr() now scans the expression buffer for “#” and removes everything from that point until the next newline. The relevant lines introduced in that commit are:

 // Check if there is a # character in the expression
 int hash_detected = 0;
 for (Py_ssize_t i = 0; i < tok_mode->last_expr_size - tok_mode->last_expr_end; i++) {
     if (tok_mode->last_expr_buffer[i] == '#') {
         hash_detected = 1;
         break;
     }
 }

 if (hash_detected) {
     Py_ssize_t input_length = tok_mode->last_expr_size - tok_mode->last_expr_end;
     char *result = (char *)PyMem_Malloc((input_length + 1) * sizeof(char));
     ...
     for (i = 0, j = 0; i < input_length; i++) {
         if (tok_mode->last_expr_buffer[i] == '#') {
             // Skip characters until newline or end of string
             while (tok_mode->last_expr_buffer[i] != '\0' && i < input_length) {
                 if (tok_mode->last_expr_buffer[i] == '\n') {
                     result[j++] = tok_mode->last_expr_buffer[i];
                     break;
                 }
                 i++;
             }
         } else {
             result[j++] = tok_mode->last_expr_buffer[i];
         }
     }
     result[j] = '\0';
     res = PyUnicode_DecodeUTF8(result, j, NULL);
     PyMem_Free(result);
 } else {
     res = PyUnicode_DecodeUTF8(
         tok_mode->last_expr_buffer,
         tok_mode->last_expr_size - tok_mode->last_expr_end,
         NULL
     );
 }
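To see why the output is exactly ''#', the loop above can be modeled in Python. This is a simplified port written for illustration, not CPython code; the name `strip_hash_comments` is hypothetical, and the example assumes the captured text for f"{'#'=}" is '#'= (including the "="), which is consistent with the reported output.

```python
def strip_hash_comments(expr_text: str) -> str:
    """Hypothetical port of the loop above: drop everything from a '#'
    up to the next newline (the newline itself is kept)."""
    result = []
    i = 0
    n = len(expr_text)
    while i < n:
        ch = expr_text[i]
        if ch == "#":
            # Skip characters until a newline or the end of the buffer
            while i < n and expr_text[i] != "\n":
                i += 1
            if i < n:
                result.append("\n")  # keep the newline, as the C code does
        else:
            result.append(ch)
        i += 1
    return "".join(result)


# Expression text followed by repr(value), as a debug expression renders it
print(strip_hash_comments("'#'=") + repr("#"))  # prints ''#'
```

The "#" inside the string literal is treated like the start of a comment, so only the opening quote survives.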

Because this heuristic does not check whether the “#” is inside a string literal, the expression '#' is mistakenly treated as containing a comment: everything from the “#” to the end of the captured text is discarded, leaving only the opening quote ', which is why the output is ''#' rather than '#'='#'. This code was added in commit d59feb5, visible in the repository’s history:

commit d59feb5dbe5395615d06c30a95e6a6a9b7681d4d
Author: Pablo Galindo Salgado <Pablogsal@gmail.com>
Date:   Mon Nov 20 15:18:24 2023 +0000

    gh-112243: Don't include comments in f-string debug expressions (#112284)

The likely cause of the bug is therefore commit d59feb5, which modified Parser/lexer/lexer.c and introduced the faulty handling of “#” inside f-string debug expressions. Although the commit is dated after the 3.12.0 release (October 2023), the bug reproduces on 3.12, so the change evidently reached the 3.12 branch as well.
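One way to avoid the problem is to track whether the scanner is inside a string literal before treating “#” as a comment. The sketch below is a hypothetical illustration of that idea, not the fix adopted in CPython; the name `strip_hash_comments_quote_aware` is made up, and it deliberately ignores triple-quoted strings that contain their own quote character and nested f-strings that reuse the outer quotes (both allowed since 3.12), which a real fix would have to handle, for example by relying on the tokenizer's own comment tokens.

```python
def strip_hash_comments_quote_aware(expr_text: str) -> str:
    """Hypothetical variant: only a '#' outside a string literal starts a comment.
    Simplification: handles single-line '...' and "..." literals with escapes only."""
    result = []
    quote = None  # quote character of the literal we are inside, if any
    i = 0
    n = len(expr_text)
    while i < n:
        ch = expr_text[i]
        if quote is not None:
            result.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the literal
                result.append(expr_text[i + 1])
                i += 1
            elif ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch  # opening quote
            result.append(ch)
        elif ch == "#":
            # Outside a literal this really is a comment: skip to the newline
            while i < n and expr_text[i] != "\n":
                i += 1
            if i < n:
                result.append("\n")
        else:
            result.append(ch)
        i += 1
    return "".join(result)


print(strip_hash_comments_quote_aware("'#'=") + repr("#"))  # prints '#'='#'
```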

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Labels: interpreter-core, topic-parser, type-bug
