Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data#99613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data#99613
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Previously *consumed was not set in this case.
Lib/test/test_capi/test_unicode.py Outdated
| self.assertRaises(LookupError,decodeutf8,b'a\x80','foo') | ||
| # TODO: Test PyUnicode_DecodeUTF8() with NULL as data and | ||
| # negative size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I dislike TODO which stays forever. Please either address it or remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In#99594 you requested to split that PR on smaller parts. If address these TODOes, it can add more complexity to this PR, which is not directly related to the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
If remove them, some cases will never be covered by tests.
| classCAPITest(unittest.TestCase): | ||
| @support.cpython_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What do you think of moving these tests to test_codecs? For me, these APIs are more related to codecs than "Unicode". test_codecs already has multiple tests about the UTF-8 encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Done. But would not be more convenient to have tests for allPyUnicode_* functions in a single file?
vstinner commentedNov 21, 2022
It would be nice to move the C code from the giant unicodeobject.c into a separed file, since codecs are not directly related to Unicode. They only "use" the Unicode API. I dislike the fact these functions share the PyUnicode_ prefix, but that cannot be changed now. |
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Lib/test/test_capi/test_codecs.py
Oh. I was thinking at moving the tests in the existingLib/test/test_codecs.py file.
See#93649 (comment) discussion in issue#93649.
serhiy-storchaka commentedNov 21, 2022
Also, it is convenient to run all C API test if they are grouped together in a package. |
vstinner commentedNov 21, 2022
For the organization of C API tests, I commented there:#93649 (comment) |
Lib/test/test_capi/test_codecs.py Outdated
| classCAPITest(unittest.TestCase): | ||
| @support.cpython_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
For me it's redundant with testing _testcapi. For example, PyPy doesn't have _testcapi. But you can keep it if you prefer to be explicit.
Lib/test/test_capi/test_codecs.py Outdated
| classCAPITest(unittest.TestCase): | ||
| @support.cpython_only | ||
| @unittest.skipIf(_testcapiisNone,'need _testcapi module') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
IMO you can move it on the whole class.
Lib/test/test_capi/test_codecs.py Outdated
| try: | ||
| import_testcapi | ||
| exceptImportError: | ||
| _testcapi=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not skipping this test if the module is missing?
Lib/test/test_capi/test_codecs.py Outdated
| @unittest.skipIf(_testcapiisNone,'need _testcapi module') | ||
| deftest_decodeutf8(self): | ||
| """Test PyUnicode_DecodeUTF8()""" | ||
| from_testcapiimportunicode_decodeutf8asdecodeutf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You may usedecodeutf8 = _testcapi.unicode_decodeutf8.
Modules/_testcapi/unicode.c Outdated
| constchar*data; | ||
| Py_ssize_tsize; | ||
| constchar*errors=NULL; | ||
| Py_ssize_tconsumed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I would feel safer if you initialize the value. Maybe to a marker value like 42?
Otherwise, the test may miss the bug by luck, if local variables allocated on the stack are initialize to 0.
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, thanks for updating the PR. I'm more confident with the consumed variable initialized to a value in the test ;-)
serhiy-storchaka commentedDec 1, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I completely agree with you. |
miss-islington commentedDec 1, 2022
Thanks@serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11. |
miss-islington commentedDec 1, 2022
Sorry,@serhiy-storchaka, I could not cleanly backport this to |
miss-islington commentedDec 1, 2022
Sorry@serhiy-storchaka, I had trouble checking out the |
* main: (112 commits)pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613) Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)pythongh-89189: More compact range iterator (pythonGH-27986) bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)pythonGH-99877) Fix typo in exception message in `multiprocessing.pool` (python#99900)pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)pythonGH-81057: remove static state from suggestions.c (python#99411) Improve zip64 limit error message (python#95892)pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)pythongh-82836: fix private network check (python#97733) Docs: improve accuracy of socketserver reference (python#24767) ...
…nly data (pythonGH-99613)Previously *consumed was not set in this case..(cherry picked from commitf08e52c)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
bedevere-bot commentedJul 25, 2023
GH-107224 is a backport of this pull request to the3.11 branch. |
…nly data (pythonGH-99613) (pythonGH-107224)(cherry picked from commitb8b3e6a)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Previously *consumed was not set in this case.(cherry picked from commitf08e52c)
…SCII-only data (pythonGH-99613) (pythonGH-107224)Previously *consumed was not set in this case.(cherry picked from commitf08e52c).(cherry picked from commitb8b3e6a)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ly data (pythonGH-99613) (pythonGH-107224) (python#107231)Previously *consumed was not set in this case.(cherry picked from commitf08e52c).(cherry picked from commitb8b3e6a)
Uh oh!
There was an error while loading.Please reload this page.
Previously *consumed was not set in this case.
This is an extraction from#99594.