Difference between pickle.py and _pickle for certain strings #113028

Closed
Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)
@jeff5

Bug report

Bug description:

There is a logical error in `pickle.Pickler.save_str` for protocol 0, such that it repeats the pickling of a string object each time it is presented. The design clearly intends to re-use the first pickled representation, and the C implementation `_pickle` does that.

In an implementation that does not provide a compiled `_pickle` (PyPy may be one) this is inefficient, but not actually wrong. The intended behaviour occurs with a simple string:

    >>> s = "hello"
    >>> pickle._dumps((s, s), 0)
    b'(Vhello\np0\ng0\ntp1\n.'

When read by `loads()`, this byte string says:

  1. stack "hello",
  2. save a copy in memory 0,
  3. stack the contents of memory 0,
  4. make a tuple from the stack,
  5. save a copy in memory 1.
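The steps above can be checked by disassembling the pickle with the standard `pickletools` module. This is a minimal sketch that feeds in the literal bytes shown in the report, so it does not depend on which pickler produced them; the variable names are only illustrative.

```python
import pickletools

# Protocol 0 pickle of (s, s) with s = "hello", copied from the report.
data = b'(Vhello\np0\ng0\ntp1\n.'

# genops() yields (opcode, argument, position) for each instruction.
names = [op.name for op, _arg, _pos in pickletools.genops(data)]
print(names)

# dis() prints an annotated listing of the same instructions.
pickletools.dis(data)
```

The listing starts with a `MARK` and ends with a `STOP`, which the numbered steps leave implicit; the single `UNICODE` followed by `PUT` and `GET` is the re-use the design intends.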

The bug emerges when the pickled string needs pre-encoding:

    >>> s = "hello\n"
    >>> pickle._dumps((s, s), 0)
    b'(Vhello\\u000a\np0\nVhello\\u000a\np1\ntp2\n.'

Here we see identical data stacked and saved (but not used). The problem is here:

cpython/Lib/pickle.py

Lines 860 to 866 in 42a86df

    obj = obj.replace("\\", "\\u005c")
    obj = obj.replace("\0", "\\u0000")
    obj = obj.replace("\n", "\\u000a")
    obj = obj.replace("\r", "\\u000d")
    obj = obj.replace("\x1a", "\\u001a")  # EOF on DOS
    self.write(UNICODE + obj.encode('raw-unicode-escape') +
               b'\n')

where the return from `obj.replace` may be a different object from `obj`. In CPython, that happens only if a replacement takes place, which is why the problem only appears in the second case above.
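The identity behaviour of `str.replace` can be observed directly. This is a small sketch; returning the original object when nothing is replaced is a CPython implementation detail rather than a language guarantee, so the first result may differ on other implementations.

```python
plain = "hello"
with_newline = "hello\n"

# No newline to replace: CPython hands back the original object here,
# but that is an implementation detail, not a guarantee.
unchanged = plain.replace("\n", "\\u000a")

# A replacement happens, so the result is necessarily a new object.
escaped = with_newline.replace("\n", "\\u000a")

print("no replacement, same object:", unchanged is plain)
print("replacement, same object:", escaped is with_newline)
```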

`save_str` is only called when the object has not already been memoized, but in the cases at issue, the string memoized is not the original object, and so when the original string object is presented again, `save_str` is called again.
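The effect of rebinding `obj` before memoizing can be reproduced in a toy model. The sketch below is not the real `pickle.Pickler`: `ToyPickler`, its `rebind` flag and `opcode_names` are hypothetical names, and the opcodes are plain tuples. Keeping the escaped text under a separate name is shown only as one possible remedy, not necessarily the change made in CPython.

```python
class ToyPickler:
    """Toy model of id()-keyed memoization; not the real pickle.Pickler."""

    def __init__(self, rebind):
        self.memo = {}        # id(obj) -> (index, obj)
        self.out = []         # emitted pseudo-opcodes
        self.rebind = rebind  # True mimics the pattern in the report

    def save(self, obj):
        entry = self.memo.get(id(obj))
        if entry is not None:
            self.out.append(("GET", entry[0]))
            return
        self.save_str(obj)

    def save_str(self, obj):
        if self.rebind:
            # obj may now name a new string, so the memo key changes.
            obj = obj.replace("\n", "\\u000a")
            self.out.append(("UNICODE", obj))
        else:
            # Escape into a separate name and memoize the original object.
            escaped = obj.replace("\n", "\\u000a")
            self.out.append(("UNICODE", escaped))
        self.memoize(obj)

    def memoize(self, obj):
        idx = len(self.memo)
        self.out.append(("PUT", idx))
        self.memo[id(obj)] = idx, obj


def opcode_names(text, rebind):
    pickler = ToyPickler(rebind)
    pickler.save(text)
    pickler.save(text)  # the same object presented a second time
    return [name for name, _arg in pickler.out]


print(opcode_names("hello\n", rebind=True))   # ['UNICODE', 'PUT', 'UNICODE', 'PUT']
print(opcode_names("hello\n", rebind=False))  # ['UNICODE', 'PUT', 'GET']
```

With rebinding, the memo is keyed by the escaped copy, so the second lookup of the original object misses and the data is written again.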

Depending upon the detailed behaviour of `str.replace` (in particular, if an implementation chooses to return an interned value when the result is, say, a Latin-1 character) an assertion may fail in `memoize()`:

cpython/Lib/pickle.py

Lines 504 to 507 in 42a86df

    assert id(obj) not in self.memo
    idx = len(self.memo)
    self.write(self.put(idx))
    self.memo[id(obj)] = idx, obj

I have not managed to trigger an `AssertionError` in CPython.
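The failure mode can still be simulated. In the sketch below, `interning_replace` is a hypothetical stand-in for a `str.replace` that returns one shared object per value, and `save`, `save_str` and `memoize` are simplified stand-ins for the pickler methods, not the CPython code itself. It assumes assertions are enabled (no `-O`).

```python
_canonical = {}  # value -> first string object seen with that value


def interning_replace(text, old, new):
    """Hypothetical replace() that returns one shared object per value."""
    result = text.replace(old, new)
    return _canonical.setdefault(result, result)


memo = {}  # id(obj) -> (index, obj), as in the quoted memoize()


def memoize(obj):
    assert id(obj) not in memo
    memo[id(obj)] = len(memo), obj


def save_str(obj):
    obj = interning_replace(obj, "\n", "\\u000a")  # rebinding, as reported
    memoize(obj)


def save(obj):
    if id(obj) not in memo:
        save_str(obj)


# Two equal strings built at run time; in CPython these are distinct objects.
first = "".join(["he", "llo"])
second = "".join(["hel", "lo"])

save(first)
try:
    save(second)
    tripped = False
except AssertionError:
    tripped = True

print("distinct objects:", first is not second, "assertion tripped:", tripped)
```

The second string misses the memo because it is a different object, but the interning replace maps it to the object that was already memoized, and the assertion fires.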

This has probably gone unnoticed so long only because `pickle.py` is not tested. (At least, I think it isn't. #105250 and #53350 note this coverage problem.)
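A check along the following lines would expose the difference between the two implementations. It is a sketch, not an existing CPython test: `unicode_opcode_count` and `compare_implementations` are hypothetical helpers, and `pickle._dumps` is the private pure-Python entry point used in the examples above. The counts printed for `"hello\n"` depend on the Python version and on whether `_pickle` is available.

```python
import pickle
import pickletools


def unicode_opcode_count(data):
    """Count how many times a protocol 0 pickle writes string data."""
    return sum(1 for op, _arg, _pos in pickletools.genops(data)
               if op.name == "UNICODE")


def compare_implementations(text):
    pair = (text, text)
    pure = pickle._dumps(pair, 0)  # pure-Python Pickler
    fast = pickle.dumps(pair, 0)   # C accelerator when available
    # Whatever the encoding, both pickles must load back to the same value.
    assert pickle.loads(pure) == pickle.loads(fast) == pair
    return unicode_opcode_count(pure), unicode_opcode_count(fast)


for sample in ("hello", "hello\n"):
    print(repr(sample), compare_implementations(sample))
```

A count of 2 in the first position for a string that needs escaping indicates that the pure-Python pickler wrote the same data twice.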

CPython versions tested on:

3.11

Operating systems tested on:

Windows
