ENH: implementation of array_reduce_ex (#12011)


Merged

charris merged 2 commits into numpy:master from pierreglaser:implement-reduce-ex on Oct 10, 2018

Conversation

@pierreglaser (Contributor) commented Sep 21, 2018 (edited):
Follow-up on #11161.
In order to enable no-copy pickling as mentioned in the PEP 574 proposal, numpy arrays need to have:

  • a __reduce_ex__ method, returning metadata and a buffer on their internal data
  • a method that reconstructs an array from a buffer, a dtype, and a shape

The first one is mostly inspired by the array_reduce method in core/src/multiarray/methods.c.

The second one is a wrapper written in Python around np.frombuffer(..).reshape(..).
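The reconstructor described above can be sketched in a few lines of Python. The name `_frombuffer` and the exact signature are illustrative here, not necessarily the final code in the PR:

```python
import numpy as np

# Illustrative reconstructor: rebuild an array from a raw buffer plus
# its dtype and shape, without copying the buffer's data.
def _frombuffer(buffer, dtype, shape):
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)

# Round-trip an array through a bytes buffer.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
b = _frombuffer(a.tobytes(), a.dtype, a.shape)
assert (a == b).all()
```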

Here is the decrease in peak memory and CPU time:

[benchmark results image]

Pinging @ogrisel and @pitrou for now.

/* If the Python version is below 3.8, the pickle module does not provide
 * built-in support for protocol 5. We try importing the pickle5
 * backport instead. */
if (PY_VERSION_HEX < 0x03080000) {
Contributor:

Rather than a runtime condition, it should be possible to use a build time preprocessor directive here.

@pierreglaser (Contributor, Author) commented Sep 25, 2018 (edited):
Yes, although it was not necessary for the code to compile - definitely can do it in further iterations, but did not want to generate any misunderstandings :)

{
    int protocol;
    PyObject *ret = NULL, *numeric_mod = NULL, *from_buffer_func = NULL;
    PyObject *buffer_tuple = NULL, *pickle_module = NULL, *pickle_class = NULL;
@pierreglaser (Contributor, Author):

Spaces here, fixing it.

@ogrisel (Contributor) commented:

Thank you very much for the new benchmarks. They look good: there are no more spurious memory copies when pickling and loading contiguous numpy arrays using protocol 5, with either in-band or out-of-band buffers.

@ogrisel (Contributor) commented:

Next steps:


@ahaldane (Member) commented Sep 30, 2018 (edited):

I don't think we can modify the signature of PyArray_FromBuffer: it is part of the numpy C-API, which we are very reluctant to modify, so it's better to go back to the wrapper function you had before.

Otherwise looks fine so far, at initial skim.

Contributor:

It's not necessary to try to import pickle5 when protocol < 5. In particular, data.dump(f) should always work for Python 3.6 and 3.7 even if pickle5 is not installed.

Contributor:

We actually need an explicit test for those cases.

Contributor:

sys.version_info[1] in [6, 7]

Contributor:

This test should not raise ImportError if pickle5 is not installed in Python 3.7 or 3.6: pickle5 should be an optional dependency.

Contributor (Author):

Specifically this file, or all tests?

Contributor:

To avoid replicating the complex import everywhere, you could probably do:

from numpy.core.numeric import pickle

Contributor:

Same comment here: pickle5 should never be a mandatory test dependency.

Contributor:

from numpy.core.numeric import pickle

Contributor:

I don't think the local import is necessary. I think this can be moved to the top level as:

from numpy.core.numeric import pickle

Contributor:

Instead of hardcoding the installation of pickle5 based on the Python version number, please use an explicit environment variable:

if [ "$INSTALL_PICKLE5" == "1" ]; then
    pip install pickle5
fi

and then set that variable on some of the Python 3.6 and/or 3.7 builds in .travis.yml.

@pierreglaser changed the title from "WIP: implementation of array_reduce_ex" to "ENH: implementation of array_reduce_ex" on Oct 3, 2018
@ogrisel (Contributor) left a comment:

Another batch of comments:

.appveyor.yml Outdated
Contributor:

No need to set the variable when it's 0-valued.

Contributor:

Actually, INSTALL_PICKLE5=0 will not work as expected with the following bash condition from tools/travis-test.sh:

if [ -n "$INSTALL_PICKLE5" ]; then
    pip install pickle5
fi

so all INSTALL_PICKLE5: 0 occurrences must be removed from this configuration file.

Contributor:

Actually, this is appveyor. But still, I think we should not have to set INSTALL_PICKLE5: 0 explicitly.

Contributor:

I would change this comment to:

The PickleBuffer class from version 5 of the pickle protocol can only be used for arrays backed by a contiguous data buffer. For all other cases we fall back to the generic array_reduce method, which involves a temporary bytes allocation. However, we do not call array_reduce directly but instead look up and call the __reduce__ method, to make sure that it's possible to customize pickling in subclasses.
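The dispatch suggested in that comment can be sketched in Python. This is a hypothetical illustration of the logic, not numpy's actual C implementation, and the return values are placeholders:

```python
import pickle
import numpy as np

def reduce_ex_sketch(arr, protocol):
    # PickleBuffer is only usable for protocol >= 5 and arrays backed by
    # a contiguous data buffer; everything else falls back to the generic
    # __reduce__ path, which goes through a temporary bytes copy.
    if protocol >= 5 and (arr.flags.c_contiguous or arr.flags.f_contiguous):
        return "picklebuffer", pickle.PickleBuffer(arr)
    return "copy", arr.__reduce__()

print(reduce_ex_sketch(np.arange(6), 5)[0])       # contiguous, proto 5
print(reduce_ex_sketch(np.arange(6)[::2], 5)[0])  # strided view: falls back
```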

Contributor:

Nitpick: I think the style for binary operators in numpy is to use whitespace around operators:

for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
    ...

This issue is repeated many times.

@pierreglaser (Contributor, Author) commented Oct 3, 2018 (edited):

I believe you, however:

I can change it, but there was no easy way for me to infer that I had to do it.

Contributor:

Yes, I agree this is not mandatory. But I looked around, and it seems that the numpy convention is to use spaces more often than not.

Contributor:

unused import?

shippable.yml Outdated
Contributor:

If the Python version is 3.5 or 3.8, this command will fail.

The test should pass if pickle5 is not there. I think this can be removed now.

@ogrisel (Contributor) left a comment:

Some more comments:

Contributor:

Instead of passing the test, this should be skipped with pytest:

@pytest.mark.skipif(pickle.HIGHEST_PROTOCOL < 5,
                    reason="requires pickle protocol 5")
def test_array_with_dtype_has_object_pickling(self):
    my_object = object()
    ...

Contributor:

Same here, let's use @pytest.mark.skipif:

@pytest.mark.skipif(pickle.HIGHEST_PROTOCOL < 5,
                    reason="requires pickle protocol 5")
def test_pickle_with_buffercallback(self):
    ...

Contributor:

I would rename this to test_record_array_with_object_dtype. There is already "pickling" in the test class name.

@ogrisel (Contributor) commented Oct 3, 2018 (edited):

While playing with this PR, I noticed the following:

>>> import numpy as np
>>> import pickle5
>>> a = np.arange(12)

Let's mark the array read-only and then pickle it:

>>> a.flags.writeable = False
>>> pickle5.loads(a.dumps(protocol=2)).flags.writeable
True

The writeable flag is lost: it means that currently (in numpy master and all past versions), a read-only array would be loaded back as a read-write array. I think this is a bug in numpy. (I actually checked that I could reproduce the behavior with the last stable release of numpy.)

On the contrary, the PickleBuffer class of protocol 5 propagates the readonly/writeable state, and therefore:

>>> pickle5.loads(a.dumps(protocol=5)).flags.writeable
False

@ahaldane @charris do you confirm that the fact that a dumps/loads cycle does not preserve the writeable flag can be considered a bug?

@ahaldane (Member) commented:

My initial impression is it's not a bug: we don't preserve the writeable flag by default when we construct a new array from another one with b = np.array(a), even though we retain other properties like the memory order. To me, doing a save + load is like constructing a new buffer from an old one, like the np.array call.

Contributor:

I had not realized that you could reshape with order. So it's better to pass order itself directly as an argument to _frombuffer instead of the transpose flag.

Contributor:

I would rename c_contiguous_array to flat_array.

Contributor:

But actually that would just become a one-liner:

def _frombuffer(buffer, dtype, shape, order):
    return frombuffer(buffer, dtype=dtype).reshape(shape, order=order)

@ogrisel (Contributor) commented Oct 3, 2018 (edited):

We cannot do this without setting a proper error message.

For instance, with Python 3.5 I get the following with the current implementation:

>>> import numpy as np
>>> a = np.arange(12)
>>> s = a.dumps(protocol=6)
Traceback (most recent call last):
  File "<ipython-input-9-2e271e7e7bc8>", line 1, in <module>
    s = a.dumps(protocol=6)
ValueError: pickle protocol must be <= 4
>>> s = a.dumps(protocol=5)
Traceback (most recent call last):
  File "<ipython-input-10-7689e2059d65>", line 1, in <module>
    s = a.dumps(protocol=5)
SystemError: <built-in method dumps of numpy.ndarray object at 0x7f2a181c22b0> returned NULL without setting an error

For Python <= 3.5, we need to raise:

ValueError("pickle protocol must be <= %d" % pickle.HIGHEST_PROTOCOL)

@ogrisel (Contributor) commented:

@ahaldane thanks for the feedback. However, I hope you don't consider it a bug that this PR would preserve the writeable flag for contiguous arrays with protocol 5?

The fact that this behavior differs depending on whether the array is contiguous or not might be surprising to users, though.

@ahaldane (Member) commented:

OK, I just reread the PEP and I see what you're getting at: Now that an unpickled ndarray might be just a view of the original pickled ndarray, we have to maintain the writeable flags to avoid creating writeable views of arrays which have writeable=False.

That seems like a strong argument for maintaining the writeable flag.

One might object that it's a bit strange that whether or not an unpickled ndarray is writeable depends on the pickle protocol and whether it was saved to disk or not. But on the other hand, maybe it's easy to explain to people: protocol <5 always returns a copy, so it is always writeable, while protocol 5 can return a view, in which case users should expect it to sometimes be unwriteable. That's the same as many other numpy operations where copies are writeable while views might not be, e.g. fancy indexing vs. slice indexing.

In summary, the way you have it sounds fine.


@pierreglaser (Contributor, Author) commented Oct 5, 2018 (edited):

So, a little clarification on the error handling:
I think we need personalized messages for Python 3.6 and 3.7, alerting users that the highest possible protocol is 5, and not 4, as pickle would report. Using numpy.ndarray.dumps, numpy.ndarray.dump, and numpy.ndarray.__reduce_ex__:

  • if protocol > 5, a ValueError will be raised saying 'pickle protocol must be <= 5'
  • if protocol == 5 and pickle5 is not installed, an ImportError will be raised telling the user that pickle5 is needed

For __reduce_ex__, we raise a ValueError if protocol > 5, and if protocol == 5 and sys.version_info[:2] < (3, 6).

Otherwise, we let pickle raise its own error messages, which convey the right information. With the code used right now, there will be no SystemError anymore, only pickle error messages or the messages stated above.
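The validation rules above can be sketched as a small Python helper. The function name and error wording are illustrative; this only mirrors the rules described in the comment, not numpy's exact implementation:

```python
import pickle

def check_protocol(protocol):
    # Rule 1: protocol > 5 is always rejected with a ValueError.
    if protocol > 5:
        raise ValueError("pickle protocol must be <= 5")
    # Rule 2: protocol 5 on interpreters whose stdlib pickle caps below 5
    # needs the pickle5 backport; raise ImportError when it is missing.
    if protocol == 5 and pickle.HIGHEST_PROTOCOL < 5:
        try:
            import pickle5  # noqa: F401
        except ImportError:
            raise ImportError("pickle protocol 5 requires the pickle5 backport")
```

For any supported protocol, the helper simply returns None; the error paths only trigger on the two cases listed above.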

This is all tested, by the way, yielding a drop in coverage because some tests are mutually exclusive on the Python version, but the coverage report is generated using only Python 3.6.

@ogrisel (Contributor) commented:

This is all being tested by the way, yielding a drop in the coverage because some tests are mutually exclusive on the python version, but the coverage report is specified using only python 3.6

We might want to enable coverage collection on Python 3.5 for this reason, no?

Contributor:

You changed the style of the spacing around the operator for no reason. Please stay consistent with the existing code style.

Contributor (Author):

I'll go over all the range(...) iterations on the pickle protocol and change them all to one unique style.

Contributor:

This is an implementation detail not worth testing, in my opinion. Instead, please check that you can load the pickles and that the dtypes are correct.

@tylerjereddy (Contributor) commented:

codecov aggregates all the coverage stats and reports the union of coverage reports.

We'd just have to be a little cautious since some projects (i.e., SciPy) appear to have reached the codecov data limit, and then nothing works.

@mattip (Member) commented:

Now that #12090 and #12091 have been merged, this should be rebased off master and squashed to a smaller number (1?) of relevant commits.


@ogrisel (Contributor) commented:

Also, the changes to ndarray.dump(s) should be removed from this PR.

@pierreglaser force-pushed the implement-reduce-ex branch 2 times, most recently from 37dd980 to 4084dec on October 10, 2018 12:17
@pierreglaser (Contributor, Author) commented Oct 10, 2018 (edited):

UPDATE:
Just rebased the PR. I made 2 commits though (and not one), because there were still some cleanups to do about pickle that were not included in #12090. Most notably:

if sys.version_info[0] >= 3:
    if sys.version_info[1] in (6, 7):
        try:
            import pickle5 as pickle
        except ImportError:
            import pickle
    else:
        import pickle
else:
    import cPickle as pickle

Each time we import pickle in Python, we import it once this way, in numpy.core.numeric, and then each time pickle is used in other files, we import it from numpy.core.numeric. The only place we import pickle another way is in numpy/core/setup.py.

Now, numpy should be clean pickle-wise, meaning:

  • there are no more np.ndarray.dump(s) calls in the test suite.
  • there is consistent pickle importing behavior.
  • each time pickle.dumps is tested, it is tested for every protocol.
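The "every protocol" pattern from the last bullet can be sketched as a round-trip loop (an illustration of the testing pattern, not a verbatim test from the PR):

```python
import pickle
import numpy as np

# Round-trip an array under each pickle protocol, from 2 up to the
# highest protocol the running interpreter supports.
a = np.arange(10)
for proto in range(2, pickle.HIGHEST_PROTOCOL + 1):
    b = pickle.loads(pickle.dumps(a, protocol=proto))
    assert (a == b).all()
```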

This is what the first commit is about.

The second commit implements array_reduce_ex, as well as some tests to check the good behavior of pickle protocol 5 on numpy arrays. This PR can thus easily be broken into two smaller PRs, one for each commit.
Preventively, I created a PR (#12133) for the first commit only, on which this branch can be further rebased.

Happy to hear your suggestions,
Pierre

@charris (Member) commented:

Needs a release note.

@charris added the "component: numpy._core", "component: numpy.ma (masked arrays)", and "56 - Needs Release Note (needs an entry in doc/release/upcoming_changes)" labels on Oct 10, 2018
@charris removed the "56 - Needs Release Note" label on Oct 10, 2018
@charris merged commit 2ed08ba into numpy:master on Oct 10, 2018
@charris (Member) commented:

Any thoughts about adding this to the NumPy API at some point?

@ogrisel (Contributor) commented:

Great work. Thanks everyone!

@pitrou are you confident that PEP 574 will be accepted and your protocol 5 branch merged into CPython before the release of Python 3.8? Otherwise some assumptions made in the code of this PR might be slightly off, although they should not cause any breakage (I believe).

Any thoughts about adding this to the NumPy API at some point?

Can you be more specific? I don't think I understand what you are suggesting.

@pitrou (Member) commented Oct 11, 2018 (edited):

@pitrou are you confident that PEP 574 will be accepted and your protocol 5 branch be merged in CPython before the release of Python 3.8?

I don't think the PEP will be controversial. The main uncertainty at this point is around the schedule for choosing and putting the new governance model in place. But I don't expect 3.8 to be feature-frozen before that is done anyway :-)

array = np.arange(10)
buffers = []
bytes_string = pickle.dumps(array, buffer_callback=buffers.append,
                            protocol=5)
@ogrisel (Contributor) commented Oct 11, 2018 (edited):

@pitrou the current draft of the PEP states that the buffer_callback example can accept a list of buffers with buffer_callback=buffers.extend. However, the implementation we test here only accepts a single buffer at a time. This is in line with this section of the PEP: https://www.python.org/dev/peps/pep-0574/#passing-a-sequence-of-buffers-in-buffer-callback

I think the example of the PEP should be fixed.

Member:

Thanks! You're right, I forgot to fix this.

@ogrisel (Contributor) commented:

For reference, using the current numpy master and the latest stable release of pandas, I confirm that pickle protocol 5 makes it possible to have no-copy semantics for pandas dataframes without changing anything in pandas:

>>> import pandas as pd
>>> import numpy as np
>>> import pickle5 as pickle
>>> df = pd.DataFrame({'a': np.random.randn(3),
...                    'b': np.random.randint(0, 2, 3)})
>>> buffers = []
>>> df_pkl = pickle.dumps(df, protocol=5, buffer_callback=buffers.append)
>>> buffers
[<pickle.PickleBuffer object at 0x7f32f7eea5c8>,
 <pickle.PickleBuffer object at 0x7f32f7e5f848>]
>>> pickle.loads(df_pkl, buffers=buffers)
          a  b
0 -0.577441  1
1  0.168650  1
2  0.783261  0

@pitrou (Member) commented:

@ogrisel This is great :-) I suspect @mrocklin will be interested.


Reviewers

@mattip, @tylerjereddy, @ogrisel, @pitrou (+2 more) left review comments.

7 participants: @pierreglaser, @ogrisel, @ahaldane, @mattip, @tylerjereddy, @charris, @pitrou
