Python Pandas Tutorial

Python Pandas - Caveats & Gotchas



A caveat is a warning, and a gotcha is an unseen problem.

Using If/Truth Statement with Pandas

Pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if statement or when using the Boolean operations and, or, or not. It is not clear what the result should be. Should it be True because the object is not zero-length? False because there are False values? Because the answer is unclear, pandas raises a ValueError −

import pandas as pd

if pd.Series([False, True, False]):
    print('I am True')

Its output is as follows −

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

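In an if condition, pandas cannot tell which of these interpretations you intend. The error message suggests the alternatives: test the object against None, or reduce it to a single value with any(), all(), or one of the other listed methods.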

import pandas as pd

if pd.Series([False, True, False]).any():
    print("I am any")

Its output is as follows −

I am any
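As a minimal sketch of the alternatives named in the error message, the example below reduces the same Series with any(), all(), and empty, and confirms that using the Series directly as a condition raises ValueError. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([False, True, False])

has_any_true = bool(s.any())   # at least one element is True
all_true = bool(s.all())       # every element is True
is_empty = s.empty             # the Series has no elements

# Using the Series itself as a condition is ambiguous and raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

print(has_any_true, all_true, is_empty, raised)
```

Choosing any() or all() makes the intended meaning of the condition explicit, which is the point of the error.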

To evaluate single-element pandas objects in a Boolean context, use the .bool() method. Note that .bool() was deprecated in pandas 2.1, so it applies to older versions −

import pandas as pd

# .bool() is deprecated since pandas 2.1
print(pd.Series([True]).bool())

Its output is as follows −

True
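On pandas versions where .bool() is deprecated, .item() (one of the methods listed in the error message) can play a similar role. The sketch below assumes a Series holding exactly one element; .item() raises ValueError for any other length.

```python
import pandas as pd

single = pd.Series([True])
value = single.item()   # the one element, returned as a scalar

# .item() refuses a Series that does not hold exactly one element.
try:
    pd.Series([True, False]).item()
    multi_raised = False
except ValueError:
    multi_raised = True

print(value, multi_raised)
```

Unlike .bool(), .item() does not check that the element is Boolean, so it returns whatever scalar the Series holds.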

Bitwise Boolean

Comparison operators like == and != work element-wise and return a Boolean series, which is almost always what is required anyway.

import pandas as pd

s = pd.Series(range(5))
print(s == 4)

Its output is as follows −

0    False
1    False
2    False
3    False
4     True
dtype: bool
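Boolean series like the one above can be combined with the bitwise operators &, | and ~, which take the place of and, or and not for element-wise logic. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

s = pd.Series(range(5))

# Parentheses are required: & and | bind more tightly than comparisons.
middle = s[(s > 0) & (s < 4)]      # both conditions hold
edges = s[(s == 0) | (s == 4)]     # either condition holds
not_four = s[~(s == 4)]            # condition inverted

print(middle.tolist(), edges.tolist(), not_four.tolist())
```

Writing `(s > 0) and (s < 4)` instead would trigger the same ambiguous truth value error described earlier.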

isin Operation

This returns a Boolean series showing whether each element in the Series is exactly contained in the passed sequence of values.

import pandas as pd

s = pd.Series(list('abc'))
s = s.isin(['a', 'c', 'e'])
print(s)

Its output is as follows −

0     True
1    False
2     True
dtype: bool
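The Boolean series returned by isin is typically used as a mask. The sketch below keeps the original Series intact and uses the mask, and its inverse, to select elements; the names mask, matched and unmatched are illustrative.

```python
import pandas as pd

s = pd.Series(list('abc'))
mask = s.isin(['a', 'c', 'e'])

matched = s[mask]       # elements found in the passed values
unmatched = s[~mask]    # elements not found in the passed values

print(matched.tolist(), unmatched.tolist())
```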

Reindexing vs ix Gotcha

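Many users of older pandas versions will find themselves using the ix indexing capabilities as a concise means of selecting data from a pandas object. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, where loc and iloc replace it −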

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

print(df)
# .ix requires pandas < 1.0; on later versions use df.loc[['b', 'c', 'e']]
print(df.ix[['b', 'c', 'e']])

Its output is as follows −

        one       two     three      four
a -1.582025  1.335773  0.961417 -1.272084
b  1.461512  0.111372 -0.072225  0.553058
c -1.240671  0.762185  1.511936 -0.630920
d -2.380648 -0.029981  0.196489  0.531714
e  1.846746  0.148149  0.275398 -0.244559
f -1.842662 -0.933195  2.303949  0.677641

        one       two     three      four
b  1.461512  0.111372 -0.072225  0.553058
c -1.240671  0.762185  1.511936 -0.630920
e  1.846746  0.148149  0.275398 -0.244559

This is, of course, completely equivalent in this case to using the reindex method −

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

print(df)
print(df.reindex(['b', 'c', 'e']))

Its output is as follows −

        one       two     three      four
a  1.639081  1.369838  0.261287 -1.662003
b -0.173359  0.242447 -0.494384  0.346882
c -0.106411  0.623568  0.282401 -0.916361
d -1.078791 -0.612607 -0.897289 -1.146893
e  0.465215  1.552873 -1.841959  0.329404
f  0.966022 -0.190077  1.324247  0.678064

        one       two     three      four
b -0.173359  0.242447 -0.494384  0.346882
c -0.106411  0.623568  0.282401 -0.916361
e  0.465215  1.552873 -1.841959  0.329404
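The equivalence holds only while every requested label exists. As a hedged sketch, assuming pandas 1.0 or later (where loc stands in for the removed ix), the example below requests a label 'z' that is not in the index: reindex returns a NaN row, whereas loc raises KeyError. The small frame and the label 'z' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0]}, index=list('abc'))

# reindex conforms the data to the new labels; 'z' is missing, so its row is NaN.
filled = df.reindex(['b', 'z'])

# Assumption: pandas >= 1.0, where list selection with a missing label raises KeyError.
try:
    df.loc[['b', 'z']]
    loc_raised = False
except KeyError:
    loc_raised = True

print(filled)
print(loc_raised)
```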

Some might conclude that ix and reindex are 100% equivalent based on this. This is true except in the case of integer indexing. For example, the above operation can alternatively be expressed as −

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

print(df)
# .ix requires pandas < 1.0; here the integers are treated as positions
print(df.ix[[1, 2, 4]])
# reindex treats the integers as labels, none of which exist in the index
print(df.reindex([1, 2, 4]))

Its output is as follows −

        one       two     three      four
a -1.015695 -0.553847  1.106235 -0.784460
b -0.527398 -0.518198 -0.710546 -0.512036
c -0.842803 -1.050374  0.787146  0.205147
d -1.238016 -0.749554 -0.547470 -0.029045
e -0.056788  1.063999 -0.767220  0.212476
f  1.139714  0.036159  0.201912  0.710119

        one       two     three      four
b -0.527398 -0.518198 -0.710546 -0.512036
c -0.842803 -1.050374  0.787146  0.205147
e -0.056788  1.063999 -0.767220  0.212476

   one  two  three  four
1  NaN  NaN    NaN   NaN
2  NaN  NaN    NaN   NaN
4  NaN  NaN    NaN   NaN

It is important to remember that reindex is strict label indexing only. This can lead to some potentially surprising results in pathological cases where an index contains, say, both integers and strings.
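On current pandas versions the same distinction can be made explicit with loc (labels) and iloc (positions). The sketch below is not part of the original tutorial; it uses fixed integer data instead of random values so the results are reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

by_label = df.loc[['b', 'c', 'e']]     # label-based selection
by_position = df.iloc[[1, 2, 4]]       # position-based selection, same rows
reindexed = df.reindex([1, 2, 4])      # labels 1, 2, 4 do not exist: all NaN

print(by_label.equals(by_position))
print(reindexed)
```

Because loc and iloc each accept only one kind of key, the label-versus-position ambiguity described above does not arise with them.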
