pandas.DataFrame.replace
- DataFrame.replace(to_replace=None, value=<no_default>, *, inplace=False, limit=None, regex=False, method=<no_default>)
Replace values given in to_replace with value.
Values of the Series/DataFrame are replaced with other values dynamically. This differs from updating with `.loc` or `.iloc`, which require you to specify a location to update with some value.
- Parameters:
- to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
  - numeric, str or regex:
    - numeric: numeric values equal to to_replace will be replaced with value
    - str: string exactly matching to_replace will be replaced with value
    - regex: regexes matching to_replace will be replaced with value
  - list of str, regex, or numeric:
    - First, if to_replace and value are both lists, they must be the same length.
    - Second, if `regex=True` then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
    - str, regex and numeric rules apply as above.
  - dict:
    - Dicts can be used to specify different replacement values for different existing values. For example, `{'a': 'b', 'y': 'z'}` replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way, the optional value parameter should not be given.
    - For a DataFrame a dict can specify that different values should be replaced in different columns. For example, `{'a': 1, 'b': 'z'}` looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be `None` in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
    - For a DataFrame nested dictionaries, e.g., `{'a': {'b': np.nan}}`, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
  - None:
    - This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also `None` then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
- value : scalar, dict, list, str, regex, default None
Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
- inplace : bool, default False
If True, performs operation inplace and returns None.
- limit : int, default None
Maximum size gap to forward or backward fill.
Deprecated since version 2.1.0.
- regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be `None`.
- method : {‘pad’, ‘ffill’, ‘bfill’}
The method to use for replacement, when to_replace is a scalar, list or tuple and value is `None`.
Deprecated since version 2.1.0.
- Returns:
- Series/DataFrame
Object after replacement.
- Raises:
- AssertionError
If regex is not a `bool` and to_replace is not `None`.
- TypeError
If to_replace is not a scalar, array-like, `dict`, or `None`
If to_replace is a `dict` and value is not a `list`, `dict`, `ndarray`, or `Series`
If to_replace is `None` and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
When replacing multiple `bool` or `datetime64` objects and the arguments to to_replace do not match the type of the value being replaced
- ValueError
If a `list` or an `ndarray` is passed to to_replace and value but they are not the same length.
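The length rule listed under ValueError can be observed directly. The following is a minimal sketch on a small made-up frame (the data and the variable names `paired` and `mismatch_raised` are illustrative assumptions, not part of the reference). The exact error message may vary between pandas versions, so only the exception type is checked.

```python
import pandas as pd

# Illustrative data, not taken from the reference above.
df = pd.DataFrame({'A': [0, 1, 2], 'B': [5, 6, 7]})

# Lists of equal length pair up element by element: 0 -> 10, 1 -> 20.
paired = df.replace([0, 1], [10, 20])

# Lists of different lengths cannot be paired, so replace raises ValueError.
try:
    df.replace([0, 1], [10])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(paired)
print(mismatch_raised)
```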
See also
Series.fillna
Fill NA values.
DataFrame.fillna
Fill NA values.
Series.where
Replace values based on boolean condition.
DataFrame.where
Replace values based on boolean condition.
DataFrame.map
Apply a function to a Dataframe elementwise.
Series.map
Map values of Series according to an input mapping or function.
Series.str.replace
Simple string replacement.
Notes
Regex substitution is performed under the hood with `re.sub`. The rules for substitution for `re.sub` are the same.
Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
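A short sketch of both points above, assuming a made-up frame that stores the same numbers once as floats and once as strings (the column names `as_float` and `as_text` are hypothetical). The first call shows that a pattern only touches the string column; the second uses a `re.sub`-style backreference in the replacement value.

```python
import pandas as pd

# Illustrative data: the same numbers stored as floats and as strings.
df = pd.DataFrame({'as_float': [1.5, 2.0, 3.5],
                   'as_text': ['1.5', '2.0', '3.5']})

# The pattern matches text ending in ".5"; only the string column can match.
result = df.replace(to_replace=r'^\d+\.5$', value='half', regex=True)

# Backreferences follow re.sub rules: swap the integer and fractional parts.
swapped = df.replace(to_replace=r'^(\d+)\.(\d+)$', value=r'\2.\1', regex=True)

print(result)
print(swapped)
```

The numeric column is left alone in both calls, so converting it to strings first would be needed if pattern matching on it is the goal.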
This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
When a dict is used as the to_replace value, the key(s) in the dict act as the to_replace part and the value(s) in the dict act as the value parameter.
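One way to check this reading of a flat dict is to compare it with the two-list spelling. The sketch below uses made-up integer data, and the names `with_dict` and `with_lists` are illustrative.

```python
import pandas as pd

# Illustrative data, not taken from the reference above.
df = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 0, 3]})

# Keys act as to_replace, values act as value.
with_dict = df.replace({0: 10, 1: 100})

# The same mapping spelled as two parallel lists.
with_lists = df.replace([0, 1], [10, 100])

print(with_dict)
print(with_dict.equals(with_lists))
```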
Examples
Scalar `to_replace` and `value`
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
List-like `to_replace`
>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    3
1    3
2    3
3    4
4    5
dtype: int64
dict-like `to_replace`
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
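The nested-dict form described for to_replace, `{'a': {'b': np.nan}}`, can be sketched the same way. The frame below is made up for illustration. Its second column (the hypothetical name `other`) also contains ‘b’ to show that only the column named by the outer key is searched.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'b' appears in both columns.
df = pd.DataFrame({'a': ['b', 'c', 'b'], 'other': ['b', 'b', 'x']})

# Outer key selects the column; inner dict maps old value -> new value.
result = df.replace({'a': {'b': np.nan}})

print(result)
```

The original frame is left unchanged because inplace defaults to False.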
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
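Regular expressions can also be nested inside a per-column dict, as the to_replace description notes. The sketch below reuses the frame from the examples above. The second pattern, `r'^f'`, is an illustrative addition. The column name `'A'` is matched literally, and only the inner keys are treated as patterns.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
                   'B': ['abc', 'bar', 'xyz']})

# Column 'A' is selected by name; the inner keys are regular expressions.
# '^ba.$' matches the whole of 'bat'; '^f' replaces only the leading 'f'.
result = df.replace({'A': {r'^ba.$': 'new', r'^f': 'F'}}, regex=True)

print(result)
```

Column ‘B’ is not searched, so ‘bar’ stays as it is even though it matches the first pattern.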
Compare the behavior of `s.replace({'a': None})` and `s.replace('a', None)` to understand the peculiarities of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. `s.replace({'a': None})` is equivalent to `s.replace(to_replace={'a': None}, value=None, method=None)`:
>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object
When `value` is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case.
>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object
Deprecated since version 2.1.0: The ‘method’ parameter and padding behavior are deprecated.
On the other hand, if `None` is explicitly passed for `value`, it will be respected:
>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object
Changed in version 1.4.0: Previously the explicit `None` was silently ignored.
When `regex=True`, `value` is not `None` and to_replace is a string, the replacement will be applied in all columns of the DataFrame.
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': ['a', 'b', 'c', 'd', 'e'],
...                    'C': ['f', 'g', 'h', 'i', 'j']})
>>> df.replace(to_replace='^[a-g]', value='e', regex=True)
   A  B  C
0  0  e  e
1  1  e  e
2  2  e  h
3  3  e  i
4  4  e  j
If `value` is not `None` and to_replace is a dictionary, the dictionary keys will be the DataFrame columns that the replacement will be applied to.
>>> df.replace(to_replace={'B': '^[a-c]', 'C': '^[h-j]'}, value='e', regex=True)
   A  B  C
0  0  e  f
1  1  e  g
2  2  e  e
3  3  d  e
4  4  e  e