pandas.Series.str.extract
- Series.str.extract(pat, flags=0, expand=True)
Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
- Parameters:
- pat : str
Regular expression pattern with capturing groups.
- flags : int, default 0 (no flags)
Flags from the re module, e.g. re.IGNORECASE, that modify regular expression matching for things like case, spaces, etc. For more details, see re.
- expand : bool, default True
If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups.
- Returns:
- DataFrame or Series or Index
A DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
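The effect of flags and expand described above can be sketched as follows. This is a minimal illustration, not part of the original reference; the sample data and the variable names (strict, relaxed, as_frame, as_series) are assumptions chosen for the example.

```python
import re

import pandas as pd

# Hypothetical sample data: note the uppercase "A" in the first string.
s = pd.Series(["A1", "b2", "c3"])

# Without flags, the uppercase "A" does not match [ab], so row 0 is all NaN.
strict = s.str.extract(r"([ab])(\d)")

# re.IGNORECASE lets [ab] match "A" as well.
relaxed = s.str.extract(r"([ab])(\d)", flags=re.IGNORECASE)

# With a single capture group, expand decides between DataFrame and Series.
as_frame = s.str.extract(r"[ab](\d)", flags=re.IGNORECASE, expand=True)
as_series = s.str.extract(r"[ab](\d)", flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__, type(as_series).__name__)  # DataFrame Series
```

With two or more capture groups the result is a DataFrame regardless of expand, so expand=False only changes the return type in the single-group case.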
See also
extractall : Returns all matches (not just the first match).
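To make the difference between extract and extractall concrete, here is a small sketch. The sample strings and the names first and every are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical data: the first string contains two matches of the pattern.
s = pd.Series(["a1a2", "b1", "c1"])

# extract keeps only the first match per subject string:
# one row per input string, NaN where nothing matched.
first = s.str.extract(r"[ab](\d)")

# extractall returns every match: rows are indexed by
# (original index, match number), and non-matching strings are omitted.
every = s.str.extractall(r"[ab](\d)")

print(first)
print(every)
```

Here first has three rows (the last one NaN because "c1" does not match), while every has three rows for a different reason: two matches from "a1a2", one from "b1", and none from "c1".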
Examples
A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.
>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
A pattern may contain optional groups.
>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
Named groups will become column names in the result.
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
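Because extracted groups come back as strings, a digit captured this way is the text '1', not the number 1. The following sketch shows one possible follow-up step, assuming numeric values are wanted; the use of pd.to_numeric and the name parts are choices made for this illustration and are not part of the extract API itself.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# Named groups become column names, so columns can be selected by label.
parts = s.str.extract(r"(?P<letter>[ab])(?P<digit>\d)")

# Extracted values are strings; convert explicitly if numbers are needed.
# Rows without a match stay missing after the conversion.
parts["digit"] = pd.to_numeric(parts["digit"])

print(parts)
```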