pandas.Series.str.extractall
- Series.str.extractall(pat, flags=0)
Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from all matches of regular expression pat. When each subject string in the Series has exactly one match, extractall(pat).xs(0, level='match') is the same as extract(pat).
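The equivalence described above can be checked directly. The following is a minimal sketch, assuming a small illustrative Series (the data and the names `first_of_all` and `first_only` are made up for this example) in which every string matches the pattern exactly once:

```python
import pandas as pd

# Illustrative data: each string contains exactly one match for the pattern.
s = pd.Series(["a1", "b2", "c3"], index=["A", "B", "C"])
pat = r"[abc](\d)"

# Select match number 0 from extractall; xs drops the 'match' index level.
first_of_all = s.str.extractall(pat).xs(0, level="match")

# extract returns only the first match for each string.
first_only = s.str.extract(pat)

# Both frames carry the same index labels and the same extracted values.
print(first_of_all)
print(first_only)
```

If any string had two or more matches, `extractall` would still return all of them, while `extract` would keep only the first.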
- Parameters:
- pat : str
Regular expression pattern with capturing groups.
- flags : int, default 0 (no flags)
A re module flag, for example re.IGNORECASE. Flags modify regular expression matching for things like case, spaces, etc. Multiple flags can be combined with the bitwise OR operator, for example re.IGNORECASE | re.MULTILINE.
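To illustrate how the flags parameter changes the result, here is a short sketch using made-up data. The Series contents and variable names are assumptions for illustration only; the flags themselves come from the standard `re` module.

```python
import re
import pandas as pd

s = pd.Series(["a1A2", "B3"], index=["x", "y"])

# Without flags the pattern is case-sensitive: only "a1" matches.
case_sensitive = s.str.extractall(r"(a)(\d)")

# With re.IGNORECASE both "a1" and "A2" match.
case_insensitive = s.str.extractall(r"(a)(\d)", flags=re.IGNORECASE)

# Flags combine with bitwise OR. re.MULTILINE lets ^ match at the start
# of every line, so both lines of the single string below produce a match.
lines = pd.Series(["a1\nB2"])
combined = lines.str.extractall(
    r"^([a-z])(\d)", flags=re.IGNORECASE | re.MULTILINE
)

print(len(case_sensitive), len(case_insensitive), len(combined))
```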
- Returns:
- DataFrame
A DataFrame with one row for each match, and one column for each group. Its rows have a MultiIndex with first levels that come from the subject Series. The last level is named 'match' and indexes the matches in each item of the Series. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used.
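The shape of the returned object can be inspected as follows. This is a small sketch, assuming an unnamed Series index (so the first index level has no name) and a single named group called `digit`:

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
result = s.str.extractall(r"[ab](?P<digit>\d)")

# The first level comes from the Series index; the last is named 'match'.
level_names = list(result.index.names)

# Named capture groups become column names.
column_names = list(result.columns)

# Selecting one subject label leaves a frame indexed by match number.
matches_for_a = result.loc["A"]

print(level_names)
print(column_names)
print(matches_for_a)
```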
See also
extract : Returns first match only (not all matches).
Examples
A pattern with one group will return a DataFrame with one column. Indices with no matches will not appear in the result.
>>> s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
>>> s.str.extractall(r"[ab](\d)")
        0
match
A 0      1
  1      2
B 0      1
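Because items with no matches are absent from the result, it can be useful to count matches per item and align the counts back to the original index. One possible way to do this, sketched here with `groupby` and `reindex` (a convenience pattern, not part of `extractall` itself):

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
matches = s.str.extractall(r"[ab](\d)")

# Count rows per subject label (the first index level), then restore
# labels that had no matches, filling their count with 0.
counts = matches.groupby(level=0).size().reindex(s.index, fill_value=0)

print(counts)
```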
Capture group names are used for column names of the result.
>>> s.str.extractall(r"[ab](?P<digit>\d)")
        digit
match
A 0         1
  1         2
B 0         1
A pattern with two groups will return a DataFrame with two columns.
>>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
        letter digit
match
A 0          a     1
  1          a     2
B 0          b     1
Optional groups that do not match are NaN in the result.
>>> s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)")
        letter digit
match
A 0          a     1
  1          a     2
B 0          b     1
C 0        NaN     1