pandas.Series.str.split
Series.str.split(pat=None, *, n=-1, expand=False, regex=None)
Split strings around given separator/delimiter.
Splits the string in the Series/Index from the beginning, at the specified delimiter string.
- Parameters:
- pat : str or compiled regex, optional
String or regular expression to split on. If not specified, split on whitespace.
- n : int, default -1 (all)
Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits.
- expand : bool, default False
Expand the split strings into separate columns.
If True, return DataFrame/MultiIndex expanding dimensionality.
If False, return Series/Index, containing lists of strings.
- regex : bool, default None
Determines if the passed-in pattern is a regular expression:
If True, assumes the passed-in pattern is a regular expression.
If False, treats the pattern as a literal string.
If None and pat length is 1, treats pat as a literal string.
If None and pat length is not 1, treats pat as a regular expression.
Cannot be set to False if pat is a compiled regex.
Added in version 1.4.0.
- Returns:
- Series, Index, DataFrame or MultiIndex
Type matches caller unless expand=True (see Notes).
- Raises:
- ValueError
If regex is False and pat is a compiled regex.
See also
Series.str.split
Split strings around given separator/delimiter.
Series.str.rsplit
Splits string around given separator/delimiter, starting from the right.
Series.str.join
Join lists contained as elements in the Series/Index with passed delimiter.
str.split
Standard library version for split.
str.rsplit
Standard library version for rsplit.
Notes
The handling of the n keyword depends on the number of found splits:
- If found splits > n, make first n splits only
- If found splits <= n, make all splits
- If for a certain row the number of found splits < n, append None for padding up to n if expand=True
If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.
Use of regex=False with a pat as a compiled regex will raise an error.
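The padding rule for expand=True can be checked on a small, made-up Series. The following is a minimal sketch; the data and the name `df` are illustrative assumptions and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical data: the first row has more whitespace-separated parts than the second.
s = pd.Series(["a b c d", "a b"])

# n=2 caps the first row at two splits, which yields three parts.
# The second row only produces one split, so its last column is padded
# with a missing value.
df = s.str.split(n=2, expand=True)
print(df)
```

The missing-value marker in the padded cell may display as None or NaN depending on the pandas version and string dtype, so `pd.isna` is the safer way to test for it.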
Examples
>>> s = pd.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         np.nan
...     ]
... )
>>> s
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                              NaN
dtype: object
In the default setting, the string is split by whitespace.
>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
Without the n parameter, the outputs of rsplit and split are identical.
>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.
>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object
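The difference between split and rsplit under a limited n shows up most clearly when the delimiter occurs more than once. A minimal sketch, assuming hypothetical "key=value" strings that are not part of the original examples:

```python
import pandas as pd

# Hypothetical "key=value" strings where the value may itself contain "=".
s = pd.Series(["mode=fast", "filter=a=b"])

left = s.str.split("=", n=1).tolist()    # one split, scanning from the left
right = s.str.rsplit("=", n=1).tolist()  # one split, scanning from the right
print(left)
print(right)
```

With a single "=" both calls agree; with two, split keeps the remainder on the right and rsplit keeps it on the left.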
The pat parameter can be used to split by other characters.
>>> s.str.split(pat="/")
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                  NaN
dtype: object
When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.
>>> s.str.split(expand=True)
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                            NaN   NaN   NaN      NaN       NaN
For slightly more complex use cases like splitting the HTML document name from a URL, a combination of parameter settings can be used.
>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                 NaN         NaN
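The same combination can feed named columns directly. The sketch below assumes made-up URLs and illustrative column names (`base`, `document`); it also shows one possible alternative that splits fully and then indexes into the resulting lists with `.str[-1]`.

```python
import pandas as pd

# Hypothetical URLs used only for illustration.
urls = pd.Series(
    [
        "https://example.com/docs/index.html",
        "https://example.com/docs/api/split.html",
    ]
)

# One split from the right separates the document name from the rest.
parts = urls.str.rsplit("/", n=1, expand=True)
parts.columns = ["base", "document"]

# Alternative: split on every "/" and take the last element of each list.
last = urls.str.split("/").str[-1]
print(parts)
print(last.tolist())
```

The rsplit form avoids building the full list of path segments when only the final one is needed.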
Remember to escape special characters when explicitly using regular expressions.
>>> s = pd.Series(["foo and bar plus baz"])
>>> s.str.split(r"and|plus", expand=True)
      0      1     2
0  foo    bar    baz
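When the delimiters themselves are regex metacharacters, one way to escape them is the standard library's `re.escape`. This is a sketch under the assumption that the delimiters are "+" and "*"; the names `delimiters` and `pattern` are illustrative.

```python
import re

import pandas as pd

# Hypothetical delimiters that are also regex metacharacters.
delimiters = ["+", "*"]

# re.escape turns each delimiter into a pattern that matches it literally.
pattern = "|".join(re.escape(d) for d in delimiters)

s = pd.Series(["a+b*c"])
result = s.str.split(pattern, regex=True).tolist()
print(pattern)
print(result)
```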
Regular expressions can be used to handle URLs or file names. When pat is a string and regex=None (the default), the given pat is compiled as a regex only if len(pat) != 1.
>>> s = pd.Series(['foojpgbar.jpg'])
>>> s.str.split(r".", expand=True)
           0    1
0  foojpgbar  jpg
>>> s.str.split(r"\.jpg", expand=True)
           0 1
0  foojpgbar
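The effect of the length rule is easiest to see by forcing each interpretation on the same input. A minimal sketch with made-up strings; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["a.b.c"])

# regex=None (default) and len(pat) == 1: "." is treated as a literal dot.
literal = s.str.split(".").tolist()

# regex=True: "." matches any character, so every character acts as a
# separator and only empty strings remain.
as_regex = s.str.split(".", regex=True).tolist()

t = pd.Series(["xa|by"])

# regex=False keeps a multi-character pattern literal.
multi_literal = t.str.split("a|b", regex=False).tolist()

# regex=None with len(pat) != 1: "a|b" is compiled and means "a" or "b".
multi_regex = t.str.split("a|b").tolist()

print(literal, as_regex, multi_literal, multi_regex)
```

Passing regex explicitly removes the dependence on pattern length, which can matter when the pattern is built at runtime.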
When regex=True, pat is interpreted as a regex.
>>> s.str.split(r"\.jpg", regex=True, expand=True)
           0 1
0  foojpgbar
A compiled regex can be passed as pat.
>>> import re
>>> s.str.split(re.compile(r"\.jpg"), expand=True)
           0 1
0  foojpgbar
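A compiled pattern also carries its own flags, which the string form of pat cannot express. The sketch below assumes hypothetical file names and uses `re.IGNORECASE`; it also demonstrates the ValueError described in the Raises section when a compiled pattern is combined with regex=False.

```python
import re

import pandas as pd

# Hypothetical file names with mixed-case extensions.
s = pd.Series(["photo.JPG", "scan.jpg"])

# The compiled pattern matches the extension case-insensitively.
pat = re.compile(r"\.jpg", flags=re.IGNORECASE)
stems = s.str.split(pat).str[0].tolist()
print(stems)

# A compiled pattern together with regex=False is rejected.
try:
    s.str.split(pat, regex=False)
    raised = False
except ValueError:
    raised = True
print(raised)
```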
When regex=False, pat is interpreted as the string itself.
>>> s.str.split(r"\.jpg", regex=False, expand=True)
               0
0  foojpgbar.jpg