Series#
Constructor#
| pandas-on-Spark Series that corresponds to pandas Series logically. |
Attributes#
The index (axis labels) Column of the Series. | |
Return the dtype object of the underlying data. | |
Return the dtype object of the underlying data. | |
Return an int representing the number of array dimensions. | |
Return name of the Series. | |
Return a tuple of the shape of the underlying data. | |
Return a list of the row axis labels. | |
Return an int representing the number of elements in this object. | |
Returns true if the current object is empty. | |
Return the transpose, which is self. | |
Return True if it has any missing values. | |
Return a NumPy representation of the DataFrame or the Series. |
Conversion#
| Cast a pandas-on-Spark object to a specified dtype |
| Make a copy of this object's indices and data. |
Return the bool of a single element in the current object. |
Indexing, iteration#
Access a single value for a row/column label pair. | |
Access a single value for a row/column pair by integer position. | |
Access a group of rows and columns by label(s) or a boolean Series. | |
Purely integer-location based indexing for selection by position. | |
Return alias for index. | |
| Return item and drop from series. |
Lazily iterate over (index, value) tuples. | |
Return the first element of the underlying data as a Python scalar. | |
| Return cross-section from the Series. |
| Get item from object for given key (DataFrame column, Panel slice, etc.). |
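The label-based and position-based accessors described above can be compared side by side. This is a minimal sketch that uses plain pandas as a stand-in so it runs without a Spark session; it assumes that swapping the import for `import pyspark.pandas as ps` gives the same results for these calls. The sample data is hypothetical.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.at["b"]               # single value for a label
by_position = s.iat[0]             # single value for an integer position
label_subset = s.loc[["a", "c"]]   # group of rows selected by label
bool_subset = s.loc[s > 15]        # rows selected by a boolean Series
tail = s.iloc[1:]                  # purely integer-location based slice
pairs = list(s.items())            # (index, value) tuples
```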
Binary operator functions#
| Return Addition of series and other, element-wise (binary operator +). |
| Return Floating division of series and other, element-wise (binary operator /). |
| Return Multiplication of series and other, element-wise (binary operator *). |
| Return Reverse Addition of series and other, element-wise (binary operator +). |
| Return Reverse Floating division of series and other, element-wise (binary operator /). |
| Return Reverse Multiplication of series and other, element-wise (binary operator *). |
| Return Reverse Subtraction of series and other, element-wise (binary operator -). |
| Return Reverse Floating division of series and other, element-wise (binary operator /). |
| Return Subtraction of series and other, element-wise (binary operator -). |
| Return Floating division of series and other, element-wise (binary operator /). |
| Return Exponential power of series and other, element-wise (binary operator **). |
| Return Reverse Exponential power of series and other, element-wise (binary operator **). |
| Return Modulo of series and other, element-wise (binary operator %). |
| Return Reverse Modulo of series and other, element-wise (binary operator %). |
| Return Integer division of series and other, element-wise (binary operator //). |
| Return Reverse Integer division of series and other, element-wise (binary operator //). |
| Return Integer division and modulo of series and other, element-wise (binary operator divmod). |
| Return Integer division and modulo of series and other, element-wise (binary operator rdivmod). |
| Combine Series values, choosing the calling Series's values first. |
| Compare if the current value is less than the other. |
| Compare if the current value is greater than the other. |
| Compare if the current value is less than or equal to the other. |
| Compare if the current value is greater than or equal to the other. |
| Compare if the current value is not equal to the other. |
| Compare if the current value is equal to the other. |
| Return the product of the values. |
| Compute the dot product between the Series and the columns of other. |
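The difference between a binary operator method and its "Reverse" counterpart is only the operand order. The sketch below illustrates this with plain pandas as a stand-in (an assumption made so the example runs without Spark; the pandas-on-Spark methods of the same names are expected to behave the same way for this input).

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([1, 2, 4])

forward = s.sub(10)                  # s - 10
reverse = s.rsub(10)                 # 10 - s (operands swapped)
quotient, remainder = s.divmod(3)    # (s // 3, s % 3) as two Series
greater = s.gt(2)                    # element-wise s > 2
```

For a non-commutative operation such as subtraction the forward and reverse results differ in sign; for addition and multiplication they coincide.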
Function application, GroupBy & Window#
| Invoke function on values of Series. |
| Aggregate using one or more operations over the specified axis. |
| Aggregate using one or more operations over the specified axis. |
| Call |
| Map values of Series according to input correspondence. |
| Group DataFrame or Series using one or more columns. |
| Provide rolling transformations. |
| Provide expanding transformations. |
| Apply func(self, *args, **kwargs). |
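As an illustration of the function-application entries above, the following sketch applies a function per value, maps values through a dictionary, aggregates with several operations, and pipes the Series through a helper. It uses plain pandas as a stand-in, assuming the pandas-on-Spark equivalents accept the same calls; `add_offset` is a hypothetical helper defined only for this example.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([1, 2, 3])

squared = s.apply(lambda x: x * x)     # invoke a function on each value
named = s.map({1: "one", 2: "two"})    # 3 has no mapping and becomes a missing value
summary = s.agg(["min", "max"])        # aggregate using more than one operation


def add_offset(series, offset):
    """Hypothetical helper: shift every value by a constant."""
    return series + offset


shifted = s.pipe(add_offset, 10)       # equivalent to add_offset(s, 10)
```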
Computations / Descriptive Stats#
Return a Series/DataFrame with absolute numeric value of each element. | |
| Return whether all elements are True. |
| Return whether any element is True. |
| Compute the lag-N autocorrelation. |
| Return boolean Series equivalent to left <= series <= right. |
| Trim values at input threshold(s). |
| Compute correlation with other Series, excluding missing values. |
| Count non-NA cells for each column. |
| Compute covariance with Series, excluding missing values. |
| Return cumulative maximum over a DataFrame or Series axis. |
| Return cumulative minimum over a DataFrame or Series axis. |
| Return cumulative sum over a DataFrame or Series axis. |
| Return cumulative product over a DataFrame or Series axis. |
| Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values. |
| Provide exponentially weighted window transformations. |
| Subset rows or columns of dataframe according to labels in the specified index. |
| Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
| Return the maximum of the values. |
| Return the mean of the values. |
| Return the minimum of the values. |
| Return the mode(s) of the dataset. |
| Return the largest n elements. |
| Return the smallest n elements. |
| Percentage change between the current and a prior element. |
| Return the product of the values. |
| Return number of unique elements in the object. |
Return boolean if values in the object are unique | |
| Return value at the given quantile. |
| Compute numerical data ranks (1 through n) along axis. |
| Return unbiased standard error of the mean over requested axis. |
| Return unbiased skew normalized by N-1. |
| Return sample standard deviation. |
| Return the sum of the values. |
| Return the median of the values for the requested axis. |
| Return unbiased variance. |
| Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
Return unique values of Series object. | |
| Return a Series containing counts of unique values. |
| Round each value in a Series to the given number of decimals. |
| First discrete difference of element. |
Return boolean if values in the object are monotonically increasing. | |
Return boolean if values in the object are monotonically decreasing. |
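A few of the computations listed above, shown on a small hypothetical Series. The sketch uses plain pandas as a stand-in so that it runs without a Spark session; the assumption is that the same-named pandas-on-Spark methods return the same values for this input.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([1, 5, 10, 15])

in_range = s.between(5, 10)            # 5 <= value <= 10, inclusive on both ends
clipped = s.clip(5, 10)                # trim values at the lower and upper thresholds
running_total = s.cumsum()             # cumulative sum
average = s.mean()                     # mean of the values
increasing = s.is_monotonic_increasing # True when values never decrease
```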
Reindexing / Selection / Label manipulation#
| Align two objects on their axes with the specified join method. |
| Return Series with specified index labels removed. |
| Return Series with requested index level(s) removed. |
| Return Series with duplicate values removed. |
| Indicate duplicate Series values. |
| Compare if the current value is equal to the other. |
| Prefix labels with string prefix. |
| Suffix labels with string suffix. |
| Select first periods of time series data based on a date offset. |
| Return the first n rows. |
| Return the row label of the maximum value. |
| Return the row label of the minimum value. |
| Check whether values are contained in Series or Index. |
| Select final periods of time series data based on a date offset. |
| Alter Series index labels or name. |
| Set the name of the axis for the index or columns. |
| Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. |
| Return a Series with matching indices as other object. |
| Generate a new DataFrame or Series with the index reset. |
| Return a random sample of items from an axis of object. |
| Find indices where elements should be inserted to maintain order. |
| Swap levels i and j in a MultiIndex. |
| Interchange axes and swap values axes appropriately. |
| Return the elements in the given positional indices along an axis. |
| Return the last n rows. |
| Replace values where the condition is False. |
| Replace values where the condition is True. |
| Truncate a Series or DataFrame before and after some index value. |
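The two conditional-replacement entries above are mirror images: one replaces values where the condition is False, the other where it is True. A minimal sketch, assuming plain pandas as a stand-in for pandas-on-Spark and using the pandas method names `where`, `mask`, `isin`, and `head`:

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([1, 2, 3, 4])

kept_large = s.where(s > 2, 0)   # replace with 0 where the condition is False
kept_small = s.mask(s > 2, 0)    # replace with 0 where the condition is True
membership = s.isin([2, 4])      # whether each value is contained in the given list
first_two = s.head(2)            # the first n rows
```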
Missing data handling#
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Synonym for DataFrame.fillna() or Series.fillna() with |
Detect existing (non-missing) values. | |
Detect existing (non-missing) values. | |
Detect existing (non-missing) values. | |
Detect existing (non-missing) values. | |
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Return a new Series with missing values removed. |
| Fill NA/NaN values. |
| Fill NaN values using an interpolation method. |
Reshaping, sorting, transposing#
Return the integer indices that would sort the Series values. | |
| Return int position of the smallest value in the Series. |
| Return int position of the largest value in the Series. |
| Sort object by labels (along an axis) |
| Sort by the values. |
| Unstack, a.k.a. |
Transform each element of a list-like to a row. | |
| Repeat elements of a Series. |
| Squeeze 1 dimensional axis objects into scalars. |
| Encode the object as an enumerated type or categorical variable. |
Combining / joining / merging#
| Compare to another Series and show the differences. |
| Replace values given in to_replace with value. |
| Modify Series in place using non-NA values from passed Series. |
Time series-related#
| Return the last row(s) without any NaNs before where. |
| Resample time-series data. |
| Shift Series/Index by desired number of periods. |
Retrieves the index of the first valid value. | |
Return index for last non-NA/null value. | |
| Select values at particular time of day (example: 9:30AM). |
| Select values between particular times of the day (example: 9:00-9:30 AM). |
Spark-related#
Series.spark provides features that do not exist in pandas but exist in Spark. These can be accessed by Series.spark.<function/property>.
Spark Column object representing the Series/Index. |
| Applies a function that takes and returns a Spark column. |
| Applies a function that takes and returns a Spark column. |
Accessors#
Pandas API on Spark provides dtype-specific methods under various accessors. These are separate namespaces within Series that only apply to specific data types.
Data Type | Accessor |
---|---|
Datetime | dt |
String | str |
Categorical | cat |
Date Time Handling#
Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.
Datetime Properties#
Returns a Series of python datetime.date objects (namely, the date part of Timestamps without timezone information). | |
The year of the datetime. | |
The month of the timestamp as January = 1, December = 12. | |
The days of the datetime. | |
The hours of the datetime. | |
The minutes of the datetime. | |
The seconds of the datetime. | |
The microseconds of the datetime. | |
Calculate year, week, and day according to the ISO 8601 standard. | |
The day of the week with Monday=0, Sunday=6. | |
The day of the week with Monday=0, Sunday=6. | |
The ordinal day of the year. | |
The quarter of the date. | |
Indicates whether the date is the first day of the month. | |
Indicates whether the date is the last day of the month. | |
Indicator for whether the date is the first day of a quarter. | |
Indicator for whether the date is the last day of a quarter. | |
Indicate whether the date is the first day of a year. | |
Indicate whether the date is the last day of the year. | |
Boolean indicator if the date belongs to a leap year. | |
The number of days in the month. | |
The number of days in the month. |
Datetime Methods#
Convert times to midnight. | |
| Convert to a string Series using specified date_format. |
| Perform round operation on the data to the specified freq. |
| Perform floor operation on the data to the specified freq. |
| Perform ceil operation on the data to the specified freq. |
| Return the month names of the series with specified locale. |
| Return the day names of the series with specified locale. |
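A short sketch of the datetime accessor, reading a few properties and calling one method on two hypothetical dates. It uses plain pandas as a stand-in and assumes that `Series.dt` in pandas-on-Spark exposes the same property and method names (`year`, `dayofweek`, `is_leap_year`, `is_month_end`, `strftime`).

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series(pd.to_datetime(["2024-02-29", "2023-12-31"]))

years = s.dt.year                        # the year of each datetime
weekdays = s.dt.dayofweek                # Monday=0 ... Sunday=6
leap = s.dt.is_leap_year                 # whether the date belongs to a leap year
month_end = s.dt.is_month_end            # whether the date is the last day of its month
formatted = s.dt.strftime("%Y/%m/%d")    # string Series in the given date format
```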
String Handling#
Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.
Convert Strings in the series to be capitalized. | |
| Not supported. |
| Filling left and right side of strings in the Series/Index with an additional character. |
| Test if pattern or regex is contained within a string of a Series. |
| Count occurrences of pattern in each string of the Series. |
| Not supported. |
| Not supported. |
| Test if the end of each string element matches a pattern. |
| Not supported. |
| Not supported. |
| Return lowest indexes in each string in the Series where the substring is fully contained between [start:end]. |
| Find all occurrences of pattern or regular expression in the Series. |
Extract element from each string or string list/tuple in the Series at the specified position. | |
| Not supported. |
| Return lowest indexes in each string where the substring is fully contained between [start:end]. |
Check whether all characters in each string are alphanumeric. | |
Check whether all characters in each string are alphabetic. | |
Check whether all characters in each string are digits. | |
Check whether all characters in each string are whitespaces. | |
Check whether all characters in each string are lowercase. | |
Check whether all characters in each string are uppercase. | |
Check whether all characters in each string are title case. | |
Check whether all characters in each string are numeric. | |
Check whether all characters in each string are decimals. | |
| Join lists contained as elements in the Series with passed delimiter. |
Computes the length of each element in the Series. | |
| Filling right side of strings in the Series with an additional character. |
Convert strings in the Series/Index to all lowercase. | |
| Remove leading characters. |
| Determine if each string matches a regular expression. |
| Return the Unicode normal form for the strings in the Series. |
| Pad strings in the Series up to width. |
| Not supported. |
| Duplicate each string in the Series. |
| Replace occurrences of pattern/regex in the Series with some other string. |
| Return highest indexes in each string in the Series where the substring is fully contained between [start:end]. |
| Return highest indexes in each string where the substring is fully contained between [start:end]. |
| Filling left side of strings in the Series with an additional character. |
| Not supported. |
| Split strings around given separator/delimiter. |
| Remove trailing characters. |
| Slice substrings from each element in the Series. |
| Slice substrings from each element in the Series. |
| Split strings around given separator/delimiter. |
| Test if the start of each string element matches a pattern. |
| Remove leading and trailing characters. |
Convert strings in the Series/Index to be swap cased. | |
Convert Strings in the series to be title case. | |
| Map all characters in the string through the given mapping table. |
Convert strings in the Series/Index to all uppercase. | |
| Wrap long strings in the Series to be formatted in paragraphs with length less than a given width. |
| Pad strings in the Series by prepending ‘0’ characters. |
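To illustrate the string accessor, the sketch below applies a handful of the methods described above to hypothetical values. Plain pandas serves as a stand-in, under the assumption that `Series.str` in pandas-on-Spark accepts the same calls.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series(["spark", "Pandas", "api"])

upper = s.str.upper()                  # all uppercase
lengths = s.str.len()                  # length of each element
starts_with_p = s.str.startswith("P")  # case-sensitive prefix test
has_an = s.str.contains("an")          # pattern contained within each string
zero_padded = s.str.zfill(6)           # prepend '0' characters up to width 6
```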
Categorical accessor#
Categorical-dtype specific methods and attributes are available under the Series.cat accessor.
The categories of this categorical. | |
Whether the categories have an ordered relationship. | |
Return Series of codes as well as the index. |
| Rename categories. |
| Reorder categories as specified in new_categories. |
| Add new categories. |
| Remove the specified categories. |
Remove categories which are not used. | |
| Set the categories to the specified new_categories. |
Set the Categorical to be ordered. | |
Set the Categorical to be unordered. |
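The categorical accessor can be sketched as follows: inspect categories and codes, add a category, drop unused ones, and mark the categorical as ordered. This uses plain pandas as a stand-in and assumes `Series.cat` in pandas-on-Spark offers the same names; the category labels are hypothetical.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series(["low", "high", "low"], dtype="category")

categories = list(s.cat.categories)               # inferred categories, sorted
codes = s.cat.codes                               # integer code of each value
extended = s.cat.add_categories(["medium"])       # add a new, unused category
trimmed = extended.cat.remove_unused_categories() # drop categories with no values
ordered = s.cat.as_ordered()                      # same data, ordered categorical
```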
Plotting#
Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.
| Draw a stacked area plot. |
| Vertical bar plot. |
| Make a horizontal bar plot. |
| Make a box plot of the DataFrame columns. |
| Generate Kernel Density Estimate plot using Gaussian kernels. |
| Draw one histogram of the DataFrame’s columns. |
| Generate Kernel Density Estimate plot using Gaussian kernels. |
| Plot DataFrame/Series as lines. |
| Generate a pie plot. |
| Draw one histogram of the DataFrame’s columns. |
Serialization / IO / Conversion#
Return a pandas Series. | |
A NumPy ndarray representing the values in this DataFrame or Series. | |
Return a list of the values. | |
| Render a string representation of the Series. |
| Convert Series to {label -> value} dict or dict-like object. |
| Copy object to the system clipboard. |
| Render an object to a LaTeX tabular environment table. |
| Print Series or DataFrame in Markdown-friendly format. |
| Convert the object to a JSON string. |
| Write object to a comma-separated values (csv) file. |
| Write object to an Excel sheet. |
| Write the contained data to an HDF5 file using HDFStore. |
| Convert Series to DataFrame. |
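A minimal sketch of three of the conversions above, using plain pandas as a stand-in (assumed to match pandas-on-Spark for these calls). With pandas-on-Spark, conversions to local Python or pandas objects collect the data onto the driver, so they are best kept to small results.

```python
import pandas as pd  # stand-in; assumed interchangeable with pyspark.pandas for these calls

s = pd.Series([1, 2], index=["a", "b"], name="x")

as_dict = s.to_dict()     # {label -> value}
as_list = s.to_list()     # list of the values
frame = s.to_frame()      # single-column DataFrame named after the Series
```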
Pandas-on-Spark specific#
Series.pandas_on_spark provides pandas-on-Spark specific features that exist only in pandas API on Spark. These can be accessed by Series.pandas_on_spark.<function/property>.
| Transform the data with the function that takes pandas Series and outputs pandas Series. |