DataFrame
Constructor
| pandas-on-Spark DataFrame that corresponds to pandas DataFrame logically. |
Attributes and underlying data
| The index (row labels) column of the DataFrame. |
| Print a concise summary of a DataFrame. |
| The column labels of the DataFrame. |
| Return True if the current DataFrame is empty. |
| Return the dtypes in the DataFrame. |
| Return a tuple representing the dimensionality of the DataFrame. |
| Return a list representing the axes of the DataFrame. |
| Return an int representing the number of array dimensions. |
| Return an int representing the number of elements in this object. |
| Return a subset of the DataFrame's columns based on the column dtypes. |
| Return a NumPy representation of the DataFrame or the Series. |
Conversion
| Make a copy of this object's indices and data. |
| Detects missing values for items in the current DataFrame. |
| Cast a pandas-on-Spark object to a specified dtype. |
| Detects missing values for items in the current DataFrame. |
| Detects non-missing values for items in the current DataFrame. |
| Detects non-missing values for items in the current DataFrame. |
| Return the bool of a single element in the current object. |
Indexing, iteration
| Access a single value for a row/column label pair. |
| Access a single value for a row/column pair by integer position. |
| Return the first n rows. |
| Return index of first occurrence of maximum over requested axis. |
| Return index of first occurrence of minimum over requested axis. |
| Access a group of rows and columns by label(s) or a boolean Series. |
| Purely integer-location based indexing for selection by position. |
| Insert column into DataFrame at specified location. |
| Iterator over (column name, Series) pairs. |
| Iterate over DataFrame rows as (index, Series) pairs. |
| Iterate over DataFrame rows as namedtuples. |
| Return alias for columns. |
| Return item and drop from frame. |
| Return the last n rows. |
| Return cross-section from the DataFrame. |
| Get item from object for given key (DataFrame column, Panel slice, etc.). |
| Replace values where the condition is False. |
| Replace values where the condition is True. |
| Query the columns of a DataFrame with a boolean expression. |
Binary operator functions
| Get addition of DataFrame and other, element-wise (binary operator +). |
| Get addition of DataFrame and other, element-wise (binary operator +). |
| Get floating division of DataFrame and other, element-wise (binary operator /). |
| Get floating division of DataFrame and other, element-wise (binary operator /). |
| Get floating division of DataFrame and other, element-wise (binary operator /). |
| Get floating division of DataFrame and other, element-wise (binary operator /). |
| Get multiplication of DataFrame and other, element-wise (binary operator *). |
| Get multiplication of DataFrame and other, element-wise (binary operator *). |
| Get subtraction of DataFrame and other, element-wise (binary operator -). |
| Get subtraction of DataFrame and other, element-wise (binary operator -). |
| Get exponential power of DataFrame and other, element-wise (binary operator **). |
| Get exponential power of DataFrame and other, element-wise (binary operator **). |
| Get modulo of DataFrame and other, element-wise (binary operator %). |
| Get modulo of DataFrame and other, element-wise (binary operator %). |
| Get integer division of DataFrame and other, element-wise (binary operator //). |
| Get integer division of DataFrame and other, element-wise (binary operator //). |
| Compare if the current value is less than the other. |
| Compare if the current value is greater than the other. |
| Compare if the current value is less than or equal to the other. |
| Compare if the current value is greater than or equal to the other. |
| Compare if the current value is not equal to the other. |
| Compare if the current value is equal to the other. |
| Compute the matrix multiplication between the DataFrame and others. |
| Update null elements with value in the same location in other. |
Function application, GroupBy & Window
| Apply a function along an axis of the DataFrame. |
| Apply a function to a DataFrame elementwise. |
| Apply a function to a DataFrame elementwise. |
| Apply func(self, *args, **kwargs). |
| Aggregate using one or more operations over the specified axis. |
| Aggregate using one or more operations over the specified axis. |
| Group DataFrame or Series using one or more columns. |
| Provide rolling transformations. |
| Provide expanding transformations. |
| Call |
Computations / Descriptive Stats
| Return a Series/DataFrame with absolute numeric value of each element. |
| Return whether all elements are True. |
| Return whether any element is True. |
| Trim values at input threshold(s). |
| Compute pairwise correlation of columns, excluding NA/null values. |
| Compute pairwise correlation. |
| Count non-NA cells for each column. |
| Compute pairwise covariance of columns, excluding NA/null values. |
| Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values. |
| Provide exponentially weighted window transformations. |
| Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
| Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). |
| Return the maximum of the values. |
| Return the mean of the values. |
| Return the minimum of the values. |
| Return the median of the values for the requested axis. |
| Get the mode(s) of each element along the selected axis. |
| Percentage change between the current and a prior element. |
| Return the product of the values. |
| Return the product of the values. |
| Return value at the given quantile. |
| Compute numerical data ranks (1 through n) along axis. |
| Return number of unique elements in the object. |
| Return unbiased standard error of the mean over requested axis. |
| Return unbiased skew normalized by N-1. |
| Return the sum of the values. |
| Return sample standard deviation. |
| Return unbiased variance. |
| Return cumulative minimum over a DataFrame or Series axis. |
| Return cumulative maximum over a DataFrame or Series axis. |
| Return cumulative sum over a DataFrame or Series axis. |
| Return cumulative product over a DataFrame or Series axis. |
| Round a DataFrame to a variable number of decimal places. |
| First discrete difference of element. |
| Evaluate a string describing operations on DataFrame columns. |
Reindexing / Selection / Label manipulation
| Prefix labels with string prefix. |
| Suffix labels with string suffix. |
| Align two objects on their axes with the specified join method. |
| Select values at particular time of day (example: 9:30AM). |
| Select values between particular times of the day (example: 9:00-9:30 AM). |
| Drop specified labels from columns. |
| Return DataFrame with requested index / column level(s) removed. |
| Return DataFrame with duplicate rows removed, optionally only considering certain columns. |
| Return boolean Series denoting duplicate rows, optionally only considering certain columns. |
| Compare if the current value is equal to the other. |
| Subset rows or columns of dataframe according to labels in the specified index. |
| Select first periods of time series data based on a date offset. |
| Return the first n rows. |
| Select final periods of time series data based on a date offset. |
| Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. |
| Return a DataFrame with matching indices as other object. |
| Alter axes labels. |
| Set the name of the axis for the index or columns. |
| Reset the index, or a level of it. |
| Set the DataFrame index (row labels) using one or more existing columns. |
| Interchange axes and swap values axes appropriately. |
| Swap levels i and j in a MultiIndex on a particular axis. |
| Return the elements in the given positional indices along an axis. |
| Whether each element in the DataFrame is contained in values. |
| Return a random sample of items from an axis of object. |
| Truncate a Series or DataFrame before and after some index value. |
Missing data handling
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Remove missing values. |
| Fill NA/NaN values. |
| Returns a new DataFrame replacing a value with another value. |
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Synonym for DataFrame.fillna() or Series.fillna() with |
| Fill NaN values using an interpolation method. |
| Synonym for DataFrame.fillna() or Series.fillna() with |
Reshaping, sorting, transposing
| Create a spreadsheet-style pivot table as a DataFrame. |
| Return reshaped DataFrame organized by given index / column values. |
| Sort object by labels (along an axis). |
| Sort by the values along either axis. |
| Return the first n rows ordered by columns in descending order. |
| Return the first n rows ordered by columns in ascending order. |
| Stack the prescribed level(s) from columns to index. |
| Pivot the (necessarily hierarchical) index labels. |
| Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set. |
| Transform each element of a list-like to a row, replicating index values. |
| Squeeze 1-dimensional axis objects into scalars. |
| Transpose index and columns. |
| Transpose index and columns. |
Combining / joining / merging
| Assign new columns to a DataFrame. |
| Merge DataFrame objects with a database-style join. |
| Join columns of another DataFrame. |
| Modify in place using non-NA values from another DataFrame. |
Time series-related
| Resample time-series data. |
| Shift DataFrame by desired number of periods. |
| Retrieves the index of the first valid value. |
| Return index for last non-NA/null value. |
Serialization / IO / Conversion
| Construct DataFrame from dict of array-like or dicts. |
| Convert structured or record ndarray to DataFrame. |
| Write the DataFrame into a Spark table. |
| Write the DataFrame out as a Delta Lake table. |
| Write the DataFrame out as a Parquet file or directory. |
| Write object to a comma-separated values (csv) file. |
| Write a DataFrame to the ORC format. |
| Return a pandas DataFrame. |
| Render a DataFrame as an HTML table. |
| A NumPy ndarray representing the values in this DataFrame or Series. |
| Spark related features. |
| Render a DataFrame to a console-friendly tabular output. |
| Write a DataFrame to the binary Feather format. |
| Export DataFrame object to Stata dta format. |
| Convert the object to a JSON string. |
| Convert the DataFrame to a dictionary. |
| Write object to an Excel sheet. |
| Write the contained data to an HDF5 file using HDFStore. |
| Copy object to the system clipboard. |
| Print Series or DataFrame in Markdown-friendly format. |
| Convert DataFrame to a NumPy record array. |
| Render an object to a LaTeX tabular environment table. |
| Property returning a Styler object containing methods for building a styled HTML representation for the DataFrame. |
Spark-related
DataFrame.spark provides features that do not exist in pandas but do exist in Spark. These can be accessed by DataFrame.spark.<function/property>.
| Return the current DataFrame as a Spark DataFrame. |
| Yields and caches the current DataFrame. |
| Yields and caches the current DataFrame with a specific StorageLevel. |
| Specifies some hint on the current DataFrame. |
| Write the DataFrame into a Spark table. |
| Write the DataFrame out to a Spark data source. |
| Applies a function that takes and returns a Spark DataFrame. |
| Returns a new DataFrame partitioned by the given partitioning expressions. |
| Returns a new DataFrame that has exactly num_partitions partitions. |
Plotting
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
| Draw a stacked area plot. |
| Vertical bar plot. |
| Make a horizontal bar plot. |
| Make a box plot of the DataFrame columns. |
| Generate Kernel Density Estimate plot using Gaussian kernels. |
| Draw one histogram of the DataFrame’s columns. |
| Generate Kernel Density Estimate plot using Gaussian kernels. |
| Plot DataFrame/Series as lines. |
| Generate a pie plot. |
| Create a scatter plot with varying marker point size and color. |
| Draw one histogram of the DataFrame’s columns. |
| Make a box plot of the DataFrame columns. |
| Generate Kernel Density Estimate plot using Gaussian kernels. |
Pandas-on-Spark specific
DataFrame.pandas_on_spark provides pandas-on-Spark specific features that exist only in pandas API on Spark. These can be accessed by DataFrame.pandas_on_spark.<function/property>.
| Apply a function that takes a pandas DataFrame and outputs a pandas DataFrame. |
| Transform chunks with a function that takes a pandas DataFrame and outputs a pandas DataFrame. |