Python Pandas Tutorial

Python Pandas - Basic Functionality



Pandas is a powerful data manipulation library in Python, providing essential tools to work with data in both Series and DataFrame formats. These two data structures are crucial for handling and analyzing large datasets.

Understanding the basic functionality of Pandas, including its attributes and methods, is essential for managing data effectively. These attributes and methods give you quick insight into your data, making it easier to understand and process. In this tutorial you will learn about the basic attributes and methods of the Series and DataFrame data structures.

Working with Attributes in Pandas

Attributes in Pandas give you access to metadata about your Series and DataFrame objects, such as their data types, labels, and dimensions. You read an attribute without parentheses (for example, s.shape), and it lets you inspect the data without modifying it.

Series and DataFrame Attributes

The following are the most widely used attributes of Series and DataFrame objects:

Sr.No.  Attribute       Description
1       dtype / dtypes  Series.dtype returns the data type of the elements in the Series. DataFrame.dtypes returns the data type of each column.
2       index           Provides the index (row labels) of the Series or DataFrame.
3       values          Returns the data in the Series or DataFrame as a NumPy array.
4       shape           Returns a tuple representing the dimensionality of the object: (rows,) for a Series and (rows, columns) for a DataFrame.
5       ndim            Returns the number of dimensions of the object. A Series is always 1D, and a DataFrame is always 2D.
6       size            Gives the total number of elements in the object.
7       empty           Checks if the object is empty, and returns True if it is.
8       columns         Provides the column labels of the DataFrame object (DataFrame only).

Example

Let's create a Pandas Series and explore these attributes.

    import pandas as pd
    import numpy as np

    # Create a Series with random numbers
    s = pd.Series(np.random.randn(4))

    # Exploring attributes
    print("Data type of Series:", s.dtype)
    print("Index of Series:", s.index)
    print("Values of Series:", s.values)
    print("Shape of Series:", s.shape)
    print("Number of dimensions of Series:", s.ndim)
    print("Size of Series:", s.size)
    print("Is Series empty?:", s.empty)

Its output is as follows (the values differ on each run because the data is random):

    Data type of Series: float64
    Index of Series: RangeIndex(start=0, stop=4, step=1)
    Values of Series: [-1.02016329  1.40840089  1.36293022  1.33091391]
    Shape of Series: (4,)
    Number of dimensions of Series: 1
    Size of Series: 4
    Is Series empty?: False
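Because the example above uses random numbers, its values change on every run. The following is a minimal sketch with fixed, made-up data (the city labels and temperatures are assumptions chosen only for illustration). It shows the same attributes on a Series with a custom index, plus an empty Series for comparison.

```python
import pandas as pd

# Hypothetical sample data: three temperatures labelled by city code
temps = pd.Series([21.5, 19.0, 23.25], index=["NYC", "LON", "TYO"])

print(temps.dtype)          # float64
print(list(temps.index))    # ['NYC', 'LON', 'TYO']
print(temps.shape)          # (3,)
print(temps.ndim)           # 1
print(temps.size)           # 3
print(temps.empty)          # False

# An empty Series has size 0 and reports empty == True
nothing = pd.Series([], dtype="float64")
print(nothing.size, nothing.empty)   # 0 True
```

Note that the shape of a Series is a one-element tuple, so the trailing comma in (3,) is expected.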

Example

The following example shows how these attributes work on a DataFrame object.

    import pandas as pd
    import numpy as np

    # Create a DataFrame with random numbers
    df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
    print("DataFrame:")
    print(df)

    # Exploring attributes
    print("Results:")
    print("Data types:", df.dtypes)
    print("Index:", df.index)
    print("Columns:", df.columns)
    print("Values:")
    print(df.values)
    print("Shape:", df.shape)
    print("Number of dimensions:", df.ndim)
    print("Size:", df.size)
    print("Is empty:", df.empty)

On executing the above code you will get output similar to the following (the values differ on each run because the data is random):

    DataFrame:
              A         B         C         D
    0  2.161209 -1.671807 -1.020421 -0.287065
    1  0.308136 -0.592368 -0.183193  1.354921
    2 -0.963498 -1.768054 -0.395023 -2.454112
    Results:
    Data types: A    float64
    B    float64
    C    float64
    D    float64
    dtype: object
    Index: RangeIndex(start=0, stop=3, step=1)
    Columns: Index(['A', 'B', 'C', 'D'], dtype='object')
    Values:
    [[ 2.16120893 -1.67180742 -1.02042138 -0.28706468]
     [ 0.30813618 -0.59236786 -0.18319262  1.35492058]
     [-0.96349817 -1.76805364 -0.3950226  -2.45411245]]
    Shape: (3, 4)
    Number of dimensions: 2
    Size: 12
    Is empty: False
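To see how shape, size, dtypes, and empty relate to one another, here is a small sketch on a DataFrame with mixed column types. The column names and values are hypothetical, chosen only for illustration. It also shows a point that is easy to miss: empty depends on whether the object has any elements at all, not on whether those elements are missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with mixed column types
df = pd.DataFrame({
    "item": ["pen", "book", "lamp"],
    "qty": [10, 4, 1],
    "price": [1.5, 12.0, 30.25],
})

rows, cols = df.shape
print(df.shape)                 # (3, 3)
print(df.size == rows * cols)   # True: size is rows times columns
print(df.dtypes["qty"])         # int64
print(df.dtypes["price"])       # float64
print(list(df.columns))         # ['item', 'qty', 'price']

# A frame that holds only NaN still has rows, so it is not "empty"
all_nan = pd.DataFrame({"x": [np.nan, np.nan]})
print(all_nan.empty)            # False

# A frame with columns but zero rows is empty
no_rows = pd.DataFrame(columns=["x", "y"])
print(no_rows.empty, no_rows.shape)   # True (0, 2)
```

Each column of a DataFrame can have its own data type, which is why the DataFrame attribute is the plural dtypes and returns one entry per column.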

Exploring Basic Methods in Pandas

Both data structures offer several basic methods that make it easy to inspect your data quickly. These methods let you preview rows and get a summary of the contents without much effort.

Series and DataFrame Methods

Sr.No.  Method      Description
1       head(n)     Returns the first n rows of the object. The default value of n is 5.
2       tail(n)     Returns the last n rows of the object. The default value of n is 5.
3       info()      Prints a concise summary of a DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage.
4       describe()  Generates descriptive statistics of the DataFrame or Series, such as count, mean, std, min, quartiles, and max.

Example

Let us now create a Series and see how the basic methods work on it.

    import pandas as pd
    import numpy as np

    # Create a Series with random numbers
    s = pd.Series(np.random.randn(10))
    print("Series:")
    print(s)

    # Using basic methods
    print("First 5 elements of the Series:\n", s.head())
    print("\nLast 3 elements of the Series:\n", s.tail(3))
    print("\nDescriptive statistics of the Series:\n", s.describe())

Its output is as follows (the values differ on each run because the data is random):

    Series:
    0   -0.295898
    1   -0.786081
    2   -1.189834
    3   -0.410830
    4   -0.997866
    5    0.084868
    6    0.736541
    7    0.133949
    8    1.023674
    9    0.669520
    dtype: float64
    First 5 elements of the Series:
     0   -0.295898
    1   -0.786081
    2   -1.189834
    3   -0.410830
    4   -0.997866
    dtype: float64

    Last 3 elements of the Series:
     7    0.133949
    8    1.023674
    9    0.669520
    dtype: float64

    Descriptive statistics of the Series:
     count    10.000000
    mean     -0.103196
    std       0.763254
    min      -1.189834
    25%      -0.692268
    50%      -0.105515
    75%       0.535627
    max       1.023674
    dtype: float64
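The random values above make the results hard to verify by hand. As a minimal sketch, the same methods can be tried on a Series with fixed values (the numbers are assumptions chosen so that the statistics are easy to check). It shows the default and explicit values of n, and how to read individual statistics out of the result of describe().

```python
import pandas as pd

# Fixed values so the results are reproducible
s = pd.Series([10, 20, 30, 40, 50, 60, 70])

print(s.head().tolist())    # first 5 by default: [10, 20, 30, 40, 50]
print(s.head(2).tolist())   # [10, 20]
print(s.tail(3).tolist())   # [50, 60, 70]

# describe() returns a Series indexed by statistic name
stats = s.describe()
print(list(stats.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats["count"], stats["mean"])   # 7.0 40.0
print(stats["min"], stats["max"])      # 10.0 70.0
print(stats["50%"])                    # 40.0 (the median)
```

Since describe() returns an ordinary Series, a single statistic can be looked up by its label instead of being parsed from the printed output.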

Example

The following example shows how the basic methods work on a DataFrame object.

    import pandas as pd
    import numpy as np

    # Create a dictionary of Series
    data = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
            'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
            'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

    # Create a DataFrame
    df = pd.DataFrame(data)
    print("Our data frame is:\n")
    print(df)

    # Using basic methods
    print("\nFirst 5 rows of the DataFrame:\n", df.head())
    print("\nLast 3 rows of the DataFrame:\n", df.tail(3))
    print("\nInfo of the DataFrame:")
    df.info()
    print("\nDescriptive statistics of the DataFrame:\n", df.describe())

On executing the above code you will get the following output:

    Our data frame is:

        Name  Age  Rating
    0    Tom   25    4.23
    1  James   26    3.24
    2  Ricky   25    3.98
    3    Vin   23    2.56
    4  Steve   30    3.20
    5  Smith   29    4.60
    6   Jack   23    3.80

    First 5 rows of the DataFrame:
         Name  Age  Rating
    0    Tom   25    4.23
    1  James   26    3.24
    2  Ricky   25    3.98
    3    Vin   23    2.56
    4  Steve   30    3.20

    Last 3 rows of the DataFrame:
         Name  Age  Rating
    4  Steve   30     3.2
    5  Smith   29     4.6
    6   Jack   23     3.8

    Info of the DataFrame:
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7 entries, 0 to 6
    Data columns (total 3 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Name    7 non-null      object
     1   Age     7 non-null      int64
     2   Rating  7 non-null      float64
    dtypes: float64(1), int64(1), object(1)
    memory usage: 296.0+ bytes

    Descriptive statistics of the DataFrame:
                  Age    Rating
    count   7.000000  7.000000
    mean   25.857143  3.658571
    std     2.734262  0.698628
    min    23.000000  2.560000
    25%    24.000000  3.220000
    50%    25.000000  3.800000
    75%    27.500000  4.105000
    max    30.000000  4.600000
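Two details of the output above are worth a closer look: describe() lists only the Age and Rating columns, and info() is called on its own line rather than inside print(). The sketch below, which uses a shortened three-row version of the same data, illustrates both points. By default describe() summarizes only the numeric columns of a mixed DataFrame, while passing include="all" also covers the text column. info() prints its summary directly and returns None.

```python
import pandas as pd

# Shortened version of the DataFrame used in the example above
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# Default: only numeric columns are summarized
numeric_only = df.describe()
print(list(numeric_only.columns))   # ['Age', 'Rating']

# include="all" adds non-numeric columns, with rows such as unique, top, freq
everything = df.describe(include="all")
print(list(everything.columns))          # ['Name', 'Age', 'Rating']
print(everything.loc["unique", "Name"])  # 3 distinct names

# info() writes the summary to standard output and returns None
result = df.info()
print(result)   # None
```

With include="all", statistics that do not apply to a column (for example, the mean of Name) appear as NaN in the result.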