
A pair plot is a type of chart that shows how the numerical variables in a dataset relate to each other. It is a grid of small scatter plots, each comparing two variables at a time. While Seaborn has a ready-made pairplot() function to create this chart quickly, Matplotlib allows more control over how the plot looks and behaves. A pair plot (also called a scatterplot matrix) consists of:

  • Scatter plots for each pair of numerical variables.
  • Histograms (or kernel density plots) on the diagonal, representing the distribution of individual variables.

This visualization helps in identifying:

  • Linear and non-linear relationships between features.
  • Clusters or groups within data.
  • Potential outliers.

Creating a pair plot using matplotlib

To get started, we first need to import the necessary libraries.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Implementation:

Python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)
data = pd.DataFrame({
    'Feature 1': np.random.rand(50),
    'Feature 2': np.random.rand(50),
    'Feature 3': np.random.rand(50),
    'Feature 4': np.random.rand(50)
})

# Number of features
num_features = len(data.columns)

# Create Subplots Grid
fig, axes = plt.subplots(num_features, num_features, figsize=(10, 10))

# Loop through each pair of features
for i in range(num_features):
    for j in range(num_features):
        ax = axes[i, j]

        if i == j:
            # Diagonal: Histogram of the feature
            ax.hist(data.iloc[:, i], bins=15, color='skyblue', edgecolor='black')
        else:
            # Scatter plot for feature pairs
            ax.scatter(data.iloc[:, j], data.iloc[:, i], alpha=0.7, s=10, color="blue")

        # Set labels on the left and bottom axes
        if j == 0:
            ax.set_ylabel(data.columns[i], fontsize=10)
        if i == num_features - 1:
            ax.set_xlabel(data.columns[j], fontsize=10)

        # Remove ticks for a cleaner look
        ax.set_xticks([])
        ax.set_yticks([])

# Adjust layout
plt.tight_layout()
plt.show()

Output: [Figure: 4×4 pair plot with histograms on the diagonal and scatter plots off the diagonal]

Explanation:

  • Data Generation: 4 features × 50 random values in [0, 1), stored in a pandas DataFrame. np.random.seed(42) makes the values reproducible.
  • Subplots Grid: 4×4 layout (plt.subplots()), with histograms on the diagonal (i == j) and scatter plots elsewhere (i ≠ j).
  • Histograms: ax.hist() with 15 bins, skyblue fill, and black edges for clarity.
  • Scatter Plots: ax.scatter() with alpha=0.7, s=10, and blue color to show relationships. Column j is plotted on the x-axis and column i on the y-axis.
  • Formatting: Labels appear only on the leftmost column (j == 0) and the bottom row (i == num_features - 1). Ticks are removed for a clean look. plt.tight_layout() prevents overlap.
  • plt.show() renders the final visualization.
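The steps above can be wrapped in a reusable function. The following is a minimal sketch, not part of Matplotlib: the name pair_plot and its parameters are hypothetical, and it assumes only the numeric columns of the DataFrame should be plotted.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pair_plot(df, bins=15, figsize=(10, 10)):
    """Draw a scatterplot matrix for the numeric columns of df.

    `pair_plot` is a hypothetical helper name, not a Matplotlib function.
    """
    numeric = df.select_dtypes(include="number")
    n = len(numeric.columns)
    # squeeze=False keeps `axes` two-dimensional even for a single column
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)

    for i, row_name in enumerate(numeric.columns):
        for j, col_name in enumerate(numeric.columns):
            ax = axes[i, j]
            if i == j:
                # Diagonal: distribution of a single feature
                ax.hist(numeric[col_name], bins=bins,
                        color="skyblue", edgecolor="black")
            else:
                # Off-diagonal: column j on x, column i on y
                ax.scatter(numeric[col_name], numeric[row_name],
                           alpha=0.7, s=10, color="blue")
            if j == 0:
                ax.set_ylabel(row_name, fontsize=10)
            if i == n - 1:
                ax.set_xlabel(col_name, fontsize=10)
            ax.set_xticks([])
            ax.set_yticks([])

    fig.tight_layout()
    return fig, axes


rng = np.random.default_rng(42)
data = pd.DataFrame(rng.random((50, 4)),
                    columns=[f"Feature {k}" for k in range(1, 5)])
fig, axes = pair_plot(data)
```

Call plt.show() afterwards to display the figure. Returning fig and axes leaves further customization of individual panels to the caller.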

Advantages of pair plot in matplotlib

  • Customizability: Unlike Seaborn’s pairplot(), which applies its own defaults, a hand-built Matplotlib grid gives full control over the styling of each panel.
  • Better Integration: The grid is an ordinary set of Matplotlib Axes, so it works within larger Matplotlib-based visualizations.
  • Flexibility: Elements such as colors, markers, line styles, and annotations can be modified per panel.
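As an illustration of this flexibility, the sketch below varies the marker style by row and annotates each scatter panel with its Pearson correlation coefficient. The choice of markers and the annotation position are arbitrary assumptions made for the example; np.corrcoef and ax.text are standard NumPy and Matplotlib calls.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame(np.random.rand(50, 3),
                    columns=["Feature 1", "Feature 2", "Feature 3"])
n = len(data.columns)

fig, axes = plt.subplots(n, n, figsize=(8, 8))
markers = ["o", "s", "^"]  # one marker style per row, chosen arbitrarily

for i in range(n):
    for j in range(n):
        ax = axes[i, j]
        x, y = data.iloc[:, j], data.iloc[:, i]
        if i == j:
            ax.hist(x, bins=15, color="skyblue", edgecolor="black")
        else:
            ax.scatter(x, y, alpha=0.7, s=12, color="blue", marker=markers[i])
            # Pearson correlation between the two features in this panel
            r = np.corrcoef(x, y)[0, 1]
            # Axes coordinates: (0, 0) is bottom-left, (1, 1) is top-right
            ax.text(0.05, 0.92, f"r = {r:.2f}", transform=ax.transAxes,
                    fontsize=9, va="top")
        if j == 0:
            ax.set_ylabel(data.columns[i], fontsize=10)
        if i == n - 1:
            ax.set_xlabel(data.columns[j], fontsize=10)
        ax.set_xticks([])
        ax.set_yticks([])

fig.tight_layout()
```

Because the data here are independent random values, the coefficients should be close to zero; with real data the annotation gives a quick numeric summary of each panel.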

Enhancing the pair plot

To improve the visualization, consider:

  • Adding regression lines to scatter plots.
  • Using different colors to highlight categories in the dataset.
  • Replacing histograms with kernel density estimation (KDE) plots.
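The second suggestion, coloring points by category, can be sketched as follows. The dataset in this article has no categorical column, so the "Group" column and the palette below are hypothetical and exist only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
features = ["Feature 1", "Feature 2", "Feature 3"]
data = pd.DataFrame(np.random.rand(60, 3), columns=features)
# Hypothetical category column, added only for illustration
data["Group"] = np.random.choice(["A", "B"], size=len(data))
palette = {"A": "tab:blue", "B": "tab:orange"}

n = len(features)
fig, axes = plt.subplots(n, n, figsize=(8, 8))

for i in range(n):
    for j in range(n):
        ax = axes[i, j]
        # Shared bin edges so both groups' histograms are comparable
        edges = np.histogram_bin_edges(data[features[i]], bins=10)
        for name, group in data.groupby("Group"):
            if i == j:
                ax.hist(group[features[i]], bins=edges, alpha=0.6,
                        color=palette[name], label=name)
            else:
                ax.scatter(group[features[j]], group[features[i]],
                           s=10, alpha=0.7, color=palette[name], label=name)
        if j == 0:
            ax.set_ylabel(features[i], fontsize=10)
        if i == n - 1:
            ax.set_xlabel(features[j], fontsize=10)
        ax.set_xticks([])
        ax.set_yticks([])

# One legend for the whole figure, built from a single scatter panel
handles, labels = axes[0, 1].get_legend_handles_labels()
fig.legend(handles, labels, loc="upper right", title="Group")
fig.tight_layout()
```

Each group is drawn with its own scatter call so that it gets a distinct color and legend entry.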

Example (adding regression lines):

Python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame(np.random.rand(50, 4),
                    columns=['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4'])

# Number of features
num_features = len(data.columns)

# Create figure
fig, axes = plt.subplots(num_features, num_features, figsize=(10, 10))

# Loop through each pair of features
for i in range(num_features):
    for j in range(num_features):
        ax = axes[i, j]

        if i == j:
            # Plot histogram on the diagonal
            ax.hist(data.iloc[:, i], bins=10, color="skyblue", edgecolor="black")
        else:
            # Scatter plot
            x = data.iloc[:, j]
            y = data.iloc[:, i]
            ax.scatter(x, y, alpha=0.7, s=10, color="blue")

            # Add Regression Line
            m, b = np.polyfit(x, y, 1)  # Linear regression
            ax.plot(x, m * x + b, color="red", linewidth=1)

        # Labels
        if j == 0:
            ax.set_ylabel(data.columns[i], fontsize=10)
        if i == num_features - 1:
            ax.set_xlabel(data.columns[j], fontsize=10)

        # Hide ticks for cleaner look
        ax.set_xticks([])
        ax.set_yticks([])

# Adjust layout
plt.tight_layout()
plt.show()

Output: [Figure: 4×4 pair plot with a red regression line over each scatter plot]


Explanation:

  • Data Preparation: Random values for four features are generated with NumPy and stored in a pandas DataFrame.
  • Creating Subplots: A 4×4 grid of subplots is created to display the pairwise relationships. plt.subplots(num_features, num_features, figsize=(10, 10)) sets up the grid layout.
  • Plotting the Pair Plot: If i == j, a histogram is plotted on the diagonal using ax.hist(). If i ≠ j, a scatter plot is created using ax.scatter().
  • Adding Regression Lines: np.polyfit(x, y, 1) fits a first-degree polynomial by least squares and returns the slope (m) and intercept (b) of the regression line. ax.plot(x, m*x + b, color="red", linewidth=1) overlays that line in red on the scatter plot.
  • Labels: Axis labels are added only to the leftmost and bottom plots. Ticks are hidden for a clean design.
  • Layout: plt.tight_layout() ensures proper spacing for readability.
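The third suggestion from the list of enhancements, replacing the diagonal histograms with kernel density estimates, is not covered by the example above. A minimal sketch follows. The helper gaussian_kde_1d is hypothetical and hand-written with NumPy, using a Gaussian kernel and Scott's rule of thumb for the bandwidth; a library routine such as scipy.stats.gaussian_kde could be used instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def gaussian_kde_1d(values, grid):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`.

    Hypothetical helper written for this sketch, not a library function.
    """
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = values.size
    # Scott's rule of thumb: bandwidth = sample std * n^(-1/5)
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


np.random.seed(42)
data = pd.DataFrame(np.random.rand(50, 4),
                    columns=["Feature 1", "Feature 2", "Feature 3", "Feature 4"])
num_features = len(data.columns)

fig, axes = plt.subplots(num_features, num_features, figsize=(10, 10))

for i in range(num_features):
    for j in range(num_features):
        ax = axes[i, j]
        if i == j:
            # Diagonal: KDE curve instead of a histogram
            col = data.iloc[:, i]
            pad = 0.25 * (col.max() - col.min())
            grid = np.linspace(col.min() - pad, col.max() + pad, 200)
            density = gaussian_kde_1d(col, grid)
            ax.plot(grid, density, color="steelblue")
            ax.fill_between(grid, density, color="skyblue", alpha=0.5)
        else:
            ax.scatter(data.iloc[:, j], data.iloc[:, i],
                       alpha=0.7, s=10, color="blue")
        if j == 0:
            ax.set_ylabel(data.columns[i], fontsize=10)
        if i == num_features - 1:
            ax.set_xlabel(data.columns[j], fontsize=10)
        ax.set_xticks([])
        ax.set_yticks([])

fig.tight_layout()
```

The bandwidth controls how smooth the curve is; Scott's rule is only one common default, and a smaller value would follow the sample more closely.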
