Movatterモバイル変換


[0]ホーム

URL:


Menu
×
Sign In
+1 Get Certified For Teachers Spaces Plus Get Certified For Teachers Spaces Plus
   ❮     
     ❯   

Python Tutorial

Python HOMEPython IntroPython Get StartedPython SyntaxPython CommentsPython VariablesPython Data TypesPython NumbersPython CastingPython StringsPython BooleansPython OperatorsPython ListsPython TuplesPython SetsPython DictionariesPython If...ElsePython MatchPython While LoopsPython For LoopsPython FunctionsPython LambdaPython ArraysPython OOPPython Classes/ObjectsPython InheritancePython IteratorsPython PolymorphismPython ScopePython ModulesPython DatesPython MathPython JSONPython RegExPython PIPPython Try...ExceptPython String FormattingPython User InputPython VirtualEnv

File Handling

Python File HandlingPython Read FilesPython Write/Create FilesPython Delete Files

Python Modules

NumPy TutorialPandas TutorialSciPy TutorialDjango Tutorial

Python Matplotlib

Matplotlib IntroMatplotlib Get StartedMatplotlib PyplotMatplotlib PlottingMatplotlib MarkersMatplotlib LineMatplotlib LabelsMatplotlib GridMatplotlib SubplotMatplotlib ScatterMatplotlib BarsMatplotlib HistogramsMatplotlib Pie Charts

Machine Learning

Getting StartedMean Median ModeStandard DeviationPercentileData DistributionNormal Data DistributionScatter PlotLinear RegressionPolynomial RegressionMultiple RegressionScaleTrain/TestDecision TreeConfusion MatrixHierarchical ClusteringLogistic RegressionGrid SearchCategorical DataK-meansBootstrap AggregationCross ValidationAUC - ROC CurveK-nearest neighbors

Python DSA

Python DSALists and ArraysStacksQueuesLinked ListsHash TablesTreesBinary TreesBinary Search TreesAVL TreesGraphsLinear SearchBinary SearchBubble SortSelection SortInsertion SortQuick SortCounting SortRadix SortMerge Sort

Python MySQL

MySQL Get StartedMySQL Create DatabaseMySQL Create TableMySQL InsertMySQL SelectMySQL WhereMySQL Order ByMySQL DeleteMySQL Drop TableMySQL UpdateMySQL LimitMySQL Join

Python MongoDB

MongoDB Get StartedMongoDB Create DBMongoDB CollectionMongoDB InsertMongoDB FindMongoDB QueryMongoDB SortMongoDB DeleteMongoDB Drop CollectionMongoDB UpdateMongoDB Limit

Python Reference

Python OverviewPython Built-in FunctionsPython String MethodsPython List MethodsPython Dictionary MethodsPython Tuple MethodsPython Set MethodsPython File MethodsPython KeywordsPython ExceptionsPython Glossary

Module Reference

Random ModuleRequests ModuleStatistics ModuleMath ModulecMath Module

Python How To

Remove List DuplicatesReverse a StringAdd Two Numbers

Python Examples

Python ExamplesPython CompilerPython ExercisesPython QuizPython ServerPython SyllabusPython Study PlanPython Interview Q&APython BootcampPython CertificatePython Training

Machine Learning - Hierarchical Clustering


Hierarchical Clustering

Hierarchical clustering is an unsupervised learning method for clustering data points. The algorithm builds clusters by measuring the dissimilarities between data. Unsupervised learning means that a model does not have to be trained, and we do not need a "target" variable. This method can be used on any data to visualize and interpret the relationship between individual data points.

Here we will use hierarchical clustering to group data points and visualize the clusters using both a dendrogram and scatter plot.


How does it work?

We will use Agglomerative Clustering, a type of hierarchical clustering that follows a bottom up approach. We begin by treating each data point as its own cluster. Then, we join clusters together that have the shortest distance between them to create larger clusters. This step is repeated until one large cluster is formed containing all of the data points.

Hierarchical clustering requires us to decide on both a distance and linkage method. We will use euclidean distance and the Ward linkage method, which attempts to minimize the variance between clusters.

Example

Start by visualizing some data points:

import numpy as np
import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14 , 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

plt.scatter(x, y)
plt.show()

Result

Run example »

Now we compute the ward linkage using euclidean distance, and visualize it using a dendrogram:

Example

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

x = [4, 5, 10, 4, 3, 11, 14 , 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

data = list(zip(x, y))

linkage_data = linkage(data, method='ward', metric='euclidean')
dendrogram(linkage_data)

plt.show()

Result

Run example »
Here, we do the same thing with Python's scikit-learn library. Then, visualize on a 2-dimensional plot:

Example

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

x = [4, 5, 10, 4, 3, 11, 14 , 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

data = list(zip(x, y))

hierarchical_cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
labels = hierarchical_cluster.fit_predict(data)

plt.scatter(x, y, c=labels)
plt.show()

Result

Run example »

Example Explained

Import the modules you need.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

You can learn about the Matplotlib module in our"Matplotlib Tutorial.

You can learn about the SciPy module in ourSciPy Tutorial.

NumPy is a library for working with arrays and matricies in Python,you can learn about the NumPy module in ourNumPy Tutorial.

scikit-learn is a popular library for machine learning.

Create arrays that resemble two variables in a dataset. Note that while we only use two variables here, this method will work with any number of variables:

x = [4, 5, 10, 4, 3, 11, 14 , 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

Turn the data into a set of points:

data = list(zip(x, y))
print(data)

Result:

[(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25), (14, 24), (6, 22), (10, 21), (12, 21)]

Compute the linkage between all of the different points. Here we use a simple euclidean distance measure and Ward's linkage, which seeks to minimize the variance between clusters.

linkage_data = linkage(data, method='ward', metric='euclidean')

Finally, plot the results in a dendrogram. This plot will show us the hierarchy of clusters from the bottom (individual points) to the top (a single cluster consisting of all data points).

plt.show() lets us visualize the dendrogram instead of just the raw linkage data.

dendrogram(linkage_data)
plt.show()

Result:

The scikit-learn library allows us to use hierarchichal clustering in a different manner. First, we initialize theAgglomerativeClustering class with 2 clusters, using the same euclidean distance and Ward linkage.

hierarchical_cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')

The.fit_predict method can be called on our data to compute the clusters using the defined parameters across our chosen number of clusters.

labels = hierarchical_cluster.fit_predict(data)print(labels)

Result:

[0 0 1 0 0 1 1 0 1 1]

Finally, if we plot the same data and color the points using the labels assigned to each index by the hierarchical clustering method, we can see the cluster each point was assigned to:

plt.scatter(x, y, c=labels)
plt.show()

Result:


 
Track your progress - it's free!
 

×

Contact Sales

If you want to use W3Schools services as an educational institution, team or enterprise, send us an e-mail:
sales@w3schools.com

Report Error

If you want to report an error, or if you want to make a suggestion, send us an e-mail:
help@w3schools.com

W3Schools is optimized for learning and training. Examples might be simplified to improve reading and learning.
Tutorials, references, and examples are constantly reviewed to avoid errors, but we cannot warrant full correctness
of all content. While using W3Schools, you agree to have read and accepted ourterms of use,cookie and privacy policy.

Copyright 1999-2025 by Refsnes Data. All Rights Reserved.W3Schools is Powered by W3.CSS.


[8]ページ先頭

©2009-2025 Movatter.jp