
Finding unique rows means removing duplicate rows from a 2D array so that each row appears only once. For example, given [[1, 2], [3, 4], [1, 2], [5, 6]], the unique rows would be [[1, 2], [3, 4], [5, 6]]. Let’s explore efficient ways to achieve this using NumPy.

Using np.unique(axis=0)

np.unique(axis=0) is a simple, beginner-friendly method. It compares entire rows and returns only the distinct ones, much like Excel’s "Remove Duplicates." However, it returns the rows in sorted order, so the original row order is not preserved.

Python
import numpy as np

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
res = np.unique(a, axis=0)
print(res)

Output
[[1 2]
 [3 4]
 [5 6]]

Explanation: Array a has repeated rows. With np.unique(a, axis=0), NumPy treats each row as a unit and removes duplicates. The result is a sorted array containing only the unique rows.
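If the original row order matters, one option is to ask np.unique for the index of each row's first occurrence and then sort those indices. The sketch below assumes a NumPy version whose np.unique accepts the axis argument; the input array and the names uniq, first_idx and counts are illustrative only.

```python
import numpy as np

a = np.array([[3, 4], [1, 2], [3, 4], [5, 6]])

# return_index gives the position of each unique row's first occurrence;
# return_counts gives how many times each unique row appears.
uniq, first_idx, counts = np.unique(
    a, axis=0, return_index=True, return_counts=True
)

# Sorting the first-occurrence indices restores the original row order.
res = a[np.sort(first_idx)]

print(uniq)    # unique rows, sorted
print(counts)  # occurrences of each sorted unique row
print(res)     # unique rows in order of first appearance
```

Here uniq is sorted, while res keeps [3, 4] ahead of [1, 2] because that is the order in which the rows first appear in a.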

Using np.lexsort()

For more control over deduplication, np.lexsort() sorts the rows so that duplicates become adjacent, and np.diff() then filters them out. Like sorting papers by name and removing repeats, it is more flexible than np.unique but requires extra steps.

Python
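import numpy as np

a = np.array([[7, 8], [7, 9], [6, 8], [7, 8]])

# Sort rows lexicographically (column 0 first, then column 1)
s = np.lexsort(a.T[::-1])
a = a[s]

# Keep a row only if it differs from the row before it
mask = np.ones(len(a), bool)
mask[1:] = np.any(np.diff(a, axis=0), axis=1)
res = a[mask]
print(res)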

Output
[[6 8]
 [7 8]
 [7 9]]

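Explanation: We first sort the rows using np.lexsort() on the reversed columns. lexsort treats its last key as the primary one, so reversing makes column 0 the primary sort key. Then, np.diff() computes the differences between consecutive rows, which are all zero exactly when a row repeats the one before it. The boolean mask keeps the first row of each group of identical rows.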
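The same steps can be wrapped in a small reusable function. This is only a sketch: the name unique_rows_lexsort is hypothetical, and it compares neighbouring rows with != rather than np.diff(), which should give the same mask for the integer data used here.

```python
import numpy as np

def unique_rows_lexsort(a):
    """Return the unique rows of a 2D array, sorted by column 0, then 1, ..."""
    a = np.asarray(a)
    if a.shape[0] == 0:
        return a
    # lexsort uses the LAST key as the primary one, so reverse the columns.
    order = np.lexsort(a.T[::-1])
    s = a[order]
    # Keep a row if it differs from the previous sorted row in any column.
    keep = np.ones(len(s), dtype=bool)
    keep[1:] = np.any(s[1:] != s[:-1], axis=1)
    return s[keep]

res = unique_rows_lexsort([[7, 8], [7, 9], [6, 8], [7, 8]])
print(res)
```

For this input the function should return the same rows as the step-by-step version above.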

Using set(tuple(row))

set(tuple(...)) converts each row to a tuple and uses a set to remove duplicates. It’s quick and easy to write, but it doesn’t preserve row order, and because tuples are compared exactly, floating-point rows that differ only by rounding error are kept as separate rows. Think of it as dropping rows into labeled baskets: duplicates simply won’t fit twice.

Python
import numpy as np

a = np.array([[0, 0], [1, 1], [0, 0], [2, 2]])
res = np.array(list({tuple(r) for r in a}))
print(res)

Output
[[1 1]
 [2 2]
 [0 0]]

Explanation: A set comprehension turns each row of array a into a tuple, and the set automatically discards duplicates. Finally, we convert the set of tuples back to a NumPy array. Because sets are unordered, the rows may come out in a different order than shown.
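Both caveats of the tuple approach can be worked around. The sketch below assumes Python 3.7+, where dict keys keep insertion order, and uses np.round with 9 decimals, an arbitrary tolerance chosen only for illustration.

```python
import numpy as np

a = np.array([[0, 0], [1, 1], [0, 0], [2, 2]])

# dict keys are unique and remember insertion order, so each row stays
# at the position of its first occurrence.
res = np.array(list(dict.fromkeys(map(tuple, a))))
print(res)

# Exact comparison treats 0.1 + 0.2 and 0.3 as different values.
b = np.array([[0.1 + 0.2, 1.0], [0.3, 1.0]])
exact = len({tuple(r) for r in b})
# Rounding first (9 decimals is an assumed tolerance) merges the two rows.
rounded = len({tuple(r) for r in np.round(b, 9)})
print(exact, rounded)
```

Rounding is a blunt tool: two values that sit on opposite sides of a rounding boundary still end up different, so the tolerance should be chosen with the data in mind.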

Using view(np.void)

view(np.void) treats each row as a single block of bytes, enabling fast row-wise comparison. It’s less readable but can be efficient for large datasets. Row order can be preserved by sorting the first-occurrence indices that np.unique returns with return_index=True.

Python
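import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9]])

# View each row as one opaque np.void element spanning the row's bytes
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

# idx holds the first occurrence of each unique row; sorting it keeps the original order
_, idx = np.unique(b, return_index=True)
res = a[np.sort(idx)]
print(res)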

Output
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Explanation: We make a contiguous and view it with an np.void dtype whose item size is the byte length of one row, so NumPy treats each row as a single element. Then np.unique() finds the unique byte rows, and we use the sorted first-occurrence indices to retrieve the unique rows of a in their original order.
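As a sanity check, the byte-view technique can be wrapped in a function and compared against np.unique(axis=0). This is a minimal sketch: the name unique_rows_void is hypothetical, and because rows are compared byte by byte, float rows containing 0.0 and -0.0 would be treated as different.

```python
import numpy as np

def unique_rows_void(a):
    """Unique rows of a 2D array, in order of first appearance (byte-wise)."""
    a = np.ascontiguousarray(a)
    # One np.void element per row, spanning all of that row's bytes.
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    rows = a.view(row_dtype).ravel()
    # The sorted unique values are in byte order, so only the indices are used.
    _, idx = np.unique(rows, return_index=True)
    return a[np.sort(idx)]

a = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9]])
res = unique_rows_void(a)
print(res)

# Cross-check: both approaches should find the same set of rows.
same = np.array_equal(np.unique(res, axis=0), np.unique(a, axis=0))
print(same)
```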

