ENH: np.unique: support hash based unique for string dtype #28767
Merged
ngoldbaum merged 88 commits into numpy:main from math-hiyoko:feature/#28364
Jun 20, 2025
Commits
f620f3b Support NPY_STRING, NPY_UNICODE (math-hiyoko, Apr 15, 2025)
20ccefe unique for NPY_STRING and NPY_UNICODE (math-hiyoko, Apr 16, 2025)
38626b9 fix construct array (math-hiyoko, Apr 16, 2025)
56bd858 remove unneccessary include (math-hiyoko, Apr 16, 2025)
f79736a refactor (math-hiyoko, Apr 16, 2025)
c4e5438 refactoring (math-hiyoko, Apr 17, 2025)
7c51049 comment (math-hiyoko, Apr 17, 2025)
bd70552 feature: unique for NPY_VSTRING (math-hiyoko, Apr 18, 2025)
cc8ece6 refactoring (math-hiyoko, Apr 18, 2025)
f7b20a0 remove unneccessary include (math-hiyoko, Apr 18, 2025)
d0170ed add test (math-hiyoko, Apr 18, 2025)
dbb140f add error message (math-hiyoko, Apr 18, 2025)
49ed502 linter (math-hiyoko, Apr 18, 2025)
0238cee linter (math-hiyoko, Apr 18, 2025)
6905978 reserve bucket (math-hiyoko, Apr 18, 2025)
2fc1378 remove emoji from testcase (math-hiyoko, Apr 18, 2025)
1ad6d6c fix testcase (math-hiyoko, Apr 18, 2025)
b478e15 remove error (math-hiyoko, Apr 18, 2025)
95bc405 fix testcase (math-hiyoko, Apr 18, 2025)
3f1811b fix testcase name (math-hiyoko, Apr 18, 2025)
99e3662 use basic_string (math-hiyoko, Apr 18, 2025)
b99542a fix testcase (math-hiyoko, Apr 18, 2025)
2589dd7 add ValueError (math-hiyoko, Apr 18, 2025)
3f40cdc fix testcase (math-hiyoko, Apr 18, 2025)
68d5a7b fix memory error (math-hiyoko, Apr 18, 2025)
d38c3e3 remove multibyte char (math-hiyoko, Apr 18, 2025)
8cf2c63 refactoring (math-hiyoko, Apr 18, 2025)
0165d6a add multibyte char (math-hiyoko, Apr 18, 2025)
243be6b refactoring (math-hiyoko, Apr 18, 2025)
a6e5d3c fix memory error (math-hiyoko, Apr 18, 2025)
78b9dc6 fix GIL (math-hiyoko, Apr 18, 2025)
0464617 fix strlen (math-hiyoko, Apr 18, 2025)
908f495 remove PyArray_GETPTR1 (math-hiyoko, Apr 19, 2025)
30d1d1a refactoring (math-hiyoko, Apr 19, 2025)
36c167c refactoring (math-hiyoko, Apr 19, 2025)
79d31e4 use optional (math-hiyoko, Apr 19, 2025)
00143f9 refactoring (math-hiyoko, Apr 19, 2025)
1cc09f3 refactoring (math-hiyoko, Apr 19, 2025)
b29981d refactoring (math-hiyoko, Apr 19, 2025)
91c5d42 refactoring (math-hiyoko, Apr 19, 2025)
e9c3aac fix comment (math-hiyoko, Apr 19, 2025)
8191f5f linter (math-hiyoko, Apr 19, 2025)
4faf36a add doc (math-hiyoko, Apr 19, 2025)
c6aaf39 DOC: fix (math-hiyoko, Apr 19, 2025)
1053bcb DOC: fix format (math-hiyoko, Apr 20, 2025)
1afefbe MNT: refactoring (math-hiyoko, Apr 20, 2025)
b5610b1 MNT: refactoring (math-hiyoko, Apr 20, 2025)
c28a7ce ENH: Store pointers to strings in the set instead of the strings them… (math-hiyoko, Apr 24, 2025)
b17011e FIX: length in memcmp (math-hiyoko, Apr 24, 2025)
c2d5868 ENH: refactoring (math-hiyoko, Apr 24, 2025)
7d4afe0 DOC: 49sec -> 34sec (math-hiyoko, Apr 24, 2025)
ad843b0 Update numpy/lib/_arraysetops_impl.py (math-hiyoko, Apr 25, 2025)
45ec2b3 DOC: Mention that hash-based np.unique returns unsorted strings (math-hiyoko, Apr 25, 2025)
52a982d Merge branch 'feature/#28364' of github.com:math-hiyoko/numpy into fe… (math-hiyoko, Apr 25, 2025)
fff254e ENH: support medium and long vstrings (math-hiyoko, Apr 26, 2025)
370bd8f FIX: comment (math-hiyoko, Apr 29, 2025)
49dfcb4 ENH: use RAII wrapper (math-hiyoko, Apr 29, 2025)
c5745bf FIX: error handling of string packing (math-hiyoko, Apr 29, 2025)
3ba9788 FIX: error handling of string packing (math-hiyoko, Apr 29, 2025)
376ad09 FIX: change default bucket size (math-hiyoko, Apr 29, 2025)
aa0db48 FIX: include (math-hiyoko, Apr 30, 2025)
7a2892f FIX: cast (math-hiyoko, Apr 30, 2025)
896bcba ENH: support equal_nan=False (math-hiyoko, May 1, 2025)
f1c1947 FIX: function equal (math-hiyoko, May 1, 2025)
f35123a FIX: check the case if pack_status douesn't return NULL (math-hiyoko, May 1, 2025)
e6ea015 FIX: check the case if pack_status douesn't return NULL (math-hiyoko, May 1, 2025)
ddff98f FIX: stderr (math-hiyoko, May 1, 2025)
2758e27 ENH: METH_VARARGS -> METH_FASTCALL (math-hiyoko, May 2, 2025)
a6dc86a FIX: log (math-hiyoko, May 2, 2025)
9a936eb FIX: release allocator (math-hiyoko, May 3, 2025)
1e967ee FIX: comment (math-hiyoko, May 3, 2025)
52c2326 FIX: delete log (math-hiyoko, May 3, 2025)
6f18a43 ENH: implemented FNV-1a as hash function (math-hiyoko, May 3, 2025)
2a1bd41 bool -> npy_bool (math-hiyoko, May 3, 2025)
8b632f2 FIX: cast (math-hiyoko, May 3, 2025)
a7bfc08 34sec -> 35.1sec (math-hiyoko, May 4, 2025)
dd0d8f5 Merge branch 'main' into feature/#28364 (math-hiyoko, May 21, 2025)
9fc9ce3 fix: lint (math-hiyoko, May 21, 2025)
998ca00 fix: cast using const void * (math-hiyoko, May 26, 2025)
3dd2667 fix: fix fnv1a hash (math-hiyoko, Jun 1, 2025)
94926cb fix: lint (math-hiyoko, Jun 1, 2025)
a711635 35.1sec -> 33.5sec (math-hiyoko, Jun 1, 2025)
ccccc44 Merge branch 'main' into feature/#28364 (math-hiyoko, Jun 16, 2025)
2b6b9b5 enh: define macro HASH_TABLE_INITIAL_BUCKETS (math-hiyoko, Jun 19, 2025)
e92a387 enh: error handling of NpyString_load (math-hiyoko, Jun 19, 2025)
397a594 enh: delete comments on GIL (math-hiyoko, Jun 19, 2025)
425a166 fix: PyErr_SetString when NpyString_load failed (math-hiyoko, Jun 19, 2025)
12eb788 fix: PyErr_SetString -> npy_gil_error (math-hiyoko, Jun 19, 2025)
DOC: 49sec -> 34sec
math-hiyoko committed Apr 24, 2025
commit 7d4afe0a2b48b22373b349f72bb0fd5b5d159f22
doc/release/upcoming_changes/28767.performance.rst (4 changes: 2 additions & 2 deletions)
@@ -3,8 +3,8 @@ Performance improvements to ``np.unique`` for string dtypes
 The hash-based algorithm for unique extraction provides
 an order-of-magnitude speedup on large string arrays.
 In an internal benchmark with about 1 billion string elements,
-the hash-based np.unique completed in roughly 48 seconds,
+the hash-based np.unique completed in roughly 34 seconds,
 compared to 498 seconds with the sort-based method
-– about 10–11× faster for unsorted unique operations on strings.
+– about 14–15× faster for unsorted unique operations on strings.
 This improvement greatly reduces the time to find unique values
 in very large string datasets.
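
For readers unfamiliar with the two strategies the release note compares, here is a minimal pure-Python sketch. It is an illustration only, not NumPy's implementation (which this PR writes in C++). The function names `unique_sort_based` and `unique_hash_based` are hypothetical, and the first-seen output order of the hash-based version is a property of this sketch, not a guarantee made by `np.unique`. The last two lines simply redo the arithmetic behind the quoted speedup ranges.

```python
def unique_sort_based(values):
    """Sort first, then keep each element that differs from its predecessor.

    Cost is dominated by the O(n log n) sort; the result comes out sorted.
    """
    ordered = sorted(values)
    result = []
    for item in ordered:
        if not result or item != result[-1]:
            result.append(item)
    return result


def unique_hash_based(values):
    """Single pass with a hash set; expected O(n) time, no sorting.

    In this sketch the result follows first-seen order, not sorted order.
    """
    seen = set()
    result = []
    for item in values:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


words = ["pear", "apple", "pear", "fig", "apple"]
sorted_unique = unique_sort_based(words)
hashed_unique = unique_hash_based(words)

# Speedup ratios implied by the timings quoted in the release note.
old_ratio = 498 / 48  # falls in the "10–11×" range of the removed line
new_ratio = 498 / 34  # falls in the "14–15×" range of the added line
```

Both functions return the same set of values; only the ordering and the cost differ, which is why the release note qualifies the speedup as applying to unsorted unique operations.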