Description
Describe the issue:
I have a Windows on ARM (12-core) setup. I installed Python 3.13.7 and then installed NumPy (v2.3.0) with the command: pip install numpy.
After that, I ran a simple benchmarking script:
import numpy as np
import time
import os
N = 10000
A = np.random.rand(N, N)
B = np.random.rand(N, N)
start = time.time()
C = np.dot(A, B)
end = time.time()
print(f"Matrix multiplication of {N}x{N} took {end - start:.2f} seconds")
I noticed that it uses only 4 cores, even after setting OPENBLAS_NUM_THREADS to a value greater than 4; it still runs only 4 threads. However, if I set it to a value below 4, it behaves as expected and schedules that many threads.
I checked on an amd64 device as well. There it works as expected and uses all cores for the same workload.
Any idea what the limitation could be here?
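To help narrow this down, here is a minimal diagnostic sketch that reports how many cores the OS exposes, which thread-related environment variables are set, and what thread count the loaded BLAS reports. It assumes the third-party threadpoolctl package is installed (pip install threadpoolctl); if it is missing, that part is skipped. The function name describe_thread_environment is made up for illustration. Note that OPENBLAS_NUM_THREADS generally needs to be set before NumPy is imported to take effect.

```python
import os


def describe_thread_environment():
    """Collect the CPU count, thread-related env vars, and BLAS thread pools."""
    env_names = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")
    info = {
        # Number of logical CPUs the OS reports to Python.
        "os_cpu_count": os.cpu_count(),
        # None means the variable is not set in this process.
        "env": {name: os.environ.get(name) for name in env_names},
        "threadpools": None,
    }
    try:
        # NumPy must be imported first so that its BLAS library is loaded
        # and visible to threadpoolctl.
        import numpy  # noqa: F401
        from threadpoolctl import threadpool_info

        # Each entry is a dict with keys such as "internal_api",
        # "num_threads", "version" and "filepath".
        info["threadpools"] = threadpool_info()
    except ImportError:
        # NumPy or threadpoolctl is not available; leave "threadpools" as None.
        pass
    return info


if __name__ == "__main__":
    report = describe_thread_environment()
    print("os.cpu_count():", report["os_cpu_count"])
    for name, value in report["env"].items():
        print(f"{name} = {value}")
    for pool in report["threadpools"] or []:
        print(pool.get("internal_api"), "num_threads =", pool.get("num_threads"))
```

If the num_threads value reported for OpenBLAS stays at 4 while os.cpu_count() shows 12, that would suggest the cap comes from the BLAS library itself rather than from the benchmark script, but this is only a guess until the output is compared on both machines.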
Reproduce the code example:
import numpy as np
import time
import os
N = 10000
A = np.random.rand(N, N)
B = np.random.rand(N, N)
start = time.time()
C = np.dot(A, B)
end = time.time()
print(f"Matrix multiplication of {N}x{N} took {end - start:.2f} seconds")
Error message:
Python and NumPy Versions:
Python 3.13.7 and NumPy 2.3.0
Runtime Environment:
No response
Context for the issue:
No response