Fix using BLAS for all compatible cases of memory layout#1419
Merged
Lost in the recent workspace refactor.
We compute A B -> C with matrices A, B, C.

The blas (cblas) interface supports matrices that adhere to certain criteria: they should be contiguous on one dimension (stride=1). We glance a little at how numpy does this to try to catch all cases.

In short, we accept A, B contiguous on either axis (row or column major). We use the case where C is (weakly) row major; if it is column major we transpose A, B, C => A^t, B^t, C^t so that we are back to the C row-major case.

(Weakly = contiguous with stride=1 on the inner dimension, but the stride for the other dimension can be larger; to differentiate from strictly whole-array contiguous.)

Minor change to the gemv function: no functional change, only an update due to the refactoring of the blas layout functions.

Fixes #1278
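A minimal sketch of that reduction, with invented helper names and not the PR's actual code: if C is column major we compute C^t = B^t A^t instead, since a column-major matrix viewed row major is its transpose and no data needs to move.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Layout {
    RowMajor,
    ColMajor,
}

/// Bare-bones matrix descriptor; the data pointer is left out for brevity.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MatDesc {
    rows: usize,
    cols: usize,
    layout: Layout,
}

/// Reinterpret the same memory as the transposed matrix in the opposite
/// layout. No data moves: a column-major m x n matrix read row by row
/// is exactly the n x m transpose.
fn transposed(m: MatDesc) -> MatDesc {
    MatDesc {
        rows: m.cols,
        cols: m.rows,
        layout: match m.layout {
            Layout::RowMajor => Layout::ColMajor,
            Layout::ColMajor => Layout::RowMajor,
        },
    }
}

/// Normalize A B -> C so that C is always row major: if C is column major,
/// compute C^t = B^t A^t instead (note the swapped operands).
fn normalize(a: MatDesc, b: MatDesc, c: MatDesc) -> (MatDesc, MatDesc, MatDesc) {
    match c.layout {
        Layout::RowMajor => (a, b, c),
        Layout::ColMajor => (transposed(b), transposed(a), transposed(c)),
    }
}

fn main() {
    let a = MatDesc { rows: 2, cols: 3, layout: Layout::RowMajor };
    let b = MatDesc { rows: 3, cols: 4, layout: Layout::RowMajor };
    let c = MatDesc { rows: 2, cols: 4, layout: Layout::ColMajor };
    let (bt, at, ct) = normalize(a, b, c);
    // We now compute the 4 x 2 row-major C^t from B^t (4 x 3) and A^t (3 x 2).
    assert_eq!(ct, MatDesc { rows: 4, cols: 2, layout: Layout::RowMajor });
    assert_eq!((bt.rows, bt.cols), (4, 3));
    assert_eq!((at.rows, at.cols), (3, 2));
}
```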
If we have a matrix of dimension, say, 5 x 5, BLAS requires the leading stride to be >= 5. Smaller strides are possible for read-only array views in ndarray (broadcasting and custom strides). In that case we mark the array as not BLAS compatible.
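A small sketch of what such a check can look like (hypothetical function, not the crate's real code): the inner stride must be 1 and the leading stride must cover a full row, otherwise the multiplication takes the non-BLAS fallback.

```rust
/// Hypothetical check, not ndarray's real code: a weakly row-major matrix
/// with `cols` columns is BLAS compatible only if the inner stride is 1 and
/// the leading (row) stride covers a full row.
fn blas_row_major_compatible(cols: usize, row_stride: isize, col_stride: isize) -> bool {
    col_stride == 1 && row_stride >= cols as isize
}

fn main() {
    // Plain 5 x 5 row-major matrix: row stride 5, column stride 1.
    assert!(blas_row_major_compatible(5, 5, 1));
    // A read-only view broadcast along rows has row stride 0: rejected,
    // so the multiplication falls back to the non-BLAS path.
    assert!(!blas_row_major_compatible(5, 0, 1));
}
```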
Using cblas we can simplify this further to a more satisfying translation (from ndarray to BLAS) with much simpler logic. It avoids creating and handling an extra layer of array views.
Add a crate with a mock blas implementation, so that we can assert that cblas_sgemm etc. are called (depending on memory layout).
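A rough sketch of the mock-BLAS idea (names invented, not the PR's actual test crate): the crate exports the cblas symbols itself, so when the test binary links against it instead of a real BLAS, a test can assert that cblas_sgemm was reached for the layouts that are supposed to hit the BLAS path. The stub below only counts calls; a real mock would also have to compute correct results so the numeric tests still pass.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of times the test binary reached cblas_sgemm.
pub static SGEMM_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the real cblas_sgemm symbol (the layout/transpose enums are
/// passed as plain integers here for brevity). This stub only records the
/// call; it does not compute the product.
#[no_mangle]
pub unsafe extern "C" fn cblas_sgemm(
    _layout: i32, _trans_a: i32, _trans_b: i32,
    _m: i32, _n: i32, _k: i32,
    _alpha: f32, _a: *const f32, _lda: i32,
    _b: *const f32, _ldb: i32,
    _beta: f32, _c: *mut f32, _ldc: i32,
) {
    SGEMM_CALLS.fetch_add(1, Ordering::SeqCst);
}

// In a test, after multiplying two BLAS-compatible matrices:
// assert!(SGEMM_CALLS.load(Ordering::SeqCst) > 0);
```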
Merged commit f563af0 into master. 12 checks passed.
The blas (cblas) interface supports matrices that adhere to certain criteria: they should be contiguous on one dimension (stride=1). We glance a little at how numpy does this to try to catch all cases.
Compute A B -> C:

For BLAS compatibility we require that A, B, C are "weakly" contiguous (stride=1) in their fastest dimension, but it can be either the first or the second axis (either rowmajor/"c" or colmajor/"f").

The "normal case" for cblas is CblasRowMajor. Select CblasRowMajor / CblasColMajor to fit C's memory order. Apply a transpose to A, B as needed if they differ from row major. If C is CblasColMajor, then transpose both A, B (again!), as sketched below.

(Weakly = contiguous with stride=1 on the fastest axis, but the stride for the other axis can be arbitrarily large; to differentiate from strictly whole-array contiguous.)
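An illustrative sketch of that selection, under the assumptions above and with invented helper names (the PR's real code lives in `mat_mul_impl`): choose cblas's overall order from C's layout, then mark A and B as transposed whenever their own layout differs from that order. The two transpose steps described above net out to exactly that rule.

```rust
// Stand-ins for CBLAS_ORDER / CBLAS_TRANSPOSE from the cblas interface.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order { RowMajor, ColMajor }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Transpose { NoTrans, Trans }

/// Pick the cblas flags for C = A B, given each matrix's (weak) layout:
/// the overall order follows C, and an operand is marked transposed
/// whenever its own layout differs from that order.
fn cblas_flags(a: Order, b: Order, c: Order) -> (Order, Transpose, Transpose) {
    let order = c;
    let trans = |m: Order| if m == order { Transpose::NoTrans } else { Transpose::Trans };
    (order, trans(a), trans(b))
}

fn main() {
    // Row-major A and B written into a column-major C: the order follows C,
    // and both operands are flagged as transposed.
    assert_eq!(
        cblas_flags(Order::RowMajor, Order::RowMajor, Order::ColMajor),
        (Order::ColMajor, Transpose::Trans, Transpose::Trans)
    );
}
```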
A first commit simplified and corrected the logic while still using ndarray's reversed axes. A further commit simplified it even further, to a satisfying little function in `mat_mul_impl` as the final result.

I have kept both states (both commits) because I think the first version is a useful guide if we ever were to use plain BLAS instead of CBLAS(?).
Fixes #1278