
Xorgqr#1112


Draft

jprhyne wants to merge 43 commits into Reference-LAPACK:master from jprhyne:xorgqr


jprhyne commented Mar 8, 2025

Description
Hey Y'all!

I implemented some performance improvements for computing the explicit Q factor in the X{or,un}gY routines. They take two forms:

- I added a variant of xLARFB that exploits the fact that we always apply the block reflector to a C where C_1 is 0. This improves performance in general, and helps further on the first iteration, where we can also assume that C_2 is I. The gains are most visible as the block size increases.
- For the routines X{or,un}gq{r,l}, where the T matrix returned from xLARFT has the same shape as the triangular factor, we can drop the workspace requirement for the routine itself (calls to X{or,un}g2{r,l} still need their extra vector). This opens the door to further improvements, which I am in the process of implementing:
   1. Vendors and users can experiment with larger block sizes (NB) more freely, since memory availability is no longer a concern.
   2. It allows an in-place panel factorization, which is faster than the unblocked code for the standard block size of 32 on the machine I tested on.
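To illustrate why a zero C_1 saves work, here is a minimal Python sketch (not the Fortran in this PR). It assumes the forward, columnwise form H = I - V T V^T with V = [V_1; V_2] and C = [C_1; C_2]; when C_1 = 0 the product V^T C reduces to V_2^T C_2, so the V_1^T C_1 term can be skipped and C_1 is simply overwritten with -V_1 W. The function names `apply_full` and `apply_zero_top` are hypothetical and exist only for this comparison.

```python
# Sketch only: dense list-of-lists matrices, no BLAS, illustrative names.

def matmul(A, B):
    """Plain dense product of two row-major list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def apply_full(V, T, C):
    """Reference: H C = C - V (T (V^T C)), touching every row of V and C."""
    W = matmul(T, matmul(transpose(V), C))
    VW = matmul(V, W)
    return [[c - u for c, u in zip(crow, urow)] for crow, urow in zip(C, VW)]


def apply_zero_top(V, T, C2):
    """Shortcut assuming the top k rows of C are zero (k = order of T)."""
    k = len(T)
    V1, V2 = V[:k], V[k:]
    # V^T C = V1^T * 0 + V2^T C2, so only the bottom blocks are multiplied.
    W = matmul(T, matmul(transpose(V2), C2))
    # C1 := 0 - V1 W, so no read of C1 is needed.
    top = [[-x for x in row] for row in matmul(V1, W)]
    V2W = matmul(V2, W)
    bottom = [[c - u for c, u in zip(crow, urow)] for crow, urow in zip(C2, V2W)]
    return top + bottom


if __name__ == "__main__":
    # Made-up data: V is 4x2 unit lower trapezoidal, T is 2x2 upper triangular.
    V = [[1.0, 0.0], [0.5, 1.0], [0.25, -0.5], [-1.0, 2.0]]
    T = [[1.2, -0.3], [0.0, 0.8]]
    C2 = [[1.0, 0.0], [0.0, 1.0]]
    C = [[0.0, 0.0], [0.0, 0.0]] + C2
    print(apply_full(V, T, C))
    print(apply_zero_top(V, T, C2))
```

Both calls should agree up to rounding. In the identity case C_2 = I used above, the product V_2^T C_2 is just V_2^T, which is the extra saving available on the first iteration.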

A more formal writeup of this can be found at my repository for my Master's Project.

I compiled the tex file with the current version of pdftex on a Linux machine.

In addition, I have attached some performance plots of computing the Q factor from the QR and LQ factorizations in double precision. They motivate why I think it will be beneficial to refactor xLARFT to return a T matrix of the same shape as the triangular factor, even before implementing the panel factorization. For the justification of the panel factorization, see the repository folder linked above.

dorgqrDorglqOptPerfExperiments.pdf
dorgqrDorglqPerfExperiments.pdf
dorgqrPerfExperiments.pdf

I ran these experiments on an AMD EPYC 7502 CPU, repeating each experiment 10 times and reporting the mean. To see the form of these experiments, see the files titled timeDorgqrVsDorglq.c, timeDorgqrVsDorglq.sh, and time_dorgqr_vs_dorglq.batch found here. The .c file is the main driver that calls our Fortran routines and times execution, the .sh file calls the .c driver with varying inputs, and the .batch file is used to run the job via Slurm on the HPC system I used.
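The measurement scheme described above (10 repetitions per configuration, mean reported, swept over varying inputs) can be sketched as follows. This is a Python stand-in, not the actual C driver or shell script; `mean_runtime`, `sweep`, and the toy workload are hypothetical names introduced for illustration.

```python
import statistics
import time


def mean_runtime(fn, repeats=10):
    """Call fn() `repeats` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def sweep(make_workload, sizes, repeats=10):
    """Map each problem size to its mean runtime, like a shell loop over inputs."""
    return {n: mean_runtime(make_workload(n), repeats) for n in sizes}


if __name__ == "__main__":
    # Toy workload standing in for a call into the routine being timed.
    def make_workload(n):
        return lambda: sum(i * i for i in range(n))

    for n, seconds in sweep(make_workload, [1_000, 10_000, 100_000]).items():
        print(f"n={n}: mean over 10 runs = {seconds:.6f} s")
```

Reporting only the mean hides run-to-run variation; if that matters, `statistics.stdev` over the same samples is a cheap addition.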

The main takeaway from these figures is that the slight improvements we see for QR are less pronounced for LQ, which is why I think refactoring xLARFT is worthwhile even without the more efficient panel factorization.

Checklist

The documentation has been updated.
If the PR solves a specific issue, it is set to be closed on merge.

jprhyne added 14 commits

February 5, 2025 10:21

double tests pass. TODO: Run single tests for slarfb0c2

6043063

Merge branch 'Reference-LAPACK:master' into xorgqr

e883033

Real version of xorgqr using new larfb implemented and passing tests locally

09a04d7

double complex tests pass locally

338bae2

single complex implementation also pass local tests. TODO: check github tests

2b21ac7

Merge branch 'Reference-LAPACK:master' into xorgqr

5d52a6e

Merge branch 'Reference-LAPACK:master' into xorgqr

31333f6

Adding documentation to my new functions

46dd9bd

Merge branch 'xorgqr' of github.com:jprhyne/lapack into xorgqr

01099fe

Updating documentation for x{or,un}gy to account for workspace changes

87c19d1

fixed whitespace inconsistencies in subroutine definitions

318a43d

added definitions to build with _64 using cmake

bf84397

fixed inconsistent function definitiom of CLARFB0C2

181b0cc

adding a dropped comma

4e3bad0

langou previously approved these changes

May 7, 2025

View reviewed changes

jprhyne and others added 3 commits

June 7, 2025 15:22

Merge branch 'Reference-LAPACK:master' into xorgqr

07dbb8b

Merge branch 'Reference-LAPACK:master' into xorgqr

a644e25

New panel factorization implemented into dorgx family and pass local tests. Check CI before moving to other precisions

9a87361

jprhyne dismissed langou’s stale review via 9a87361

August 21, 2025 17:54

Johnathan Rhyne added 11 commits

August 21, 2025 12:00

fixed calls in dlarft exceding column 72 when compiling with -DLAPACK64

1793565

Fixed other calls exceding column 72 with -DLAPACK_64 flag and corrected comments in dorgx family

5b1b111

(hopefully) Fixed all errors associated with _64 suffixes. CMAKE and make work successfully on my local machine

4957464

removed extraneous file

33825bc

single precision passes local tests

c0b1856

both complex precisions implemented and pass local tests. And did a second pass on documentation for my touched functions

d2e360f

implementing a level2 BLAS terminating case for dlarft. TODO: Implement for other precisions

6727c5b

tidied up dlarft.f comments

c482508

Added documentation to dlarft2.f

b4c0c82

single precision real pass local tests

9053f7d

double precision case pases all local tests

1fd275d

Johnathan Rhyne added 3 commits

October 15, 2025 20:29

single precision complex pass local tests. TODO: check CI tests

c4c8077

single precision complex passes local tests. TODO: run CI tests

37004aa

updating file names of terminating case for larft2 to be compatible with most recent merge in lapack

b7832af

jprhyne marked this pull request as draft

October 23, 2025 21:54

Johnathan Rhyne and others added 12 commits

October 23, 2025 15:56

updated file names in build system files

127eb98

Merge pull request #3 from jprhyne/dlarftnx

3a25492

Adding non-trivial terminating case to recursive larft

finished dst3rk and sequential base case. todo: implement dlarft_ut

7a4e03b

Added nx computation for ilaenv

8813ea9

added larft_ut and relevant helper subroutines in all precisions

8416368

Merging my UT based larft branch into main PR branch

380e1a4

Dlarft ut

Merge branch 'master' into xorgqr

f8837fe

fixed seg fault issue in clarft

eaede85

Merge pull request #5 from jprhyne/dlarftUT

3c414c1

Added missing parameter to clarft_ut call inside clarft

Fixed compilation warnings related to unused parameters

258e876

Merge branch 'xorgqr' into dlarftUT

7ee73de

Merge pull request #6 from jprhyne/dlarftUT

447a21a

Fixed compilation warnings related to unused parameters

jprhyne closed this

Nov 12, 2025

jprhyne reopened this

Nov 12, 2025

Labels

None yet


2 participants