Xorgqr #1112
Draft
jprhyne wants to merge 43 commits into Reference-LAPACK:master from jprhyne:xorgqr
@jprhyne
Contributor

Description
Hey Y'all!

I implemented some performance improvements for computing the explicit Q factor in the X{or,un}gY routines. They take two forms:

  1. I added a variant of xLARFB that exploits the fact that we always apply the block reflector to a matrix C = [C_1; C_2] whose top block C_1 is zero. This improves performance, and on the first iteration we can additionally assume that C_2 is I, which gives a further gain. The improvement is most noticeable as the block size increases.
  2. For the routines X{or,un}gq{rl}, where the T matrix returned from xLARFT has the same shape as the triangular factor, we can omit the workspace requirement of the routine entirely (aside from the calls to X{or,un}g2{rl}, which still need that extra vector). This opens the door to further improvements that I am in the process of implementing:
    1. Vendors and users can experiment with larger block sizes (NB) more freely, since memory availability is no longer a concern.
    2. It allows an in-place panel factorization, which is faster than the unblocked code at the standard block size of 32 on the machine I tested.
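The first idea can be illustrated with a small, loop-based C sketch. It is hypothetical: the name `larfb_zero_top` and its argument layout are made up here, it calls no BLAS, and it is not the PR's xLARFB. It applies H = I - V T V^T from the left to an m-by-n matrix C = [C_1; C_2] whose top k rows C_1 are zero (as we read the reference blocked DORGQR loop, those rows were already zeroed by the previously processed block). Because V_1^T C_1 = 0, the product W = V^T C reduces to W = V_2^T C_2, and the new C_1 is simply -V_1 (T W), so C_1 is never read.

```c
#include <assert.h>
#include <stdlib.h>

/* Column-major element (i, j) of an array with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/*
 * Hypothetical sketch: C := (I - V T V^T) C, assuming the top k rows of C are zero.
 *   V : m-by-k; its top k-by-k block is unit lower triangular (only the strictly
 *       lower part is read, the unit diagonal is implicit).
 *   T : k-by-k upper triangular (only the upper triangle is read).
 *   C : m-by-n; rows 0..k-1 (C_1) are assumed zero on entry and are not read.
 */
void larfb_zero_top(int m, int n, int k,
                    const double *V, int ldv,
                    const double *T, int ldt,
                    double *C, int ldc)
{
    assert(0 <= k && k <= m);
    double *W = malloc((size_t)k * (size_t)n * sizeof *W); /* k-by-n, ld = k */
    assert(W != NULL || (size_t)k * (size_t)n == 0);

    /* W = V_2^T C_2: the V_1^T C_1 term vanishes because C_1 = 0. */
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) {
            double s = 0.0;
            for (int r = k; r < m; ++r)
                s += V[IDX(r, p, ldv)] * C[IDX(r, j, ldc)];
            W[IDX(p, j, k)] = s;
        }

    /* W = T W in place: row p only needs rows >= p, so sweep top-down. */
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) {
            double s = 0.0;
            for (int q = p; q < k; ++q)
                s += T[IDX(p, q, ldt)] * W[IDX(q, j, k)];
            W[IDX(p, j, k)] = s;
        }

    /* C_1 = 0 - V_1 W, with V_1 unit lower triangular. */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < k; ++i) {
            double s = W[IDX(i, j, k)]; /* implicit unit diagonal of V_1 */
            for (int p = 0; p < i; ++p)
                s += V[IDX(i, p, ldv)] * W[IDX(p, j, k)];
            C[IDX(i, j, ldc)] = -s;
        }

    /* C_2 = C_2 - V_2 W. */
    for (int j = 0; j < n; ++j)
        for (int r = k; r < m; ++r) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += V[IDX(r, p, ldv)] * W[IDX(p, j, k)];
            C[IDX(r, j, ldc)] -= s;
        }
    free(W);
}
```

When C_2 is also the identity, as on the first iteration described above, W = V_2^T C_2 is just V_2^T and the first product can be skipped as well; this sketch does not special-case that.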

A more formal write-up of this can be found in the repository for my Master's Project.

I compiled the .tex file with the current version of pdfTeX on a Linux machine.

I have also attached performance plots for computing the Q factor of the QR and LQ factorizations in double precision. They motivate why I think it would be beneficial to refactor xLARFT to return a T matrix with the same shape as the triangular factor, even before the panel factorization is implemented. For the justification of the panel factorization itself, see the repository folder linked above.
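To make "a T matrix of the same shape as the triangular factor" concrete for QR, here is a hedged C sketch (hypothetical name `larft_inplace`, plain loops; not the PR's xLARFT). It uses the forward, columnwise recurrence T(i,i) = tau_i and T(0:i-1, i) = -tau_i T(0:i-1, 0:i-1) V(:, 0:i-1)^T v_i, and writes T into the upper triangle, diagonal included, of the same array that holds the Householder vectors V in its strictly lower part. Since V has an implicit unit diagonal and the R factor is not needed to form Q, that storage is free, so no separate T workspace is required.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element (i, j) of an array with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/*
 * Hypothetical sketch: on entry the strictly lower part of the m-by-k array A
 * holds V (unit diagonal implicit). On exit the upper triangle of A, diagonal
 * included, holds the k-by-k upper triangular T with
 *     H(0) H(1) ... H(k-1) = I - V T V^T,   H(i) = I - tau[i] v_i v_i^T.
 * Only strictly lower entries of A are read while forming each column, so
 * writing T into the upper triangle never disturbs V.
 */
void larft_inplace(int m, int k, double *A, int lda, const double *tau)
{
    assert(0 <= k && k <= m && lda >= m);
    for (int i = 0; i < k; ++i) {
        /* A(0:i-1, i) = -tau_i * V(:, 0:i-1)^T v_i; v_i is 0 above row i, 1 at row i. */
        for (int j = 0; j < i; ++j) {
            double s = A[IDX(i, j, lda)]; /* V(i, j) times v_i(i) = 1 */
            for (int r = i + 1; r < m; ++r)
                s += A[IDX(r, j, lda)] * A[IDX(r, i, lda)];
            A[IDX(j, i, lda)] = -tau[i] * s;
        }
        /* A(0:i-1, i) = T(0:i-1, 0:i-1) * A(0:i-1, i), in place, top-down. */
        for (int p = 0; p < i; ++p) {
            double s = 0.0;
            for (int q = p; q < i; ++q)
                s += A[IDX(p, q, lda)] * A[IDX(q, i, lda)];
            A[IDX(p, i, lda)] = s;
        }
        A[IDX(i, i, lda)] = tau[i]; /* diagonal of T */
    }
}
```

The in-place triangular multiply sweeps rows top-down because row p of the result only depends on entries p and below of the vector being overwritten.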

dorgqrDorglqOptPerfExperiments.pdf
dorgqrDorglqPerfExperiments.pdf
dorgqrPerfExperiments.pdf

I ran these experiments on an AMD EPYC 7502 CPU, repeating each experiment 10 times and reporting the mean. For the setup of these experiments, see the files timeDorgqrVsDorglq.c, timeDorgqrVsDorglq.sh, and time_dorgqr_vs_dorglq.batch found here. The .c file is the main driver that calls our Fortran routines and times their execution, the .sh file calls the .c driver with varying inputs, and the .batch file runs the job via Slurm on the HPC system I used.
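The measurement method (repeat each run, report the mean) can be sketched as a small C harness. This is not the author's timeDorgqrVsDorglq.c: the `kernel_fn` callback type, `mean_runtime`, and the `demo_kernel`/`demo_reset` stand-ins are hypothetical. In a real driver the kernel callback would wrap the Fortran call (commonly exposed to C as `dorgqr_`), and because xORGQR overwrites its input, the optional `reset` callback restores the input before each run, outside the timed region.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type standing in for "call the routine under test". */
typedef void (*kernel_fn)(void *ctx);

/* Wall-clock time in seconds from a monotonic clock (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Run `kernel` `reps` times and return the mean time per run. If `reset` is
 * non-NULL it runs before each repetition and is excluded from the timing. */
double mean_runtime(kernel_fn reset, kernel_fn kernel, void *ctx, int reps)
{
    assert(kernel != NULL && reps > 0);
    double total = 0.0;
    for (int r = 0; r < reps; ++r) {
        if (reset != NULL)
            reset(ctx);
        double t0 = now_seconds();
        kernel(ctx);
        total += now_seconds() - t0;
    }
    return total / reps;
}

/* Illustrative stand-in workload: counts calls and accumulates a sum. */
struct demo_ctx {
    int calls;
    double acc;
};

static void demo_reset(void *p)
{
    ((struct demo_ctx *)p)->acc = 0.0; /* restore the "input" before each run */
}

static void demo_kernel(void *p)
{
    struct demo_ctx *c = p;
    c->calls += 1;
    for (int i = 0; i < 100000; ++i)
        c->acc += 0.5 * i;
}
```

Calling `mean_runtime(demo_reset, demo_kernel, &ctx, 10)` mirrors the 10-run mean described above while keeping input restoration out of the measured time.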

The main takeaway from these figures is that the slight improvements we see for QR are much less visible for LQ, which is why I think the xLARFT refactor is worthwhile even without the more efficient panel factorization.

Checklist

  • The documentation has been updated.
  • If the PR solves a specific issue, it is set to be closed on merge.

langou previously approved these changes on May 7, 2025
Johnathan Rhyne added 11 commits on August 21, 2025 12:00
…econd pass on documentation for my touched functions
jprhyne marked this pull request as draft on October 23, 2025 21:54
Johnathan Rhyne and others added 12 commits on October 23, 2025 15:56
Adding non-trivial terminating case to recursive larft
Added missing parameter to clarft_ut call inside clarft
Fixed compilation warnings related to unused parameters
jprhyne reopened this on Nov 12, 2025

Reviewers

@langou left review comments


2 participants

@jprhyne @langou
