
Data layout transformations on objectfifo join/split output/input #2706


Draft: AndraBisca wants to merge 12 commits into main from of-dims

AndraBisca (Collaborator) commented Nov 12, 2025 (edited)

Following the discussion in #2678, this PR introduces an example in which the outputs of three compute tiles are joined in a mem tile before the final 48-element-wide tensor (i32) is sent to external memory.

In this example, two iterations of the join pattern are required to move the 48-element-wide output tensor from the NPU to external memory. Combined with the toStream data layout transformation on the 48-element-wide data, this requires the following BD programming:

    %memtile_dma_0_1 = aie.memtile_dma(%mem_tile_0_1) {
      %0 = aie.dma_start(MM2S, 0, ^bb1, ^bb6)
    ^bb1:  // 2 preds: ^bb0, ^bb3
      aie.use_lock(%out_cons_lock_2, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 24, [<size = 8, stride = 1>, <size = 3, stride = 8>])
      aie.use_lock(%out_prod_lock_2, Release, 1)
      aie.next_bd ^bb2
    ^bb2:  // pred: ^bb1
      aie.use_lock(%out_cons_lock_1, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 0, [<size = 8, stride = 1>, <size = 3, stride = 8>])
      aie.use_lock(%out_prod_lock_1, Release, 1)
      aie.next_bd ^bb3
    ^bb3:  // pred: ^bb2
      aie.use_lock(%out_cons_lock_0, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 0, [<size = 8, stride = 1>, <size = 3, stride = 8>])
      aie.use_lock(%out_prod_lock_0, Release, 1)
      aie.next_bd ^bb1
    ^bb6:  // pred: ^bb0
      %1 = aie.dma_start(S2MM, 0, ^bb7, ^bb8)
    ^bb7:  // 2 preds: ^bb6, ^bb7
      aie.use_lock(%out_prod_lock_0, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 8)
      aie.use_lock(%out_cons_lock_0, Release, 1)
      aie.next_bd ^bb7
    ^bb8:  // pred: ^bb6
      %2 = aie.dma_start(S2MM, 1, ^bb9, ^bb10)
    ^bb9:  // 2 preds: ^bb8, ^bb9
      aie.use_lock(%out_prod_lock_1, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 8, 8)
      aie.use_lock(%out_cons_lock_1, Release, 1)
      aie.next_bd ^bb9
    ^bb10:  // pred: ^bb8
      %3 = aie.dma_start(S2MM, 2, ^bb11, ^bb12)
    ^bb11:  // 2 preds: ^bb10, ^bb11
      aie.use_lock(%out_prod_lock_2, AcquireGreaterEqual, 1)
      aie.dma_bd(%out_buff_0 : memref<24xi32>, 16, 8)
      aie.use_lock(%out_cons_lock_2, Release, 1)
      aie.next_bd ^bb11
    ^bb12:  // pred: ^bb10
      aie.end
    }

The objectfifo lowering for a join currently only works at the granularity of the smaller tensors, and thus cannot apply the data layout transformation to the final output tensor. This PR enhances the lowering so that the pattern above is produced instead. The same applies to the distribute pattern when the fromStream data layout transformation is used on the input objectfifo.
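To make the access pattern concrete, here is a minimal Python sketch (not part of the PR) that models the mem tile buffer from the snippet above. Three S2MM channels fill 8-element chunks at offsets 0, 8 and 16, and the first MM2S BD reads the 24-element buffer through `[<size = 8, stride = 1>, <size = 3, stride = 8>]`. It assumes the dimension list is ordered outermost first, so the last pair varies fastest. The helper names `bd_addresses` and `join_and_stream` and the worker data are hypothetical.

```python
from itertools import product


def bd_addresses(dims, offset=0):
    """Return the element addresses visited by a BD with (size, stride) dims.

    Assumption: dims are listed outermost first, so the last pair varies fastest.
    """
    sizes = [size for size, _ in dims]
    strides = [stride for _, stride in dims]
    return [
        offset + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    ]


def join_and_stream(chunks, dims):
    """Model the join: lay the worker chunks out back to back, then read via dims."""
    buffer = [x for chunk in chunks for x in chunk]  # offsets 0, 8, 16
    return [buffer[a] for a in bd_addresses(dims)]


# Three hypothetical workers, each producing 8 i32 values.
chunks = [[100 * w + i for i in range(8)] for w in range(3)]
streamed = join_and_stream(chunks, [(8, 1), (3, 8)])

print(bd_addresses([(8, 1), (3, 8)])[:6])
print(streamed[:6])
```

Under the stated ordering assumption, the stream interleaves one element from each worker's chunk in turn (addresses 0, 8, 16, 1, 9, 17, ...). That interleaving is only possible when the transformation is applied to the joined 24-element buffer rather than to each 8-element chunk.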

TODO:

  • comment and cleanup code in objectfifo lowering
  • debug distribute with fromStream on input objfifo
  • add checks for AIE2 architecture: multiple acq/rel ops should not be allowed in the same BD
  • add documentation
  • add MLIR examples
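
As an illustration of the planned AIE2 check, the following is a rough text-level Python sketch that flags BD blocks containing more than one acquire or more than one release. It is not the PR's implementation: the real check would presumably live in the C++ lowering or a verifier, and the names `count_lock_ops` and `check_bd_blocks`, the regular expression, and the `^bad` block are all hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "aie.use_lock(%out_cons_lock_2, AcquireGreaterEqual, 1)"
LOCK_RE = re.compile(r"aie\.use_lock\(\s*%[\w.]+\s*,\s*(\w+)\s*,")


def count_lock_ops(block_lines):
    """Count acquire and release use_lock ops in the text of one BD block."""
    counts = Counter()
    for line in block_lines:
        match = LOCK_RE.search(line)
        if match:
            # Any action other than Release is treated as an acquire variant.
            kind = "release" if match.group(1) == "Release" else "acquire"
            counts[kind] += 1
    return counts


def check_bd_blocks(blocks):
    """Return labels of blocks with more than one acquire or more than one release."""
    bad = []
    for label, lines in blocks.items():
        counts = count_lock_ops(lines)
        if counts["acquire"] > 1 or counts["release"] > 1:
            bad.append(label)
    return bad


blocks = {
    "^bb1": [
        "aie.use_lock(%out_cons_lock_2, AcquireGreaterEqual, 1)",
        "aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 24)",
        "aie.use_lock(%out_prod_lock_2, Release, 1)",
    ],
    "^bad": [  # hypothetical block that acquires and releases two locks
        "aie.use_lock(%out_cons_lock_0, AcquireGreaterEqual, 1)",
        "aie.use_lock(%out_cons_lock_1, AcquireGreaterEqual, 1)",
        "aie.dma_bd(%out_buff_0 : memref<24xi32>, 0, 24)",
        "aie.use_lock(%out_prod_lock_0, Release, 1)",
        "aie.use_lock(%out_prod_lock_1, Release, 1)",
    ],
}
print(check_bd_blocks(blocks))
```

Only the hypothetical `^bad` block is reported here. Blocks shaped like those in the BD programming above, with one acquire and one release each, pass.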

github-actions bot (Contributor) commented Nov 12, 2025 (edited)

Coverage Report

Created: 2025-11-22 08:29

Click here for information about interpreting this report.

| Filename | Function Coverage | Line Coverage | Region Coverage | Branch Coverage |
| --- | --- | --- | --- | --- |
| home/runner/work/mlir-aie/mlir-aie/lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp | 100.00% | 94.47% | 92.22% | 86.70% |
| Totals | 100.00% | 94.47% | 92.22% | 86.70% |

Generated by llvm-cov -- llvm version 18.1.3

@fifield (Collaborator)

> Following discussion in #2678 this PR introduces an example which tests whether multiple locks can be acquired and released in a single DMA BD.

Maybe it's obvious, but the hardware does not support this. It should be an error.

AndraBisca reacted with thumbs up emoji

AndraBisca changed the title from "Limitation of multiple acq/rel ops in one DMA BD" to "Support data layout transformation on objectfifo join output" on Nov 17, 2025
AndraBisca changed the title from "Support data layout transformation on objectfifo join output" to "Support data layout transformations on objectfifo join output" on Nov 17, 2025
AndraBisca changed the title from "Support data layout transformations on objectfifo join output" to "Data layout transformations on objectfifo join/split output/input" on Nov 18, 2025
if (j >= OUT_HEIGHT / 2)
ref++;
if (*(bufOut + i + OUT_WIDTH * j) != ref) {
std::cout << "Error in output " << i + OUT_WIDTH * j << ": " << *(bufOut + i + OUT_WIDTH * j) << " != " << ref

[clang-format] reported by reviewdog 🐶

Suggested change
- std::cout << "Error in output " << i + OUT_WIDTH * j << ": " << *(bufOut + i + OUT_WIDTH * j) << " != " << ref
+ std::cout << "Error in output " << i + OUT_WIDTH * j << ": "
+           << *(bufOut + i + OUT_WIDTH * j) << " != " << ref

<< std::endl;
errors++;
} else {
std::cout << "Correct output " << i + OUT_WIDTH * j << ": " << *(bufOut + i + OUT_WIDTH * j) << " == " << ref

[clang-format] reported by reviewdog 🐶

Suggested change
- std::cout << "Correct output " << i + OUT_WIDTH * j << ": " << *(bufOut + i + OUT_WIDTH * j) << " == " << ref
+ std::cout << "Correct output " << i + OUT_WIDTH * j << ": "
+           << *(bufOut + i + OUT_WIDTH * j) << " == " << ref

# Input
of_offsets = [8 * worker for worker in range(n_workers)]

of_in = ObjectFifo(tile24_ty, depth=depth, name="in", dims_from_stream_per_cons=[(8, 3), (3, 1)])

[black] reported by reviewdog 🐶

Suggested change
- of_in = ObjectFifo(tile24_ty, depth=depth, name="in", dims_from_stream_per_cons=[(8, 3), (3, 1)])
+ of_in = ObjectFifo(
+     tile24_ty, depth=depth, name="in", dims_from_stream_per_cons=[(8, 3), (3, 1)]
+ )


