Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Global scan#665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
kpentaris wants to merge62 commits intoDevsh-Graphics-Programming:master
base:master
Choose a base branch
Loading
fromkpentaris:global_scan

Conversation

@kpentaris
Copy link
Contributor

Description

Initial implementation of the Global Scan in HLSL

Testing

Not yet finished

TODO list:

Still need to test the HLSL code as well as the porting of the example code to the new API of vulkan_1_3 branch

@devshgraphicsprogrammingjenkins
Copy link
Contributor

[CI]: Can one of the admins verify this patch?

Comment on lines 12 to 14
#ifndef NBL_BUILTIN_MAX_SCAN_LEVELS
#define NBL_BUILTIN_MAX_SCAN_LEVELS7
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

do asNBL_CONSTEXPR instead of define

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

also this better be formulated slightly differently, in terms of reduction passes, not reduction + downsweep

Comment on lines 25 to 26
uint32_t topLevel;
uint32_t temporaryStorageOffset[NBL_BUILTIN_MAX_SCAN_LEVELS/2];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

use uint16_t for this

uint32_t temporaryStorageOffset[NBL_BUILTIN_MAX_SCAN_LEVELS/2];
};

Parameters_tgetParameters();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

no need for such a forward decl anymore

Comment on lines 31 to 38
struct DefaultSchedulerParameters_t
{
uint32_t finishedFlagOffset[NBL_BUILTIN_MAX_SCAN_LEVELS-1];
uint32_t cumulativeWorkgroupCount[NBL_BUILTIN_MAX_SCAN_LEVELS];

};

DefaultSchedulerParameters_tgetSchedulerParameters();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

split out schedulers into separate headers

Comment on lines 40 to 54
template<typename Storage_t>
voidgetData(
inoutStorage_t data,
inuint levelInvocationIndex,
inuint localWorkgroupIndex,
inuint treeLevel,
inuint pseudoLevel
NBL_REF_ARG(Storage_t) data,
NBL_CONST_REF_ARG(uint32_t) levelInvocationIndex,
NBL_CONST_REF_ARG(uint32_t) localWorkgroupIndex,
NBL_CONST_REF_ARG(uint32_t) treeLevel,
NBL_CONST_REF_ARG(uint32_t) pseudoLevel
);
}
}
}
#define _NBL_HLSL_SCAN_GET_PADDED_DATA_DECLARED_
#endif

#ifndef _NBL_HLSL_SCAN_SET_DATA_DECLARED_
namespace nbl
{
namespace hlsl
{
namespace scan
{
template<typename Storage_t>
voidsetData(
inStorage_t data,
inuint levelInvocationIndex,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

no more forward declarations, just let it be an accessor.

Also thelevel const-ref-args can be rolled up into a single structSDataIndex

Comment on lines 0 to 45
// Copyright (C) 2023 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_HLSL_SCAN_DESCRIPTORS_INCLUDED_
#define _NBL_HLSL_SCAN_DESCRIPTORS_INCLUDED_

// choerent -> globallycoherent
#include"nbl/builtin/hlsl/scan/declarations.hlsl"
#include"nbl/builtin/hlsl/workgroup/basic.hlsl"

// coherent -> globallycoherent

namespace nbl
{
namespace hlsl
{
namespace scan
{

template<uint32_t scratchElementCount=scratchSz>// (REVIEW): This should be externally defined. Maybe change the scratch buffer to RWByteAddressBuffer? Annoying to manage though...
struct Scratch
{
uint32_t workgroupsStarted;
uint32_t data[scratchElementCount];
};

[[vk::binding(0 ,0)]]RWStructuredBuffer<uint32_t/*Storage_t*/> scanBuffer;// (REVIEW): Make the type externalizable. Decide how (#define?)
[[vk::binding(1 ,0)]]RWStructuredBuffer<Scratch>globallycoherent scanScratchBuf;// (REVIEW): Check if globallycoherent can be used with Vulkan Mem Model

template<typename Storage_t,bool isExclusive=false>
voidgetData(
NBL_REF_ARG(Storage_t) data,
NBL_CONST_REF_ARG(uint32_t) levelInvocationIndex,
NBL_CONST_REF_ARG(uint32_t) localWorkgroupIndex,
NBL_CONST_REF_ARG(uint32_t) treeLevel,
NBL_CONST_REF_ARG(uint32_t) pseudoLevel
)
{
const Parameters_t params =getParameters();// defined differently for direct and indirect shaders

uint32_t offset = levelInvocationIndex;
constbool notFirstOrLastLevel =bool(pseudoLevel);
if (notFirstOrLastLevel)
offset += params.temporaryStorageOffset[pseudoLevel-1u];

Copy link
Member

@devshgraphicsprogrammingdevshgraphicsprogrammingApr 2, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

remove this whole file, it should be userspace

Comment on lines 13 to 41
// TODO: Can we make it a static variable?
groupshared uint32_t wgScratch[SharedScratchSz];

#include"nbl/builtin/hlsl/workgroup/arithmetic.hlsl"

template<uint16_t offset>
struct WGScratchProxy
{
uint32_tget(const uint32_t ix)
{
return wgScratch[ix+offset];
}
voidset(const uint32_t ix,const uint32_t value)
{
wgScratch[ix+offset] = value;
}

uint32_tatomicAdd(uint32_t ix, uint32_t val)
{
return glsl::atomicAdd(wgScratch[ix + offset], val);
}

voidworkgroupExecutionAndMemoryBarrier()
{
nbl::hlsl::glsl::barrier();
//nbl::hlsl::glsl::memoryBarrierShared(); implied by the above
}
};
static WGScratchProxy<0> accessor;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

scratches are userspace

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

use accessors

Comment on lines 55 to 64
/**
* Required since we rely on SubgroupContiguousIndex instead of
* gl_LocalInvocationIndex which means to match the global index
* we can't use the gl_GlobalInvocationID but an index based on
* SubgroupContiguousIndex.
*/
uint32_tglobalIndex()
{
return nbl::hlsl::glsl::gl_WorkGroupID().x*WORKGROUP_SIZE+nbl::hlsl::workgroup::SubgroupContiguousIndex();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

we have this in a header already

Comment on lines +46 to +53
struct ScanPushConstants
{
nbl::hlsl::scan::Parameters_t scanParams;
nbl::hlsl::scan::DefaultSchedulerParameters_t schedulerParams;
};

[[vk::push_constant]]
ScanPushConstants spc;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

everything affecting the pipeline layout should be userspace

Comment on lines -40 to +38
#ifndef _NBL_HLSL_MAIN_DEFINED_
[numthreads(_NBL_HLSL_WORKGROUP_SIZE_,1,1)]
voidCSMain()
[numthreads(WORKGROUP_SIZE,1,1)]
voidmain()
{
if (bool(nbl::hlsl::scan::getIndirectElementCount()))
nbl::hlsl::scan::main();
if(bool(nbl::hlsl::scan::getIndirectElementCount())) {
// TODO call main from virtual_workgroup.hlsl
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

all this should be userspace, also there will be very little difference between direct and indirect

{
template<class Binop,class Storage_t>
voidvirtualWorkgroup(inuinttreeLevel,inuintlocalWorkgroupIndex)
template<class Binop,typename Storage_t,bool isExclusive, uint16_t ItemCount,class Accessor,class device_capabilities=void>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

why not template on the workgroup scan instead?

Within you can alias and extract:

  • binop
  • storage_t
  • exclusive or not
  • item count
  • smem accessor
  • device traits necessary

Comment on lines 22 to 23
const uint32_t levelInvocationIndex = localWorkgroupIndex * glsl::gl_WorkGroupSize().x +SubgroupContiguousIndex();
constbool lastInvocationInGroup =SubgroupContiguousIndex() == (gl_WorkGroupSize().x -1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

shouldn'tItemCount be used instead ofglsl::gl_WorkGroupSize().x ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

most definitely

Comment on lines +91 to +98
// could do scanScratchBuf[0u].workgroupsStarted[SubgroupContiguousIndex()] = 0u but don't know how many invocations are live during this call
if(workgroup::SubgroupContiguousIndex() == 0u)
{
for(uint32_t i =0; i < params.topLevel; i++)
{
scanScratchBuf[0u].workgroupsStarted[i] = 0u;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

don't you know the workgroup size?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My bad, this is actually not used anymore. I think it was being used inside aninRage at some point? Not sure. However we can't reset theworkgroupStarted buffer within the shader at the end of the Reduce step (which was the initial goal) because it's possible that there are still workgroups that are "alive" where they won't be doing any work but will still increase theworkgroupStarted buffer and some times we end up doing the resetbefore those WGs exit, ending up with some 1 values instead of all 0s.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@devshgraphicsprogrammingdevshgraphicsprogrammingdevshgraphicsprogramming left review comments

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@kpentaris@devshgraphicsprogrammingjenkins@devshgraphicsprogramming

[8]ページ先頭

©2009-2025 Movatter.jp