NVIDIA/ccclPublic

NotificationsYou must be signed in to change notification settings
Fork345
Star2.2k

Comments

Make`_CCCL_GLOBAL_CONSTANTinline` when RDC is enabled#7682

Open

davebayer wants to merge 1 commit intoNVIDIA:mainfrom

davebayer:make_cpos_weak_in_rdc

Open

Make`_CCCL_GLOBAL_CONSTANTinline` when RDC is enabled#7682
davebayer wants to merge 1 commit intoNVIDIA:mainfrom
davebayer:make_cpos_weak_in_rdc

Conversation

Copy link

Contributor

davebayer commentedFeb 16, 2026

When compiling libcu++, it puts significant amount of objects (mostly CPOs) with static linkage to the global memory. This PR makes the symbolsinline (.weak) when RDC is enabled.

Demo:https://godbolt.org/z/8ofPq8Wo1

Make_CCCL_GLOBAL_CONSTANTinline when RDC is enabled

732bb80

davebayer requested a review froma team as acode owner

February 16, 2026 08:42

github-project-automationbot added this toCCCL

Feb 16, 2026

davebayer requested a review fromgriwes

February 16, 2026 08:42

github-project-automationbot moved this toTodo inCCCL

Feb 16, 2026

cccl-authenticator-appbot moved this fromTodo toIn Review inCCCL

Feb 16, 2026

davebayer commented

Feb 16, 2026

View reviewed changes

libcudacxx/include/cuda/std/__cccl/dialect.h

Comment on lines 114 to 122

		#if _CCCL_DEVICE_COMPILATION() && !_CCCL_CUDA_COMPILER(NVHPC)
		# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE constexpr
		# if _CCCL_HAS_RDC()
		# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE inline constexpr
		# else // ^^^ _CCCL_HAS_RDC() ^^^ / vvv !_CCCL_HAS_RDC() vvv
		# define _CCCL_GLOBAL_CONSTANT _CCCL_DEVICE constexpr
		# endif // ^^^ !_CCCL_HAS_RDC() ^^^
		#else // ^^^ _CCCL_DEVICE_COMPILATION() && !_CCCL_CUDA_COMPILER(NVHPC) ^^^ /
		// vvv !_CCCL_DEVICE_COMPILATION() \|\| _CCCL_CUDA_COMPILER(NVHPC) vvv
		# define _CCCL_GLOBAL_CONSTANT inline constexpr

Copy link

ContributorAuthor

davebayerFeb 16, 2026•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@miscco Why do we treat nvhpc differently? The objects don't live in device memory, if we don't make them__device__ with nvhpc

Copy link

Contributor

misccoFeb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The reason is that NVHPC has a single compilation pass, so it will see that the global object is used on device and instantiate it there

Copy link

ContributorAuthor

davebayerFeb 16, 2026•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@dkolsen-pgi could you tell us what's exactly happening under the hood?

What I observe is that if I set the execution space of a CPO to:

nothing it seems that the object is not part of the generated PTX and if I do&cpo in device code, it returnnullptr
__device__, the object is visible in the PTX.

Seehttps://ce.nvidia.com/z/7d77Kj. I'm not sure whether this is a problem, but it's not consistent with what other compilers produce. Because with all other compilers, we basically create the object twice - 1x on__host__ and 1x on__device__

Copy link

Contributor

dkolsen-pgiFeb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

NVHPC's handling of constexpr variables has bugs. I can't fully explain what you are seeing.

The basic cases work well when the constexpr variable doesn't have any__host__ or__device__ annotations:

A constexpr scalar variable whose address is never needed will be constant folded in both host and device code. An actual object won't be generated in either host or device because it isn't needed.
A CPO or other stateless function object can be called just fine in both host and device code. The object isn't created correctly in device code, but it doesn't matter because the object is never actually accessed.

The thing that doesn't work is when the constexpr variable has a meaningful value and that value is accessed via a pointer to the object. Using it in host code is fine, but using it in device code usually won't compile, with nvc++ complaining about accessing a global variable from device code. If the constexpr variable is explicitly marked__device__, it will work correctly in device code, but wrong results might happen in host code.

We haven't fixed these problems because it's not easy to do and the existing behavior works well enough for the code that nvc++ needs to compile.

I thinkconstexpr inline, without any__device__ is generally the best thing to do for NVHPC.

Copy link

Contributor

github-actionsbot commentedFeb 16, 2026

🥳 CI Workflow Results

🟩 Finished in 3h 54m: Pass: 100%/95 | Total: 4d 12h | Max: 3h 54m | Hits: 16%/249655

See resultshere.

Copy link

Collaborator

jrhemstad commentedFeb 16, 2026

question: Have we convinced ourselves this isn't going to cause insidious ODR issues?

davebayer self-assigned this

Feb 17, 2026

Labels

None yet

4 participants

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Make`_CCCL_GLOBAL_CONSTANTinline` when RDC is enabled#7682

Make`_CCCL_GLOBAL_CONSTANTinline` when RDC is enabled#7682
davebayer wants to merge 1 commit intoNVIDIA:mainfrom
davebayer:make_cpos_weak_in_rdc

Conversation

davebayer commentedFeb 16, 2026

Uh oh!

davebayerFeb 16, 2026•
edited
Loading

Uh oh!

Choose a reason for hiding this comment

Uh oh!

misccoFeb 16, 2026

Choose a reason for hiding this comment

Uh oh!

davebayerFeb 16, 2026•
edited
Loading

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkolsen-pgiFeb 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actionsbot commentedFeb 16, 2026

🥳 CI Workflow Results

🟩 Finished in 3h 54m: Pass: 100%/95 | Total: 4d 12h | Max: 3h 54m | Hits: 16%/249655

Uh oh!

jrhemstad commentedFeb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Movatterモバイル変換

Comments

Conversation

davebayer commentedFeb 16, 2026

Uh oh!

davebayerFeb 16, 2026• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

misccoFeb 16, 2026

Choose a reason for hiding this comment

Uh oh!

davebayerFeb 16, 2026• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkolsen-pgiFeb 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actionsbot commentedFeb 16, 2026

🥳 CI Workflow Results

🟩 Finished in 3h 54m: Pass: 100%/95 | Total: 4d 12h | Max: 3h 54m | Hits: 16%/249655

Uh oh!

jrhemstad commentedFeb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

davebayerFeb 16, 2026•
edited
Loading

davebayerFeb 16, 2026•
edited
Loading