NotificationsYou must be signed in to change notification settings
Fork70
Star363

Split ir nodes into multiple translation units#5596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

wujingyue wants to merge1 commit intomain

base:main

Choose a base branch

fromwjy/files

Open

Split ir nodes into multiple translation units#5596

wujingyue wants to merge1 commit intomainfromwjy/files

+1,732 −1,696

Conversation

Copy link

Collaborator

wujingyue commentedNov 27, 2025

split monolithic csrc/ir/nodes.cpp into internal_nodes.cpp and internal_base_nodes.cpp matching header declarations
cleaned includes with IWYU-style reductions and added required CUDA/runtime headers where needed
updated build and verified via _bn

Split ir nodes into multiple translation units

e915dc6

Copy link

github-actionsbot commentedNov 27, 2025

Description

Split monolithic csrc/ir/nodes.cpp into internal_nodes.cpp and internal_base_nodes.cpp for better code organization
Moved IterDomainBuilder, IterDomain, and TensorDomain implementations to new internal_base_nodes.cpp file
Cleaned up includes with IWYU-style reductions and added required CUDA/runtime headers where needed
Updated CMakeLists.txt to compile the new split translation units

Changes walkthrough

Relevant files

Enhancement

internal_nodes.cpp `Cleaned up internal nodes implementation file` csrc/ir/internal_nodes.cpp Split from original nodes.cpp with cleaned includes Added ATen and CUDA headers as needed Removed unused includes like device_lower/lower2device.h Updated to include internal_nodes.h instead of interface_nodes.h	+10/-1695
internal_base_nodes.cpp `New base nodes implementation file` csrc/ir/internal_base_nodes.cpp New file containing IterDomainBuilder, IterDomain, and TensorDomain implementations Moved base node classes from original nodes.cpp Added proper includes for STL containers and IR utilities Contains 1720 lines of implementation code	+1720/-0

Configuration changes

CMakeLists.txt `Updated build configuration for split files` CMakeLists.txt Replaced nodes.cpp with internal_base_nodes.cpp and internal_nodes.cpp Updated build system to compile split translation units	+2/-1

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns No - This is a code reorganization with no functional changes.
⚡ Recommended focus areas for review
Include Dependencies Verify that all necessary headers are included. The file uses many IR utilities and operations (IrBuilder, IterDomainBuilder, etc.) but only includes a subset of headers. Check that no implicit dependencies exist that could cause linking issues. // clang-format off/* * SPDX-FileCopyrightText: Copyright (c) 2023-present NVIDIA CORPORATION & AFFILIATES. * All rights reserved. * SPDX-License-Identifier: BSD-3-Clause/// clang-format on#include<algorithm>#include<iterator>#include<list>#include<numeric>#include<optional>#include<ranges>#include<sstream>#include<string>#include<unordered_map>#include<unordered_set>#include<utility>#include<vector>#include<ir/cloner.h>#include<ir/internal_base_nodes.h>#include<ir/iostream.h>#include<ir/utils.h>#include<ops/arith.h>#include<transform_rfactor.h>#include<transform_view.h>#include<type.h>namespacenvfuser {IterDomainBuilder::IterDomainBuilder(Val _start, Val* _extent) : start_(_start), extent_(_extent) {NVF_ERROR( start_ !=nullptr && extent_ !=nullptr,"Start and extent are required to build an iter domain.");}IterDomainBuilder::IterDomainBuilder(const IterDomain* id) : start_(id->start()), extent_(id->extent()), expanded_extent_( id->hasExpandedExtent() ? id->expandedExtent() : nullptr), stop_offset_(id->stopOffset()), parallel_type_(id->getParallelType()), iter_type_(id->getIterType()), is_rfactor_domain_(id->isRFactorProduct()), is_padded_dimension_(id->hasPaddingToMultipleOfWarp()), is_clustered_dimension_(id->isClusteredBlockDim()), padded_to_size_(id->getMaybeSizeAfterPadding()) {} Header Cleanup Completeness The old includes were significantly reduced. While this follows IWYU principles, ensure that all removed headers weren't providing transitive dependencies that other parts of the codebase rely on. The removed headers like lower2device.h, dynamic_transform.h, and others may be needed by other modules. #include<numeric>#include<sstream>#include<string>#include<unordered_map>#include<ATen/Functions.h>#include<ATen/TensorIndexing.h>#include<ATen/TensorOptions.h>#include<ATen/Utils.h>#include<ATen/core/Tensor.h>#include<ATen/cuda/CUDAContextLight.h>#include<c10/core/SymInt.h>#include<c10/cuda/CUDAGraphsC10Utils.h>#include<torch/nn/functional/embedding.h>#include<torch/nn/options/embedding.h>#include<device_lower/utils.h>#include<expr_evaluator.h>#include<ir/cloner.h>#include<ir/internal_nodes.h>#include<ir/iostream.h>#include<ir/utils.h>#include<kernel.h>#include<kernel_ir.h>#include<logical_domain_map.h>#include<ops/arith.h>#include<runtime/allocations.h>#include<transform_iter.h>#include<transform_rfactor.h>#include<transform_view.h>

Copy link

Contributor

greptile-appsbot commentedNov 27, 2025

Greptile Overview

Greptile Summary

Refactored monolithiccsrc/ir/nodes.cpp (6531 lines) into two focused translation units matching the existing header hierarchy:internal_base_nodes.cpp (1720 lines) forIterDomain andTensorDomain implementations, andinternal_nodes.cpp (4846 lines) for remaining IR node operations.

Key improvements:

Reduced include dependencies ininternal_base_nodes.cpp by removing unnecessary CUDA/ATen headers
Added required CUDA runtime headers (ATen/Functions.h,ATen/TensorIndexing.h, etc.) tointernal_nodes.cpp where needed
UpdatedCMakeLists.txt to replace singleir/nodes.cpp with two new translation units
Maintained all functionality with clean separation following existing header structure (internal_base_nodes.h vs internal_nodes.h)

The split aligns with the documented IR hierarchy and should improve build times through better dependency isolation and parallel compilation.

Confidence Score: 5/5

This PR is safe to merge - it's a straightforward refactoring that splits a large file into logical units
Score of 5 reflects that this is a pure refactoring with no functional changes, proper build system updates, appropriate include pruning and addition, and alignment with existing header structure
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
CMakeLists.txt	5/5	Replaced single`ir/nodes.cpp` with two new translation units`ir/internal_base_nodes.cpp` and`ir/internal_nodes.cpp` to split monolithic file
csrc/ir/internal_base_nodes.cpp	5/5	New translation unit containing`IterDomain` and`TensorDomain` implementations (~1720 lines) with reduced include dependencies
csrc/ir/internal_nodes.cpp	5/5	New translation unit containing remaining IR node implementations (~4846 lines) including ops like`FullOp`,`SelectOp`, etc. with necessary CUDA/ATen headers

Sequence Diagram

sequenceDiagram    participant Build as Build System    participant Old as nodes.cpp<br/>(6531 lines)    participant New1 as internal_base_nodes.cpp<br/>(1720 lines)    participant New2 as internal_nodes.cpp<br/>(4846 lines)    participant Headers as Header Files    Note over Old: Original monolithic file    Note over Old,Headers: Split based on header structure    Old->>New1: Extract IterDomain implementations    Old->>New1: Extract TensorDomain implementations    Note over New1: Base node types<br/>Reduced includes    Old->>New2: Extract FullOp, SelectOp, etc.    Old->>New2: Extract remaining IR nodes    Note over New2: Higher-level ops<br/>CUDA/ATen includes    Headers->>New1: #include internal_base_nodes.h    Headers->>New2: #include internal_nodes.h        New1->>Build: Register translation unit    New2->>Build: Register translation unit    Build->>Build: Remove old nodes.cpp    Build->>Build: Add internal_base_nodes.cpp    Build->>Build: Add internal_nodes.cpp