Installation troubleshooting
2025-10-30
This page describes issues that some users encounter when installing the ROCm tools or libraries.
Issue #1: Installation methods
As an example, the latest version of ROCm is 6.0.2, but the installation instructions result in release 6.0.0 being installed.
Solution: You may have used the quick-start installation method, which only installs the latest major release. Use another installation method to get a specific release. The methods differ as follows:
Quick-start installation - Installs only the latest major release (for example, 6.0.0 or 6.1.0)
Native package manager install method - Installs the specified major and minor release version (for example, 6.0.0 or 6.0.2)
Refer to ROCm Issue #2422 for additional details.
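To confirm which release was actually installed, you can list the versioned install directories. The following is a minimal sketch, assuming ROCm is installed under /opt/rocm-<version> (the layout this page uses for 6.2.0). list_rocm_versions is a hypothetical helper name, not a ROCm tool.

```shell
#!/usr/bin/env bash
# Print the version suffix of every rocm-<version> directory under a base
# directory. The /opt default is an assumption about the install layout.
list_rocm_versions() {
    local base="${1:-/opt}"
    local dir
    for dir in "$base"/rocm-*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$dir" ] || continue
        echo "${dir##*/rocm-}"
    done
}

list_rocm_versions
```

If the output shows 6.0.0 when you expected 6.0.2, the installation method probably selected the major release only.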
Issue #2: Install prerequisites
When installing, I see the following message:
Problem: nothing provides perl-URI-Encode needed to be installed by ...
Solution: Ensure that the Installation prerequisites are installed. SUSE requires prerequisite Perl packages. RHEL also requires Extra Packages for Enterprise Linux (EPEL), as noted in the prerequisites. Install those first, then repeat your installation steps.
Refer to ROCm Issue #1827.
Issue #3: PATH variable
After successfully installing ROCm, when I run rocminfo (or another ROCm tool) the command is not found.
Solution: You may need to update your PATH environment variable as described in Post-installation instructions.
Refer to ROCm Issue #1607.
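The PATH update can be sketched as a small shell function that appends a directory only when it is not already present. This is an illustration rather than the official post-installation step: add_to_path is a hypothetical helper, and /opt/rocm/bin is an assumed location for the ROCm binaries that may differ on your system.

```shell
#!/usr/bin/env bash
# Append a directory to PATH unless it is already listed.
add_to_path() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$dir" ;;
    esac
    export PATH
}

# Assumed ROCm binary directory; adjust to match your installation.
add_to_path /opt/rocm/bin
```

To keep the change across sessions, the same lines can be placed in a shell startup file such as ~/.bashrc.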
Issue #4: C++ libraries
When compiling HIP programs, I get a linking error for -lstdc++, or fatal error: 'cmath' file not found.
Solution: You can install C++ libraries using your package manager. The following is an Ubuntu example:
sudo apt-get install libstdc++-<gcc-version>-dev
For more information on how to determine the relevant gcc-version, refer to ROCm Issue #1843.
Issue #5: Application hangs on Multi-GPU systems
When running on a system with multiple GPUs, the application hangs with GPU use at 100%, but without the expected GPU temperature buildup.
This issue often results in the following message in the application transcript:
NCCL WARN Missing "iommu=pt" from kernel command line which can lead to system instablity or hang!
Solution: To resolve this issue, add iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. Then run the following command:
sudo update-grub
Reboot the system, and run the following command:
cat /proc/cmdline
The returned information should reflect the addition of iommu:
BOOT_IMAGE=/vmlinuz-5.15.0-101-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro iommu=pt
Refer to RCCL Issue #1129 for more information.
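The check after reboot can be scripted so that it does not depend on reading the output by eye. The following is a minimal sketch: has_iommu_pt is a hypothetical helper name, and the GRUB line in the comment is only an example of what the edited entry might look like (the other options on that line are assumptions and vary by system).

```shell
#!/usr/bin/env bash
# Example /etc/default/grub entry after the edit (other options are assumed):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt"

# Succeed if the given kernel command line contains iommu=pt as a
# standalone parameter (so amd_iommu=pt alone does not count).
has_iommu_pt() {
    local cmdline="$1"
    case " $cmdline " in
        *" iommu=pt "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_iommu_pt "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "iommu=pt is active"
else
    echo "iommu=pt is missing from the kernel command line"
fi
```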
Issue #6: Additional packages for Docker installations
Docker images often come with minimal installations, meaning some essential packages might be missing. When installing ROCm within a Docker container, you might need to install additional packages for a successful ROCm installation. Use the command for your distribution's package manager to install the prerequisite packages.
On apt-based distributions such as Ubuntu:
apt update
apt install sudo wget gpg
On dnf-based distributions such as RHEL:
dnf install sudo wget
On SLES:
zypper install sudo wget SUSEConnect awk
After installing these packages, install ROCm using the Quick start installation guide in your Docker container.
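When the same setup script has to work across several base images, the choice of command can be automated. The following is a minimal sketch that detects which of the three package managers is available and prints the matching prerequisite command from the list above. detect_pkg_manager and prereq_command are hypothetical helper names, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Map a package manager name to the prerequisite command for it.
prereq_command() {
    case "$1" in
        apt)    echo "apt update && apt install sudo wget gpg" ;;
        dnf)    echo "dnf install sudo wget" ;;
        zypper) echo "zypper install sudo wget SUSEConnect awk" ;;
        *)      echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the first supported package manager found on PATH.
detect_pkg_manager() {
    local pm
    for pm in apt dnf zypper; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

if pm="$(detect_pkg_manager)"; then
    echo "Run: $(prereq_command "$pm")"
else
    echo "No supported package manager found"
fi
```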
Issue #7: Installations using Python wheels (.whl files) do not support soft links
If you have installed ROCm or any ROCm component using a Python wheel (.whl file), running a ROCm command which is soft-linked will fail with not found on Ubuntu, bad interpreter: No such file or directory on SLES, and ModuleNotFoundError on RHEL.
Solution: Python wheel files do not support soft links (symbolic links). You will need to run soft-linked commands from within their installation directories, or using the full path to their locations.
For example, run rocm-smi on ROCm 6.2 in the following way:
cd /opt/rocm-6.2.0/libexec/rocm_smi/
python3 rocm_smi.py
or
python3 /opt/rocm-6.2.0/libexec/rocm_smi/rocm_smi.py
See Symbolic links in wheels for more information.
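To avoid retyping the full path, a small wrapper function can stand in for the soft-linked command. The following is a minimal sketch: ROCM_ROOT, rocm_smi_script, and rocm_smi are hypothetical names chosen for illustration, and the default path assumes the ROCm 6.2 layout from the example above.

```shell
#!/usr/bin/env bash
# Root of the ROCm installation; the default matches the 6.2.0 example.
ROCM_ROOT="${ROCM_ROOT:-/opt/rocm-6.2.0}"

# Print the full path of the rocm_smi.py script under ROCM_ROOT.
rocm_smi_script() {
    echo "$ROCM_ROOT/libexec/rocm_smi/rocm_smi.py"
}

# Run the script by full path, forwarding any arguments unchanged.
rocm_smi() {
    python3 "$(rocm_smi_script)" "$@"
}
```

After sourcing this file, calling rocm_smi runs the script by its full path instead of through the soft link.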
Issue #8: The AMDGPU driver is not loaded after installation
When verifying the ROCm installation according to the post-install instructions, the rocm-smi and rocminfo commands might fail with the error message Driver not initialized, or display no output. This could indicate that the AMDGPU driver is not loaded.
Solution: Ensure the AMDGPU driver is not on a denylist such as /etc/modprobe.d/blacklist-amdgpu.conf. The location of this file might vary depending on the system distribution and version. To verify whether the driver is on a denylist, use the following command:
grep amdgpu /etc/modprobe.d/*
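The plain grep above also matches comments and similarly named modules. A stricter check can be sketched as follows, assuming the denylist uses the standard modprobe "blacklist <module>" directive. module_denylisted is a hypothetical helper name, and the /sys/module/amdgpu test relies on the kernel exposing a directory there for each loaded module.

```shell
#!/usr/bin/env bash
# Succeed if any file in the directory contains an active (uncommented)
# "blacklist <module>" line for exactly this module name.
module_denylisted() {
    local module="$1" dir="${2:-/etc/modprobe.d}"
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$dir"/*
}

if [ -d /sys/module/amdgpu ]; then
    echo "amdgpu is loaded"
elif module_denylisted amdgpu; then
    echo "amdgpu is on a denylist in /etc/modprobe.d"
else
    echo "amdgpu is not loaded and no denylist entry was found"
fi
```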
Note
When installing the AMDGPU driver with Secure Boot enabled, you must sign amdgpu-dkms to prevent potential system loading issues. For more information, see Secure Boot Support. If you prefer not to sign the AMDGPU driver, you can disable Secure Boot from the BIOS settings instead.
Issue #9: Cannot access the AMD GPU after installation
If the group permissions are not set properly during ROCm installation, you might get an error similar to Permission denied when attempting to access the AMD GPU.
Solution: You must be part of the video and render groups to access the AMD GPU. To learn how to add an account to these groups, see Configuring permissions for GPU access.
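Before changing anything, it can help to check which of the two groups the account is missing. The following is a minimal sketch: user_in_group is a hypothetical helper name, and the usermod command in the hint is one common way to add a user to a group, not necessarily the exact step in the linked guide.

```shell
#!/usr/bin/env bash
# Succeed if the user (default: the current user) belongs to the group.
user_in_group() {
    local group="$1" user="${2:-}"
    # With no user argument, id reports the groups of the current user.
    id -nG ${user:+"$user"} 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}

for group in video render; do
    if user_in_group "$group"; then
        echo "$group: ok"
    else
        echo "$group: missing (for example: sudo usermod -a -G $group <user>)"
    fi
done
```

Group changes only take effect in new login sessions, so log out and back in before testing GPU access again.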
Issue #10: ROCm debugging tools might become unresponsive in SELinux-enabled distributions
Red Hat Enterprise Linux (RHEL) and related distributions automatically enable a security feature named Security-Enhanced Linux (SELinux) that may prevent ROCm debugging tools like ROCgdb, ROCdbgapi, and ROCR Debug Agent from working correctly.
The problem occurs when attempting to debug a program that contains code that runs on the GPU. The debugging session may become unresponsive while attempting to reach a breakpoint or doing instruction-stepping in device code. ROCgdb will still be responsive and accept interruption by pressing Control+C, but the breakpoint in device code won’t be hit, and the instruction-stepping operation will not conclude.
The ROCR Debug Agent might also become unresponsive when attempting to capture data from a program that is running into queue errors, memory faults, and other triggering events.
As a workaround for this problem, either disable SELinux or configure it to use the permissive setting.
To set SELinux to permissive while ROCgdb or ROCR Debug Agent is in use, run the following command:
sudo setenforce 0
After the session is over, switch it back to enforcing mode:
sudo setenforce 1
Note
Changing the SELinux settings can have security implications. Ensure you review your system security settings before making any changes.
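To reduce the chance of leaving the system in permissive mode by accident, the two setenforce commands can be wrapped around the debugging session. The following is a minimal sketch: with_permissive_selinux is a hypothetical helper name, and it only changes the mode when getenforce reports Enforcing.

```shell
#!/usr/bin/env bash
# Run a command with SELinux temporarily permissive, then restore
# enforcing mode. Does nothing extra if SELinux is absent or not enforcing.
with_permissive_selinux() {
    local mode="unknown" status=0
    if command -v getenforce >/dev/null 2>&1; then
        mode="$(getenforce)"
    fi
    if [ "$mode" = "Enforcing" ]; then
        sudo setenforce 0
    fi
    "$@"
    status=$?
    if [ "$mode" = "Enforcing" ]; then
        sudo setenforce 1
    fi
    # Report the wrapped command's exit status to the caller.
    return "$status"
}

# Example (program name is a placeholder):
#   with_permissive_selinux rocgdb ./my_hip_program
```

The wrapper does not restore enforcing mode if the shell itself is terminated mid-session, so confirm the mode with getenforce afterwards.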