Disclosure of Invention
The invention provides a method and a system for automatically configuring a CUDA environment and stress testing a GPU, and aims to solve at least one of the technical problems in the prior art.
The technical solution of the invention relates to a method for automatically configuring a CUDA environment and stress testing a GPU, the method comprising the following steps:
S100, constructing a mapping table from device model to architecture, CUDA version, and cuDNN version;
S200, receiving account information, connecting to a Linux server over the SSH protocol using the configured operating-system username and password, and transmitting Shell scripts over the SFTP protocol;
S300, acquiring and returning the device model, querying the predefined mapping table with the acquired model to determine the architecture and the CUDA and cuDNN versions corresponding to the device model, and automatically performing download, installation, and verification to obtain automatically configured environment variables;
S400, allocating memory, initializing matrices, launching a matrix-operation algorithm, and comprehensively testing the computing power and error-correction capability of the GPU.
Further, the step S200 includes:
S210, acquiring input information from the front-end web interface, the input information comprising an IP address, a username, and a password;
S220, transmitting the input information to the back-end server, where a Python server starts a thread to launch the SSH service;
S230, logging in over SSH with the configured operating-system username and password, entering the Linux system through the SSH protocol connection, and transmitting the automatic installation script over the SFTP protocol.
Further, the step S300 includes:
S310, connecting to the remote server through SSH and executing the Shell script;
S320, in the Shell script, calling exec_command to run the nvidia-smi command and output the model information;
S330, querying the predefined mapping table with the extracted model to obtain the CUDA and cuDNN versions corresponding to that model;
S340, automatically downloading and installing CUDA according to the queried CUDA version, and automatically downloading and installing cuDNN according to the queried cuDNN version;
S350, after installation is completed, automatically configuring the environment variables to ensure that CUDA and cuDNN can be correctly recognized and used;
S360, verifying whether the CUDA environment is configured correctly.
Further, the step S340 includes:
Searching the predefined mapping table for the supported CUDA version range: with the internal field separator IFS set to a space, the minimum and maximum CUDA version numbers of the corresponding entry in the mapping table are read and the maximum version number is output, thereby determining the CUDA version range suitable for the computing architecture.
Further, in the step S340,
Assigning the input cuDNN version number and CUDA version number to the local variables cudnn_version and cuda_version respectively, outputting the cuDNN version information being downloaded, constructing the cuDNN download link, storing the link in the variable cudnn_url, and downloading it to a designated file;
Logging in and downloading with the curl command and a cookie file: logging in to the NVIDIA developer website with curl, saving the cookie information to the cookie_file file, and downloading the cuDNN library from the constructed link to the designated cudnn_path using the saved cookie information;
In the install_cudnn function, assigning the input cuDNN version number and CUDA version number to the local variables cudnn_version and cuda_version respectively, constructing the cuDNN decompression path, and extracting the downloaded cuDNN archive to the designated directory with the tar command, thereby completing the installation of the cuDNN library.
Further, the step S400 includes:
S410, allocating memory on the host and initializing matrices A and B;
S420, randomly injecting errors into matrices A and B;
S430, allocating memory on the GPU for matrices A and B and the result matrix C;
S440, copying the matrix data from host memory to device memory;
S450, launching a first kernel function for the matrix operation to multiply matrices A and B and store the result in matrix C;
S460, after the matrix operation is completed, randomly injecting errors into the result matrix C using a second kernel function;
S470, checking whether any CUDA call has raised an error;
S480, releasing the memory resources on the device and the host.
Further, in step S420, the error injection is implemented by an injectError function that adds random values to some elements of the matrix with a certain probability.
Further, in the step S400,
Positioning matrix-element accesses by computing the starting row and column col of the current thread block in the global matrix;
Where row is obtained by multiplying the y-dimension index blockIdx.y of the thread block by the y-dimension size blockDim.y of the thread block and adding the y-dimension thread index threadIdx.y;
And col is obtained by multiplying the x-dimension index blockIdx.x of the thread block by the x-dimension size blockDim.x of the thread block and adding the x-dimension thread index threadIdx.x.
The invention also relates to a computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above-mentioned method.
The technical solution of the invention also relates to a system for automatically configuring a CUDA environment and stress testing a GPU, the system comprising a computer device, wherein the computer device comprises the above computer-readable storage medium.
The beneficial effects of the invention are as follows:
The method and system for automatically configuring a CUDA environment and stress testing a GPU automatically download CUDA, configure the CUDA environment, and test the GPU comprehensively. The invention automates the download and installation of CUDA and cuDNN and configures the environment variables automatically, so that the installed environment is immediately usable without the user manually editing configuration files; this greatly reduces the need for manual operation and improves the efficiency and accuracy of the installation process. Combined with the GPU stress-testing function, the GPU can be tested comprehensively: artificial corruption is introduced at the software level to test the robustness and error-handling capability of the system and to judge accurately whether the hardware is damaged.
Detailed Description
The conception, specific structure, and technical effects of the present invention will be clearly and completely described below with reference to the embodiments and the drawings, so that the objects, aspects, and effects of the present invention can be fully understood.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly or indirectly fixed or connected to the other feature. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could also be termed a second element, and, similarly, a second element could also be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
Referring to figs. 1 to 5, in some embodiments, the method for automatically configuring a CUDA environment and stress testing a GPU according to the present invention includes at least the following steps:
S100, constructing a mapping table from device model to architecture, CUDA version, and cuDNN version;
S200, receiving account information, connecting to a Linux server over the SSH protocol using the configured operating-system username and password, and transmitting Shell scripts over the SFTP protocol;
S300, acquiring and returning the device model, querying the predefined mapping table with the acquired model to determine the architecture and the CUDA and cuDNN versions corresponding to the device model, and automatically performing download, installation, and verification to obtain automatically configured environment variables;
S400, allocating memory, initializing matrices, launching a matrix-operation algorithm, and comprehensively testing the computing power and error-correction capability of the GPU.
In some embodiments, referring to fig. 2, the present invention first builds a mapping table from device model to architecture, CUDA version, and cuDNN version; the table is updated as GPU models iterate.
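For illustration, a minimal Python sketch of such a mapping table follows. The model names and version entries are illustrative assumptions only and do not reproduce the actual table of fig. 2; a real table must track NVIDIA's published compatibility matrices.

```python
# Illustrative mapping table: device model -> (architecture code,
# minimum CUDA, maximum CUDA, compatible cuDNN). Entries are examples
# only and must be updated as new GPUs are released.
GPU_MAP = {
    "Tesla V100": ("sm_70", "9.0",  "12.0", "8.9"),
    "A100":       ("sm_80", "11.0", "12.4", "9.0"),
    "RTX 3090":   ("sm_86", "11.1", "12.4", "9.0"),
}

def lookup_gpu(gpu_name):
    """Return the table entry whose model string occurs in gpu_name."""
    for model, entry in GPU_MAP.items():
        if model in gpu_name:
            return entry
    return None
```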
In some embodiments, in the account-information input processing of the present invention, a connection is made to the Linux server over the SSH protocol using the configured operating-system username and password, and Shell scripts are then transmitted over the SFTP protocol. Referring to fig. 3, the input information of the front-end web interface is acquired, comprising an IP address, a username, and a password; the input information is transmitted to the back-end server, where the Python server starts a thread to launch the SSH service; SSH login is performed with the configured operating-system username and password, the Linux system is accessed through the SSH protocol connection, and the automatic installation script is transmitted over the SFTP protocol.
Specifically, the paramiko module is first imported. An SSH client instance named ssh is then created for establishing the SSH connection. Next, the host-key policy of the SSH client is set so that unknown host keys are automatically added to the local list of known hosts using paramiko. The remote server is then connected via the connect function, whose parameters include the remote server's IP address remote_ip, port number remote_port, username remote_user, and password remote_password.
Then, a string variable named env_content is defined to store the environment-variable configuration content. Next, a file named env.sh is opened (or created) in write mode ("w") and the contents of the env_content variable are written to it. An SFTP client instance is then created via ssh.open_sftp() and assigned to the variable sftp, and the local env.sh file is uploaded to a directory on the remote server. Finally, the local env.sh file is deleted.
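A minimal Python sketch of this connection and upload flow follows, assuming the paramiko library; the address, credentials, file content, and remote directory are placeholders, not values from the patent.

```python
import os
import paramiko

# Placeholder connection parameters taken from the front-end form
remote_ip, remote_port = "192.0.2.10", 22
remote_user, remote_password = "admin", "password"

# Create the SSH client and auto-accept unknown host keys
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(remote_ip, port=remote_port,
            username=remote_user, password=remote_password)

# Write the environment-variable configuration to a local env.sh
env_content = "export CUDA_HOME=/usr/local/cuda\n"  # assumed content
with open("env.sh", "w") as f:
    f.write(env_content)

# Upload env.sh over SFTP, then delete the local copy
sftp = ssh.open_sftp()
sftp.put("env.sh", "/tmp/env.sh")  # remote directory is an assumption
sftp.close()
os.remove("env.sh")
```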
In some embodiments, in the Shell execution of the present invention, the GPU model is determined by the Shell script and returned to the control end; the control end queries the architecture and the corresponding CUDA and cuDNN versions according to the model, automatically performs download, installation, and nvcc verification, and obtains an environment configured with CUDA upon completion. Referring to fig. 4, the Shell is first executed over SSH: a connection is made to the remote server and the Shell script is run. In the Shell script, exec_command is called to run the nvidia-smi command and output the model information. Then, according to the extracted model, the predefined mapping table (see fig. 2) is queried to obtain the CUDA and cuDNN versions corresponding to the model; CUDA is automatically downloaded and installed according to the queried CUDA version, and cuDNN according to the queried cuDNN version; after installation is completed, the environment variables are configured automatically to ensure that CUDA and cuDNN can be correctly recognized and used. The nvcc (NVIDIA CUDA Compiler) command is then used to verify that the CUDA environment is configured correctly: successful execution of the nvcc command indicates a correct configuration.
Specifically, by executing the nvidia-smi --query-gpu=name --format=csv,noheader,nounits command, the model of the NVIDIA graphics card in the current system is obtained and stored in the variable gpu_name; using the queried model, the corresponding computing-architecture code is retrieved from the predefined mapping table (see fig. 2) and output. Then, according to the retrieved architecture code, the supported CUDA version range is looked up in the mapping table: the internal field separator IFS is set to a space, the minimum and maximum CUDA version numbers of the corresponding entry are read, and the maximum version number is output, thereby determining the CUDA version range suitable for the computing architecture. Finally, according to the determined CUDA version number, a compatible cuDNN version number is retrieved from the mapping table (see fig. 2) and output, so that the correct version of the cuDNN library can be downloaded and installed.
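For illustration, the model query and table lookup can be sketched in Python, reusing the ssh client and the illustrative GPU_MAP/lookup_gpu from the earlier sketches:

```python
# Run nvidia-smi on the remote machine and read the GPU model name
stdin, stdout, stderr = ssh.exec_command(
    "nvidia-smi --query-gpu=name --format=csv,noheader,nounits")
gpu_name = stdout.read().decode().strip()

entry = lookup_gpu(gpu_name)  # illustrative table from the earlier sketch
if entry is None:
    raise SystemExit(f"Unknown GPU model: {gpu_name}")
architecture, min_cuda, max_cuda, cudnn_version = entry

# The Shell script reads the same entry by setting IFS to a space;
# the Python equivalent of that parsing is a plain split:
#   min_cuda, max_cuda = "11.0 12.4".split(" ")
print(f"{gpu_name}: arch={architecture}, "
      f"CUDA {min_cuda}-{max_cuda}, cuDNN {cudnn_version}")
```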
Then, through the defined install_cuda function, the input CUDA version number is assigned to the local variable cuda_version, the CUDA version being installed is output, the download link of the CUDA installer is constructed and stored in the variable installer_url, and the file is downloaded to a temporary path. After the CUDA installation package is downloaded from the constructed link to the designated installer path with the wget command and given executable permission, the CUDA toolkit is installed in silent mode with the sudo command. Then, in the download_cudnn function, the incoming cuDNN version number and CUDA version number are assigned to the local variables cudnn_version and cuda_version respectively, the cuDNN version being downloaded is output, the cuDNN download link is constructed and stored in the variable cudnn_url, and the file is downloaded to a designated location. Next, the curl command is used to log in and download with a cookie file: after logging in to the NVIDIA developer website through curl and saving the cookie information to cookie_file, the cuDNN library is downloaded from the constructed link to the designated cudnn_path using the saved cookies. Finally, in the install_cudnn function, the incoming cuDNN version number and CUDA version number are assigned to the local variables cudnn_version and cuda_version respectively, the cuDNN decompression path is constructed, and the downloaded cuDNN archive is extracted to the designated directory with the tar command, completing the installation of the cuDNN library.
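The following Python sketch drives the remote download and installation over the same SSH connection, reusing max_cuda and cudnn_version from the previous sketch. The URL template and paths are assumptions: real NVIDIA installer links also encode a driver build and platform, and the authenticated cuDNN download described above is omitted here.

```python
cuda_version = max_cuda  # version resolved from the mapping table

# Hypothetical URL template; real links differ per release and platform
installer_url = ("https://developer.download.nvidia.com/compute/cuda/"
                 f"{cuda_version}/local_installers/cuda_{cuda_version}_linux.run")

commands = [
    f"wget -q {installer_url} -O /tmp/cuda_installer.run",
    "chmod +x /tmp/cuda_installer.run",
    # Silent-mode toolkit install, as described above
    "sudo sh /tmp/cuda_installer.run --silent --toolkit",
    # Unpack a previously downloaded cuDNN archive over the CUDA tree
    f"sudo tar -xf /tmp/cudnn-{cudnn_version}.tar.xz "
    f"-C /usr/local/cuda-{cuda_version}",
]
for cmd in commands:
    stdin, stdout, stderr = ssh.exec_command(cmd)
    if stdout.channel.recv_exit_status() != 0:  # block until the command ends
        raise RuntimeError(f"'{cmd}' failed: {stderr.read().decode()}")
```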
Next, a function named configure_environment is defined in the script to configure the CUDA environment variables. It assigns the incoming CUDA version number to the local variable cuda_version, outputs the PATH being configured, appends the CUDA-related environment variables (PATH and LD_LIBRARY_PATH) to the user's .bashrc file with the echo command, and reloads the file with the source ~/.bashrc command so that the environment variables take effect immediately. The script's main function then calls get_gpu_info to acquire the GPU model, stores it in the variable gpu_name, and outputs the detected GPU model information. Next, get_gpu_architecture is called to obtain the GPU's computing architecture and store it in the variable architecture; if no architecture is detected, an unknown GPU model is reported and the program exits. Then get_cuda_version_for_architecture is called to obtain the recommended CUDA version for the architecture, storing it in the variable recommended_cuda_version and outputting it, and get_cudnn_version_for_cuda is called to obtain the recommended cuDNN version for that CUDA version, storing it in the variable recommended_cudnn_version and outputting it. Next, install_cuda is called to install the recommended CUDA version, download_cudnn to download the recommended cuDNN version, and install_cudnn to install it. Finally, configure_environment is called to configure the environment variables, the installations of CUDA and cuDNN are verified with the nvcc --version and nvidia-smi commands, and a message reporting completion of installation and configuration is output.
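A sketch of the environment configuration and verification steps, again driven over SSH and reusing cuda_version from the previous sketch; the CUDA install prefix is an assumption:

```python
cuda_home = f"/usr/local/cuda-{cuda_version}"  # assumed install prefix

# Append PATH and LD_LIBRARY_PATH to the user's .bashrc
for line in (f"export PATH={cuda_home}/bin:$PATH",
             f"export LD_LIBRARY_PATH={cuda_home}/lib64:$LD_LIBRARY_PATH"):
    ssh.exec_command(f"echo '{line}' >> ~/.bashrc")

# exec_command does not run a login shell, so invoke bash as a login
# shell (whose profile typically sources .bashrc) before verifying
# the installation with nvcc and nvidia-smi
stdin, stdout, stderr = ssh.exec_command(
    "bash -lc 'nvcc --version && nvidia-smi'")
print(stdout.read().decode())
```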
It should be noted that the invention can automatically identify the GPU model and architecture and, through the install_cuda and install_cudnn functions, download and install compatible CUDA and cuDNN versions, achieving full automation without user intervention and significantly reducing operational complexity and error rates. The environment variables are configured automatically through the configure_environment function, so the installed environment is immediately usable and the user need not edit configuration files manually.
In some embodiments, the present invention executes a custom matrix algorithm comprising memory allocation, matrix initialization, GPU memory allocation, copying data to device memory, configuring thread blocks and grids, launching the matrix multiplication, checking for errors, and cleaning up resources, so as to comprehensively test the computing power and error-correction capability of the GPU; if the hardware is damaged, the program reports an error. Referring to fig. 4, the steps include:
S410, matrix initialization. The program first allocates memory on the host and initializes two matrices A and B, each of size N x N, where N is the defined matrix size; in this example N is 2048.
S420, error injection. To simulate hardware faults or memory corruption, the program randomly injects errors into the initialized matrices A and B. The errors are implemented by an injectError function that adds random values to certain elements of the matrix with a certain probability, defined by INJECTION_RATE; in this example the rate is 10%.
S430, device memory allocation. The program allocates memory on the GPU for matrices A and B and the result matrix C.
S440, data transfer. The matrix data is copied from host memory to device memory.
S450, matrix-multiplication kernel execution. The program launches an optimized matrix-multiplication kernel, matMulOptimized, which uses shared memory to improve computational efficiency. The kernel multiplies matrices A and B and stores the result in matrix C.
S460, memory-corruption simulation. To further simulate memory corruption, the program uses another kernel, simulateMemoryCorruption, to randomly inject errors into the result matrix C after the matrix multiplication is complete.
S470, error checking. The program uses cudaGetLastError to check whether any CUDA call has raised an error.
S480, resource cleanup. The program releases the memory resources on the device and the host.
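The patent's test program is written in CUDA C++; for illustration only, the following Python sketch reproduces the same host-side flow (S410-S480) using NumPy and Numba's CUDA bindings, which are assumptions of this sketch rather than the patent's implementation. A simple multiplication kernel stands in for matMulOptimized; a shared-memory version is sketched after the next paragraph.

```python
import numpy as np
from numba import cuda

N = 2048               # matrix size from the embodiment
INJECTION_RATE = 0.10  # 10% error-injection probability

def inject_error(m, rate=INJECTION_RATE):
    # S420: add random values to randomly chosen elements
    mask = np.random.rand(*m.shape) < rate
    m[mask] += np.random.rand(int(mask.sum())).astype(m.dtype)

@cuda.jit
def mat_mul(A, B, C):
    # Simple stand-in kernel; the tiled version follows below
    row = cuda.blockIdx.y * cuda.blockDim.y + cuda.threadIdx.y
    col = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    if row < C.shape[0] and col < C.shape[1]:
        acc = 0.0
        for k in range(A.shape[1]):
            acc += A[row, k] * B[k, col]
        C[row, col] = acc

# S410: allocate and initialise host matrices
A = np.random.rand(N, N).astype(np.float32)
B = np.random.rand(N, N).astype(np.float32)

# S420: inject errors to simulate faults
inject_error(A)
inject_error(B)

# S430/S440: allocate device memory and copy host data across
d_A, d_B = cuda.to_device(A), cuda.to_device(B)
d_C = cuda.device_array((N, N), dtype=np.float32)

# S450: configure thread blocks/grid and launch the kernel
tpb = (16, 16)
bpg = ((N + 15) // 16, (N + 15) // 16)
mat_mul[bpg, tpb](d_A, d_B, d_C)

# S470: error check; Numba surfaces CUDA errors as Python exceptions
cuda.synchronize()
C = d_C.copy_to_host()

# S480: release device memory (Numba also frees it on garbage collection)
del d_A, d_B, d_C
```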
Specifically, a CUDA kernel function named matMulOptimized is first defined to optimize the matrix-multiplication operation; it declares a shared-memory array sharedMem that stores sub-block data of matrices A and B during kernel execution. Three floating-point pointers, sharedA, sharedB, and C, are then defined, pointing respectively to the sub-blocks of matrices A and B in shared memory and to the corresponding location of the result matrix C. Then, in preparation for subsequent matrix-element accesses, the starting row and column col of the current thread block in the global matrix are computed: row is obtained by multiplying the thread block's y-dimension index blockIdx.y by the block's y-dimension size blockDim.y and adding the thread's y-dimension index threadIdx.y, and col is obtained by multiplying the thread block's x-dimension index blockIdx.x by the block's x-dimension size blockDim.x and adding the thread's x-dimension index threadIdx.x.
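Continuing the Numba illustration, the sketch below shows a shared-memory (tiled) multiplication kernel using exactly the row/col index computation described above; the tile size and loop structure are assumptions, not the patent's matMulOptimized source.

```python
import numpy as np
from numba import cuda, float32

TPB = 16  # tile width = thread-block size in each dimension (assumed)

@cuda.jit
def mat_mul_tiled(A, B, C):
    # Shared-memory tiles holding sub-blocks of A and B
    sharedA = cuda.shared.array(shape=(TPB, TPB), dtype=float32)
    sharedB = cuda.shared.array(shape=(TPB, TPB), dtype=float32)

    # Global indices, exactly as described in the text:
    #   row = blockIdx.y * blockDim.y + threadIdx.y
    #   col = blockIdx.x * blockDim.x + threadIdx.x
    row = cuda.blockIdx.y * cuda.blockDim.y + cuda.threadIdx.y
    col = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    ty, tx = cuda.threadIdx.y, cuda.threadIdx.x

    acc = float32(0.0)
    for t in range((A.shape[1] + TPB - 1) // TPB):
        # Cooperatively load one tile of A and one tile of B
        sharedA[ty, tx] = (A[row, t * TPB + tx]
                           if row < A.shape[0] and t * TPB + tx < A.shape[1]
                           else 0.0)
        sharedB[ty, tx] = (B[t * TPB + ty, col]
                           if t * TPB + ty < B.shape[0] and col < B.shape[1]
                           else 0.0)
        cuda.syncthreads()          # wait until the tile is fully loaded
        for k in range(TPB):
            acc += sharedA[ty, k] * sharedB[k, tx]
        cuda.syncthreads()          # wait before the tile is overwritten

    if row < C.shape[0] and col < C.shape[1]:
        C[row, col] = acc
```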
It should be noted that, by means of the memory-corruption simulation function simulateMemoryCorruption, the present invention can introduce artificial corruption at the software level and thereby test the robustness and error-handling capability of the system.
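For completeness, a Numba analogue of the corruption kernel is sketched below; it uses Numba's xoroshiro128+ device random-number generator, and the perturbation magnitude and one-state-per-element RNG layout are assumptions of this sketch.

```python
from numba import cuda
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float32)

@cuda.jit
def simulate_memory_corruption(C, rng_states, rate):
    # One thread per matrix element, flattened to a 1-D index
    i = cuda.grid(1)
    n = C.shape[0] * C.shape[1]
    if i < n and xoroshiro128p_uniform_float32(rng_states, i) < rate:
        C[i // C.shape[1], i % C.shape[1]] += 1.0e6  # large, detectable error

# Usage against the d_C result from the earlier host-flow sketch:
#   n = N * N
#   rng = create_xoroshiro128p_states(n, seed=42)
#   simulate_memory_corruption[(n + 255) // 256, 256](d_C, rng, 0.10)
```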
It should be appreciated that the method steps in embodiments of the present invention may be implemented or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, collectively executing on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, an optically read and/or written storage medium, RAM, ROM, etc., such that it is readable by a programmable computer and, when read by the computer, configures and operates the computer to perform the processes described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media include instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. The invention may also include the computer itself when programmed according to the methods and techniques described herein.
The computer program can be applied to the input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to the non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.
The present invention is not limited to the above embodiments; modifications, equivalent substitutions, and improvements that achieve the same technical effects by the same means all fall within the spirit and principles of the present invention. Various modifications and variations of the technical solution and/or the embodiments are possible within the scope of the invention.