Disclosure of Invention
Accordingly, the present invention is directed to a GPU-based SLAM control method, apparatus, and storage medium, which can reduce redundant computation and increase the operation speed of SLAM algorithm. The specific scheme is as follows:
 A GPU-based SLAM control method, comprising:
 carrying out error prediction on each frame of input image information to obtain a predicted value;
 judging whether the predicted value is smaller than a set threshold value or not;
 If yes, in the process of running the SLAM algorithm, only a plurality of representative warp are started, the branch directions of all threads in the representative warp are recorded, and the recorded branch directions are transmitted to the adjacent warp.
Preferably, in the above-mentioned GPU-based SLAM control method provided by the embodiment of the present invention, before starting only representative warp, the method further includes:
 Before the image is built, carrying out target recognition on each input frame image, only reserving a pixel area occupied by a recognized object, transmitting the reserved pixel information in the pixel area to an SLAM system, and discarding the rest through judgment sentences.
Preferably, in the above-mentioned GPU-based SLAM control method provided by the embodiment of the present invention, the method further includes:
 A 256-bit bool-type array is created in shared memory for each thread of a representative warp to accommodate the branching information of 256 voxels executed by the thread.
Preferably, in the above GPU-based SLAM control method provided by the embodiment of the present invention, after transmitting the recorded branching direction to the adjacent warp, the method further includes:
 At the time of mapping, the last twenty to thirty iterative operations are skipped during program compilation.
Preferably, in the above GPU-based SLAM control method provided by the embodiment of the present invention, after skipping the last twenty to thirty iterative operations during program compiling, the method further includes:
 after the current image frame information is fused into a TSDF map, when the surface information is extracted from the TSDF map, only partial threads are started, the started partial threads are normally executed, and after the execution is finished, the result is copied into a shared memory.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory, wherein the processor realizes the SLAM control method based on the GPU when executing the computer program stored in the memory.
The embodiment of the invention also provides a computer readable storage medium for storing a computer program, wherein the computer program realizes the SLAM control method based on the GPU provided by the embodiment of the invention when being executed by a processor.
From the above technical solution, the GPU-based SLAM control method provided by the present invention includes: carrying out error prediction on each frame of input image information to obtain a predicted value; judging whether the predicted value is smaller than a set threshold value or not; if yes, in the process of running the SLAM algorithm, only a plurality of representative warp are started, the branch directions of all threads in the representative warp are recorded, and the recorded branch directions are transmitted to the adjacent warp.
The invention starts from the SLAM algorithm itself or the characteristic of running on the GPU, and performs approximate calculation processing by recording the branch direction of one warp to replace the branch direction of other warp in SLAM or other applications, thereby solving branch divergence phenomena, reducing redundant calculation, improving the running speed of SLAM, realizing the refined control of SLAM error to keep the SLAM error below an acceptable maximum error value all the time, and exploring the prospect and opportunity of applying the approximate calculation technology to SLAM such high-precision application. In addition, the invention also provides corresponding equipment and a computer readable storage medium for the SLAM control method based on the GPU, so that the method has more practicability, and the equipment and the computer readable storage medium have corresponding advantages.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a SLAM control method based on a GPU, as shown in FIG. 1, comprising the following steps:
 S101, carrying out error prediction on each frame of input image information to obtain a predicted value;
 It can be appreciated that, due to the high positioning accuracy requirements of SLAM systems, SLAM algorithms themselves can produce certain errors. In order to measure the accuracy of the SLAM algorithm, it is generally required to calculate the absolute value of the deviation of the running track from the actual running track, which is also called the absolute value of the average track error (Mean absolutely trajectory error, mean ATE). The approximation calculation technique (Approximate Computing Technique) is a method of achieving a significant increase in the operation speed while the operation result of the target application remains within an acceptable range by approximation calculation. The GPU is used as an important SLAM deployment platform, a large number of approximate computers exist in the SLAM operation process, so that the invention designs a corresponding approximate calculation method aiming at the characteristic that the SLAM operates on the GPU, the operation speed of the SLAM can be remarkably improved, and the real-time performance can be realized on equipment with less calculation power, such as an embedded GPU platform. However, if the error generated by the SLAM algorithm itself exceeds the prescribed requirement, it is not suitable to use the approximation technique because the approximation calculation further increases the error. Therefore, the original error of the system needs to be predicted before adopting an approximate calculation method;
 S102, judging whether the predicted value is smaller than a set threshold value;
 If not, normally operating the SLAM algorithm; if yes, go to step S103;
 S103, in the process of running the SLAM algorithm, only a plurality of representative warp are started, the branch directions of all threads in the representative warp are recorded, and the recorded branch directions are transmitted to the warp adjacent to the representative warp. It is emphasized that this step may be referred to as redundant branch cancellation (redundant branch elimination, RBE for short).
If the predicted value is greater than the set threshold, no processing is performed on the frame, and the SLAM algorithm is normally operated. If the predicted value is smaller than the prescribed value, it is indicated that an approximation calculation method may be employed, and the approximation calculation method in step S103 refers to the redundant branch cancellation method RBE. Correspondingly, the design space of approximate calculation is defined value minus predicted value, RBE is adopted to perform further approximate calculation, and the aim is to ensure that the system is maintained within the allowable error range and realize the quick operation of SLAM. RBE utilizes SLAM when GPU platform operation adjacent warp's branch direction highly similar characteristic, only need start several representative warp when GPU carries out, record the branch direction of all threads in these warp, pass on these directions to adjacent warp with it. Therefore, adjacent warp does not need to make branch judgment, and the branch direction of the adjacent warp is consistent with that of the representative warp.
In the above-mentioned SLAM control method based on GPU provided by the embodiment of the invention, starting from the SLAM algorithm itself or from the characteristic that the SLAM algorithm runs on the GPU, a redundant branch elimination method is applied to the kernel function of the SLAM algorithm, and in SLAM or other applications, approximate calculation processing is performed by recording the branch direction of one warp to replace the branch direction of other warp, so that the problem branch divergence is solved, the operation bottleneck of the SLAM algorithm itself is further solved, redundant calculation is reduced, the running speed of the SLAM is improved, the refined control of SLAM error is realized so that the SLAM error is always kept below an acceptable maximum error value, and the prospect and opportunity of applying an approximate calculation technology to SLAM such high-precision application are explored.
Further, in implementation, in the above-mentioned GPU-based SLAM control method provided by the embodiment of the present invention, as shown in fig. 2, before executing step S103 to start only representative warp, the method may further include the following steps:
 S104, before the image is built, carrying out target recognition on each input frame image, only reserving a pixel area occupied by a recognized object, transmitting pixel information in the reserved pixel area to an SLAM system, and discarding the rest through judgment sentences. It is emphasized that this step may be referred to as critical data identification method (CRITICAL DATA identification, CDI for short).
It should be noted that, the visual SLAM adopts the camera to complete the sensing work of the environment, so all the positioning and mapping work need to rely on the pixel information in the photo shot by the camera, and as the image itself contains a large amount of noise information and information of irrelevant objects, the information not only increases the calculation amount, but also can affect the positioning precision and mapping effect of the SLAM. In order to solve the problem, the invention uses the CDI approximation calculation technology to process the input image of the SLAM system (filter image information) by utilizing an object identification algorithm (such as mobilenet, yolo), only the pixel area occupied by the object/object needing to be mapped is reserved and the reserved pixel information of the area is transferred to the SLAM system. The specific workflow of CDI may be: for the rgb image or the rgb-d image with the depth value shot by the camera, yolo v is adopted for target recognition, and the coordinate information of the recognized object, namely the object to be reserved, is saved, wherein the information is the coordinate values of two points on the diagonal line of the target recognition rectangular frame where the object is positioned. When the image is read, the subsequent image building step only provides the information of the pixels in each identification frame, and the rest of the judgment statement is discarded. The object area to be identified is typically small, about thirty percent of the image area, and this approach can theoretically reduce the amount of computation by seventy percent.
In a specific implementation, in the above-mentioned GPU-based SLAM control method provided by the embodiment of the present invention, step S103 may further include: a 256-bit bool-type array is created in shared memory for each thread of a representative warp to accommodate the branching information of 256 voxels executed by the thread.
In a specific implementation, in the above-mentioned GPU-based SLAM control method according to the embodiment of the present invention, as shown in fig. 2, after executing step S103 to transfer the recorded branching direction to the adjacent warp, the method may further include the following steps:
 S105, skipping the last twenty to thirty iterative operations during program compiling when mapping. It is emphasized that this step may be referred to as a loop-skip method (loop skipping, abbreviated LS).
It should be noted that, in the mapping process, the SLAM often uses a TSDF (measured SIGNED DISTANCE Function) map to store the reconstructed surface information. The structure consists of a large number of small squares (pixels), each representing a point in real space. In the process of building the image, each small square is required to be operated, and whether object information to be reconstructed exists in the actual physical space corresponding to the square is judged according to the pose matrix of the camera and the acquired image, which is usually completed by a cyclic operation with large iteration times. In these loop operations, the last twenty to thirty iterations are often unnecessary because none of the information of the surface to be reconstructed is stored in the voxel that these iterations determine. LS takes advantage of this feature by skipping the last twenty to thirty iterative operations during program compilation.
Further, in the specific implementation, when the approximate calculation degree of the LS method is D1, namely, when the skip frequency is D1, the main loop iteration frequency after being processed by the LS method is 256-D1.
In a specific implementation, in the above-mentioned GPU-based SLAM control method according to the embodiment of the present invention, as shown in fig. 2, after performing step S105 to skip the last twenty to thirty iterative operations during program compiling, the method may further include the following steps:
 s106, after the current image frame information is fused into the TSDF map, only partial threads are started when the surface information is extracted from the TSDF map, the started partial threads are normally executed, and the result is copied into the shared memory after the execution is finished. It is emphasized that this step may be referred to as a thread replacement method (thread approximation, TA for short).
It should be noted that, after the current image information has been fused into the TSDF map, the SLAM needs to extract new surface information from the updated TSDF map, and the SLAM adopts raycast as a method for extracting the surface information, which is based on the basic principle: and transmitting a plurality of rays from the camera to the TSDF map along the current camera pose direction, and searching along the transmitting direction of each ray to find out the surface information. The voxel in the TSDF map stores data representing the surface information, and a range of-1 to 1,0 represents the surface information stored. If 0 is stored in the voxel encountered by the ray, the search is stopped and the surface information is saved. When the GPU executes the process, each ray is responsible for one thread, and because of the large quantity of rays, the surface information found between adjacent rays is similar, namely the execution results of the adjacent threads are similar. The principle of TA method is that only a part of threads are started and normally executed, after the execution is finished, the result is copied into the shared memory, and the threads adjacent to the threads take the corresponding execution result from the memory.
Specifically, in the specific implementation, before performing steps S103, S105, S106, it may further include: the approximate calculation weights of RBE method, LS method and TA method are adjusted to make SLAM algorithm have optimum speed-up ratio.
It should be noted that, since the approximation calculation may cause an error, when the method is applied, it is impossible to perform the maximum approximation processing on each method, which may cause the result to exceed the tolerance range, and the better acceleration performance may not be achieved by only one method, and the acceleration degree of SLAM may be different by different methods. An optimization scheme is therefore needed to rationally distribute the approximate calculated specific gravity of the four methods so that SLAM achieves the maximum overall speed ratio. The CDI method recognizes key data so as to reduce the overall noise of the image and exclude areas with larger light intensity or areas with undetectable depth values in the depth image, so that the CDI method only generates a small error compared with the original algorithm, and the specific gravity of the CDI method is not required to be adjusted, that is, the CDI method is always applied, and the error caused by the CDI method is very small and does not influence the result. The approximate computation intensity for RBE is the number of adjacent warp, for example, the approximate intensity for RBE when passing one warp direction to two adjacent warp is 2. For LS, the approximate computation strength is defined as the number of iterations skipped by the loop, e.g., the last thirty iterations of the for loop are skipped, then the approximate computation strength is 30. For TA, the approximate computation strength is defined as the number of adjacent threads that a normal executing thread is to replace.
Preferably, the invention adopts an iterative method to search the optimal speed ratio, the approximate calculation degree is expressed as D, and the corresponding approximate calculation degree of RBE, LS, TA is defined as Dr, dl and Dt, and the initial values of the Dr, dl and Dt are all set to be 1. According to the specific application scene of the SLAM algorithm, the original SLAM algorithm is operated once to know how much the basic error is; the approximate calculation space is the maximum allowable error minus the base error. Then, SLAM algorithm of RBE, LS, TA is independently operated to obtain acceleration curve and error curve of each method under application of different approximate calculation degree. The acceleration ratio value of each method is divided by the error value produced to obtain an acceleration ratio value resulting from the increase in unit error, which is indicated by the letter X. An iterative approach can now be used to search for the optimal speed ratio. Setting iteration step length: RBE iteration step size is set to 10, LS iteration step size is set to 3, TA iteration step size is set to 1. Firstly, selecting the method with the largest X value in the three methods for adjustment, and the other two methods are not moved, wherein the D of the method is added with an iteration step length, the error value generated by the approximation degree of the method at the moment is correspondingly subtracted from the design space of the error, the X values of the three methods at the moment are continuously compared, the iteration is continued until the design space of the error is 0, the approximation degree corresponding to each of the three methods, namely the D value, is the optimal solution, and for each frame, the optimization scheme can be adopted to realize the precise control of the SLAM algorithm, so that the error is controllable, and meanwhile, the maximum algorithm acceleration is realized.
KinectFusion is a classical visual SLAM algorithm that focuses on building dense maps of scenes. The following applies the SLAM control method provided by the present invention to KinectFusion, and applies KinectFusion after the similar processing to an indoor scene to specifically describe:
 the GPU code of KinectFusion algorithm may be divided into tens of kernel functions, with different kernel functions performing different functions. As shown in fig. 3 and 4, the two kernel functions INTEGRATE and raycast are characterized by applying an approximate calculation method. INTEGRATE, fusing each frame of image information acquired by the camera into a TSDF map so as to complete the work related to image construction, and Raycast extracting surface information from the TSDF map constructed in INTEGRATE to acquire the pose of the camera. For INTEGRATE, three algorithms, CDI, RBE, LS each, were used.
Firstly, the CDI is used for identifying the picture shot by the camera, and then only the pixel blocks where the identified object is located are reserved for other discarding, namely, only the identified area is reconstructed in the subsequent map reconstruction.
After that, the code goes into four consecutive if-else statements in INTEGRATE, and the invention processes this statement by RBE method, since too many if-else will cause branch divergence phenomenon. The RBE specific deployment flow is as follows: a 256-bit bool array is created in the GPU's shared memory for each thread of a representative warp (32 threads make up a warp) to accommodate the branching information of the 256 voxels executed by the thread. The present invention chooses shared memory to implement a BDS because it can be accessed quickly by all threads and it is much faster than global memory. Representative warp is selected and allowed to function properly. When it is completed, the data content of the representative thread in each warp is similar to: 000010011111000 …,1 indicates that this voxel is not executed until the fourth branch instruction, is the voxel to be found, and will take part in the next operation, while the reference number 0 indicates that there is no need to continue to judge the next useless voxel, and the result obtained by representative warp will be directly forwarded to the next rest warp, which will eliminate branch divergence phenomenon of the rest warp.
Then, the LS method is applied to a main loop in INTEGRATE, the original iteration number of the loop is 256, LS processing is adopted, the approximate calculation degree of LS is Dl, and therefore the iteration number of the main loop after LS processing is 256-Dl.
Finally, applying the TA method to raycast and raycast allocates a thread for each pixel point of the image, each thread represents a light ray, and starting the thread according to the mark of the image pixel by the TA method, for example, for threads executing the pixel with the horizontal axis coordinate of 0, the threads are normally started, the result is directly written into a memory storing the result of the thread with the horizontal axis coordinate in the range of 1-Dt after the result is obtained, and the thread representing the pixel with the horizontal axis coordinate in the range of 1-Dt does not need to be started. Raycast and the integration are the same image, so that for raycast, the result of CDI in the integration can be directly applied to process only the image area identified by CDI.
In summary, for KinectFusion, the CDI, RBE, LS, TA four methods are applied to two most time-consuming kernel functions (INTEGRATE, RAYCAST), and compared with the prior art, the method achieves the goal of speed improvement by controlling some initial parameters of the SLAM algorithm, and can greatly improve the operation speed of the SLAM by starting from the calculation bottleneck of the SLAM algorithm.
Correspondingly, the embodiment of the invention also discloses electronic equipment, which comprises a processor and a memory; the processor executes the computer program stored in the memory to implement the GPU-based SLAM control method disclosed in the foregoing embodiment.
For more specific procedures of the above method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
Further, the invention also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by a processor, implements the previously disclosed GPU-based SLAM control method.
For more specific procedures of the above method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. The apparatus and the storage medium disclosed in the embodiments are relatively simple to describe, and the relevant points refer to the description of the method section since they correspond to the methods disclosed in the embodiments.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The SLAM control method based on the GPU provided by the embodiment of the invention comprises the following steps: carrying out error prediction on each frame of input image information to obtain a predicted value; judging whether the predicted value is smaller than a set threshold value or not; if yes, in the process of running the SLAM algorithm, only a plurality of representative warp are started, the branch directions of all threads in the representative warp are recorded, and the recorded branch directions are transmitted to the adjacent warp. The invention starts from the SLAM algorithm itself or the characteristic of running on the GPU, applies the approximate calculation technology of redundant branch elimination method to the kernel function of the SLAM algorithm, achieves the aim of speed improvement by controlling some initial parameters of the SLAM algorithm, fundamentally solves branch divergence phenomena, further solves the operation bottleneck of the SLAM algorithm itself, reduces redundant calculation, improves the running speed of the SLAM, realizes the refined control of SLAM error to keep the SLAM error below an acceptable maximum error value all the time, and explores the prospect and opportunity of applying the approximate calculation technology to the SLAM high-precision application. In addition, the invention also provides corresponding equipment and a computer readable storage medium for the SLAM control method based on the GPU, so that the method has more practicability, and the equipment and the computer readable storage medium have corresponding advantages.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The GPU-based SLAM control method, device and storage medium provided by the present invention are described in detail above, and specific examples are applied to illustrate the principles and embodiments of the present invention, and the description of the above examples is only used to help understand the method and core idea of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.