CN119359533B - Viewing angle conversion method, device, electronic equipment and storage medium - Google Patents

Viewing angle conversion method, device, electronic equipment and storage medium

Info

Publication number
CN119359533B
CN119359533B (application CN202411921734.3A)
Authority
CN
China
Prior art keywords
bev
images
image
space
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411921734.3A
Other languages
Chinese (zh)
Other versions
CN119359533A (en)
Inventor
沈博
崔浩
郑祥祥
郭涛
胡金水
魏思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202411921734.3A
Publication of CN119359533A
Application granted
Publication of CN119359533B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the field of computer vision and provides a viewing angle conversion method, apparatus, electronic device and storage medium. The method comprises: acquiring a first number of images under a plurality of viewing angles and determining, from these images, image groups that are mutually independent in spatial distribution; simultaneously converting the image groups into the same bird's-eye view (BEV) space to obtain a second number of initial BEV images, the second number being smaller than the first number; and aggregating the initial BEV images to obtain the BEV image. By determining spatially independent image groups from the images under multiple viewing angles and converting each group simultaneously into the same BEV space, the method greatly reduces computational redundancy, lowers the amount of computation, shortens the overall time consumption of the view conversion task, and improves conversion efficiency.

Description

Viewing angle conversion method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a viewing angle conversion method, apparatus, electronic device, and storage medium.
Background
Bird's Eye View (BEV) space plays a vital role in the areas of autopilot and computer vision. It provides a top down view that unifies data from different sensors into a common coordinate system, thereby simplifying the understanding and handling of the environment. More and more visual perception techniques employ BEV schemes to enable 3D perception modeling of the surrounding environment.
Conversion of the look-around multi-camera input into BEV space is a key step in implementing BEV schemes. In general, this process needs to process a large amount of image data, which generates a large amount of time consumption, so a method is needed to reduce the overall time consumption of the vision conversion task, so as to help accelerate the evolution and landing of tasks such as automatic driving.
Disclosure of Invention
The invention provides a viewing angle conversion method, a viewing angle conversion device, electronic equipment and a storage medium, which are used for solving the defect that a lot of time is consumed in converting a multi-camera input into a BEV space in the prior art.
The invention provides a visual angle conversion method, which comprises the following steps:
Acquiring a first number of images under a plurality of view angles, and determining mutually independent image groups in spatial distribution from the images;
And simultaneously converting the image groups into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, wherein the second number is smaller than the first number.
According to the viewing angle conversion method provided by the invention, the method for determining the image groups independent in spatial distribution from the images comprises the following steps:
Based on the shooting view angle of the image, mutually independent image groups in spatial distribution are determined from the image.
According to the viewing angle conversion method provided by the invention, the image group is simultaneously converted into the same BEV space, so as to obtain the initial BEV image, which comprises the following steps:
extracting image characteristics of each image in the image group;
Determining a mapping relation from an image coordinate system to a BEV coordinate system based on internal and external parameters of the camera;
And simultaneously mapping the image features to the same BEV space based on the mapping relation to obtain the initial BEV image.
According to the viewing angle conversion method provided by the invention, simultaneously converting the image groups into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, comprises the following steps:
Fusing the operators in the viewing angle conversion process to obtain a viewing angle conversion operator, wherein the viewing angle conversion operator is used for simultaneously converting the image groups into the same BEV space to obtain a second number of initial BEV images, and for aggregating the initial BEV images to obtain the BEV image;
Based on the viewing angle conversion operator, reading the images at the plurality of viewing angles, performing the viewing angle conversion and saving the BEV image.
According to the viewing angle conversion method provided by the invention, the fusion of operators in the viewing angle conversion process is carried out to obtain the viewing angle conversion operator, and the method comprises the following steps:
Fusing a sampling operator, a multiplication operator, a summation operator and a division operator in each operator to obtain the view angle conversion operator;
The sampling operator and the multiplication operator are used for converting the image group into the same BEV space to obtain an initial BEV image, and the summation operator and the division operator are used for aggregating the initial BEV image to obtain the BEV image.
According to the viewing angle conversion method provided by the invention, simultaneously converting the image groups into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, comprises the following steps:
based on a vectorization processor, simultaneously converting the image groups into the same bird's-eye view BEV space through vectorized instructions to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image.
According to the visual angle conversion method provided by the invention, the vectorization processor is deployed on the end-side equipment.
The present invention also provides a viewing angle conversion apparatus, comprising:
An image acquisition unit, configured to acquire a first number of images at a plurality of view angles, and determine image groups that are independent of each other in spatial distribution from the images;
and a viewing angle conversion unit, configured to simultaneously convert the image groups into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and to aggregate the initial BEV images to obtain the BEV image, wherein the second number is smaller than the first number.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the viewing angle conversion methods described above when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a viewing angle conversion method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a viewing angle conversion method as described in any one of the above.
According to the viewing angle conversion method, apparatus, electronic device and storage medium, a first number of images under a plurality of viewing angles are acquired, image groups that are mutually independent in spatial distribution are determined from the images, the image groups are simultaneously converted into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and the initial BEV images are aggregated to obtain the BEV image, the second number being smaller than the first number. Compared with the prior art, in which each input image is mapped to the BEV space separately, causing large computational redundancy, this embodiment converts the mutually independent image groups into the same bird's-eye view BEV space simultaneously, so that the redundant points in the resulting initial BEV images, i.e., the computational redundancy, are greatly reduced.
Meanwhile, since the number of initial BEV images is smaller than the number of single-view images, the amount of computation in the step of aggregating the initial BEV images is correspondingly reduced, so that the overall time consumption of the view conversion task is reduced and the conversion efficiency is improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a viewing angle conversion method provided by the present invention.
Fig. 2 is a map of the image to BEV space in the related art.
FIG. 3 is a map of the image group to BEV space provided by the present invention.
Fig. 4 is a schematic diagram of coordinate transformation provided by the present invention.
Fig. 5 is a schematic structural diagram of a viewing angle conversion device provided by the present invention.
Fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With the development of visual perception technology, the traditional 2D visual perception method with single-camera input is gradually becoming unsuitable for tasks such as automatic driving and robotics. More and more visual perception techniques employ BEV schemes to enable 3D perception modeling of the surrounding environment. Taking the evolution of automatic-driving vision models as an example, models such as BEVDet and MapTR are visual perception schemes based on BEV space; these schemes remarkably improve the visual perception effect by utilizing the data fusion capability provided by the BEV space. In these schemes, conversion of the surround-view multi-camera input into the BEV space is a key step. This process typically requires processing large amounts of image data, which is time-consuming, and there is therefore a significant need to accelerate it.
In view of the above problems, an embodiment of the present invention provides a viewing angle conversion method, in which a first number of images under multiple viewing angles are acquired, image groups that are mutually independent in spatial distribution are determined from the images, the image groups are simultaneously converted into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and the initial BEV images are aggregated to obtain the BEV image, where the second number is smaller than the first number. Compared with the prior art, in which each input image is mapped to the BEV space separately, causing large computational redundancy, this embodiment converts the mutually independent image groups into the same bird's-eye view BEV space simultaneously, so that the redundant points in the resulting initial BEV images, i.e., the computational redundancy, are greatly reduced.
Meanwhile, since the number of initial BEV images is smaller than the number of single-view images, the amount of computation in the step of aggregating the initial BEV images is correspondingly reduced, so that the overall time consumption of the view conversion task is reduced and the conversion efficiency is improved.
The embodiment of the invention can be applied to any scene requiring view conversion, in particular to scenes in which multi-camera input is converted into a bird's-eye view, such as automatic driving, robot navigation and monitoring systems. The execution subject of the method can be electronic equipment such as a terminal device, computer, server, server cluster or specially designed viewing angle conversion equipment, or a viewing angle conversion apparatus arranged in the electronic equipment; the apparatus can be realized by software, hardware or a combination of the two.
In describing embodiments of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present invention, "plurality" means two or more, unless explicitly defined otherwise.
Fig. 1 is a schematic flow chart of a viewing angle conversion method according to the present invention, as shown in fig. 1, the method includes the following steps:
step 110, acquiring a first number of images at a plurality of viewing angles, and determining image groups which are independent from each other in spatial distribution from the images.
Specifically, images for a plurality of viewing angles may be acquired by image acquisition apparatus. A plurality of cameras may be arranged around the target area to capture images from different angles and positions. For example, on an autonomous vehicle, cameras may be mounted on the front, rear, sides, and roof of the vehicle.
The cameras are started to capture images simultaneously or sequentially, so that each camera captures the environmental information within its field of view; these images form the first image set. Taking a six-camera surround-view setup as an example, the first number may be 6. Of course, the first number can be flexibly adjusted according to the application scene and requirements; for example, it may be 4.
In addition, pre-processing, such as denoising, contrast enhancement, etc., can be performed on the captured image to improve the accuracy and efficiency of subsequent processing.
In the related art, each image typically needs to be mapped to the BEV space; taking six-view camera input as an example, six initial BEV images are obtained through conversion. Each initial BEV image is populated with corresponding points taken from the camera image. However, this process is highly redundant, because the image of any single camera does not completely cover the BEV space.
Fig. 2 is a map of an image to BEV space in the related art. As shown in Fig. 2, a is the map of the front-view image to BEV space, b of the rear-view image, c of the front-left-view image, d of the rear-left-view image, e of the front-right-view image, and f of the rear-right-view image. When the grid_sample operator is used to take features from the camera image, the uncovered areas are also operated on, resulting in about 50% redundant operations.
Furthermore, since the view angle of each camera is limited, each voxel feature is very sparse, e.g., only about 17% of the positions are non-zero positions. Because of their large scale, the amount of computation to aggregate these voxel features is very large, and therefore it is necessary to fuse these voxel features to produce a dense voxel feature to avoid time consuming voxel aggregation.
In this embodiment, after the images under multiple viewing angles are acquired, the image groups that are spatially independent of each other are determined from the acquired images; that is, any two images within an image group are spatially independent. Two images are considered spatially independent if their overlap is low and each covers a different part of the target area. Any image group may contain images from at least two viewing angles.
The determination of the image groups may be achieved by an image processing algorithm: extract the feature points in each image using an algorithm such as SIFT, SURF or ORB, compute the feature-point descriptors, and find matching feature-point pairs between different images using these descriptors. The transformation matrix between the images (e.g., a perspective transformation matrix) is estimated by the RANSAC algorithm or another robust estimation method, and the feature points are mapped into the world coordinate system according to this matrix. The distribution of the feature points in the world coordinate system is then analyzed to determine which image groups are spatially independent of each other; independence can be judged by the overlap and distance relations of feature points between image groups. Finally, mutually independent image groups are screened out according to the spatial distribution analysis, ensuring that the images in each group cover different spatial areas and avoiding repetition and redundancy.
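As an illustrative sketch (not from the patent), the independence test at the end of this pipeline can be reduced to checking the overlap of two views' BEV footprints; modelling footprints as sets of grid cells and the 5% threshold are assumptions:

```python
# Judge spatial independence of two views by the overlap of their BEV footprints.
# A footprint is modelled here as a set of occupied BEV grid cells; a real system
# would derive it from the camera extrinsics.

def footprint(cells):
    """A view's BEV footprint as a set of (row, col) grid cells."""
    return set(cells)

def are_independent(fp_a, fp_b, max_overlap_ratio=0.05):
    """Two views are treated as spatially independent when the shared
    fraction of their footprints is below a small threshold."""
    inter = len(fp_a & fp_b)
    smaller = min(len(fp_a), len(fp_b))
    return smaller == 0 or inter / smaller <= max_overlap_ratio

# A front view and a rear view cover disjoint halves of a 4x4 BEV grid:
front = footprint((r, c) for r in range(2) for c in range(4))
rear = footprint((r, c) for r in range(2, 4) for c in range(4))
print(are_independent(front, rear))   # disjoint footprints -> True
print(are_independent(front, front))  # identical footprints -> False
```

Pairs that pass this test can be placed in the same image group and mapped into one shared BEV image.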
In other embodiments, the group of images may also be determined by the viewing angle of the individual images. Images from different viewing angles typically cover different parts of the target area, while images from diametrically opposite viewing angles are typically spatially distributed independently of each other. For example, a front view image and a rear view image may be regarded as one image group, a left front view image and a right rear view image may be regarded as one image group, and a left rear view image and a right front view image may be regarded as one image group.
And step 120, converting the image groups simultaneously into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, wherein the second number is smaller than the first number.
Specifically, after obtaining the image groups that are mutually independent in spatial distribution, the image groups can be simultaneously converted into the same bird's-eye view BEV space to obtain a second number of initial BEV images. Because the images in each group are spatially independent and cover different spatial regions, performing the conversion group-wise greatly reduces the redundant points in the initial BEV images, i.e., greatly reduces the computational redundancy.
Meanwhile, in the conversion process each BEV space requires one sampling operation, so the original first number of sampling operations is reduced to the second number of sampling operations; since the second number is clearly smaller than the first number, the amount of computation for sampling is significantly reduced.
Then, the initial BEV images are aggregated to obtain the BEV image. The aggregation can be achieved by summing the feature values at the same position in BEV space and then averaging them. Since the number of initial BEV images is smaller, the amount of computation of the aggregation operation is correspondingly reduced.
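The sum-then-average aggregation described above can be sketched as follows; the array shapes and the equal-weight averaging convention are assumptions, not specified by the patent:

```python
# Aggregate initial BEV feature maps by position-wise summing, then averaging.
import numpy as np

def aggregate_bev(initial_bevs):
    """initial_bevs: list of (H, W, C) arrays; returns the averaged (H, W, C) BEV."""
    stack = np.stack(initial_bevs)  # (N, H, W, C)
    return stack.sum(axis=0) / len(initial_bevs)

a = np.ones((2, 2, 1))
b = 3 * np.ones((2, 2, 1))
out = aggregate_bev([a, b])
print(out[0, 0, 0])  # (1 + 3) / 2 = 2.0
```

With fewer initial BEV images in the list, the `sum` over axis 0 touches proportionally less data, which is exactly where the patent's savings in the aggregation step come from.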
According to the method provided by the embodiment of the invention, the image groups which are independent in space distribution are determined from the images under a plurality of view angles, and the image groups are simultaneously converted into the BEV space of the same aerial view angle, so that the calculation redundancy is greatly reduced, the calculation amount is reduced, the overall time consumption of a visual conversion task is reduced, and the conversion efficiency is improved.
Based on any of the above embodiments, determining the spatially independent image groups from the images in step 110 includes:
Based on the shooting view angle of the images, mutually independent image groups in spatial distribution are determined from the images.
In particular, images from different viewing angles typically cover different portions of the target area, while images from diametrically opposite viewing angles are typically spatially distributed independently of each other. For example, a front view image and a rear view image may be regarded as one image group, a left front view image and a right rear view image may be regarded as one image group, and a left rear view image and a right front view image may be regarded as one image group.
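A minimal sketch of this opposite-view grouping, with hypothetical camera names (the patent does not fix a naming scheme):

```python
# Pair a six-camera surround-view rig into the three spatially independent
# groups named above: each group joins two diametrically opposite views.
OPPOSITE_PAIRS = [
    ("front", "rear"),
    ("front_left", "rear_right"),
    ("front_right", "rear_left"),
]

def group_views(images_by_view):
    """images_by_view: dict of view name -> image; returns the list of image groups."""
    return [(images_by_view[a], images_by_view[b]) for a, b in OPPOSITE_PAIRS]

# Stand-in "images" are just upper-cased view names here.
views = {name: name.upper() for pair in OPPOSITE_PAIRS for name in pair}
groups = group_views(views)
print(len(groups))  # six views -> three groups
```

The first number (6 images) thus collapses to the second number (3 groups, hence 3 initial BEV images) before any sampling work is done.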
Taking six-view camera input as an example, it can be observed from the image-to-BEV mapping diagrams that, for several pairs of images, all pixel points are mapped into non-overlapping regions of the BEV space, i.e., the pairs are independent in spatial distribution. Thus, when mapping images to BEV space, such a pair of non-overlapping images can be mapped simultaneously into one BEV image. Fig. 3 is a map of an image group to BEV space provided by the present invention; as shown in Fig. 3, a is the map of the image group consisting of the front-view and rear-view images, b is the map of the group consisting of the front-right-view and rear-left-view images, and c is the map of the group consisting of the front-left-view and rear-right-view images.
As can be seen from a comparison of Fig. 2 and Fig. 3, the redundant points in the initial BEV images are substantially reduced, i.e., the computational redundancy is substantially reduced. The simultaneous conversion reduces the sampling from six BEV spaces to only three, cutting the computation of this step by 50%. The subsequent operation aggregates the BEV spaces; since their number is reduced by 50%, the computation of the aggregation is also reduced by 50%, so the overall computation of the algorithm is reduced by 50%.
According to the method provided by the embodiment of the invention, the image groups which are independent in space distribution are determined from the images based on the shooting view angles of the images, so that the total calculated amount can be further reduced.
Based on any of the embodiments described above, converting the image groups simultaneously to the same BEV space in step 120 results in an initial BEV image, comprising:
step 121, extracting image features of each image in the image group;
step 122, determining the mapping relation from the image coordinate system to the BEV coordinate system based on the internal and external parameters of the camera;
And step 123, mapping the image features to the same BEV space at the same time based on the mapping relation to obtain an initial BEV image.
Converting an image group simultaneously into the same BEV space to obtain an initial BEV image may proceed as follows: features of each image in the group are extracted by a convolutional neural network (CNN) model, and the features at all positions in the image group are then mapped to the corresponding positions in the BEV using a mapping equation constructed from the camera's intrinsic and extrinsic parameters, yielding the initial BEV image.
The conversion of the image coordinate system to the BEV coordinate system is primarily involved in step 122, requiring the mapping of all pixels in the image to positions in BEV space. Fig. 4 is a schematic diagram of the coordinate transformation. As shown in Fig. 4, this process requires: (1) transforming the pixel coordinate system into the image coordinate system by discretization; (2) transforming the image coordinate system into the camera coordinate system through the perspective relation; and (3) transforming the camera coordinate system into the world coordinate system by a rigid transformation.
In order to perform the above procedure, it is necessary to acquire the internal parameters of the camera (such as the focal length and principal point position) to convert the image coordinate system into the camera coordinate system; these are usually obtained by camera calibration. The intrinsic matrix $K$ can be expressed as formula (1), where $f_x$ and $f_y$ are the focal lengths and $(c_x, c_y)$ are the principal point coordinates:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

It is also necessary to acquire the camera's external parameters for the conversion between the camera coordinate system and the world coordinate system. The external parameters describe the position and pose of the camera relative to the world coordinate system and comprise a rotation matrix $R$ and a translation vector $t$. A point $(u, v)$ in the image can be mapped to a point $(X, Y, Z)$ in three-dimensional space by coordinate transformation. Using the intrinsic matrix $K$ and the extrinsic matrix $[R \mid t]$, a projection matrix $P$ can be constructed as in formula (2):

$$P = K\,[R \mid t] \tag{2}$$

For each pixel point $(u, v)$ in the image, assuming its corresponding depth is $Z_c$ (typically determined by depth estimation or a known height), its three-dimensional coordinates $(X, Y, Z)$ in the world coordinate system satisfy formula (3) and can be recovered by inverting this projection relation:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{3}$$

The three-dimensional coordinates $(X, Y, Z)$ are then projected onto the BEV plane. Since the BEV is a top view, only the $X$ and $Y$ coordinates are typically of interest, and the $Z$ coordinate is ignored.

Assuming the BEV grid has resolution $W_{bev} \times H_{bev}$ and its spatial extent has been defined (e.g., $X$ from $-L/2$ to $L/2$ and $Y$ from $-W/2$ to $W/2$), the coordinates $(X, Y)$ can be converted into BEV coordinates $(x_{bev}, y_{bev})$ by formulas (4) and (5):

$$x_{bev} = \left\lfloor \frac{X + L/2}{L}\, W_{bev} \right\rfloor \tag{4}$$

$$y_{bev} = \left\lfloor \frac{Y + W/2}{W}\, H_{bev} \right\rfloor \tag{5}$$
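A small numeric sketch of the intrinsic and projection matrices described above: build an intrinsic matrix and an extrinsic matrix, form the projection matrix, and project one world point to pixel coordinates. All numbers are illustrative, not from the patent:

```python
# Build K and [R | t], form P = K [R | t], and project a world point to pixels.
import numpy as np

fx, fy, cx, cy = 100.0, 100.0, 64.0, 48.0
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]])

R = np.eye(3)                    # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 0.0])
Rt = np.hstack([R, t[:, None]])  # 3x4 extrinsic matrix [R | t]
P = K @ Rt                       # 3x4 projection matrix

Xw = np.array([1.0, 0.5, 2.0, 1.0])  # homogeneous world point, depth Z = 2
uvw = P @ Xw
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(round(u, 1), round(v, 1))  # 114.0 73.0
```

Inverting this relation at a fixed depth recovers the world point for each pixel, which is what the image-to-BEV mapping ultimately tabulates.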
To accelerate this process, it is generally assumed that the depth distribution of the pixels is uniform in the radial direction. In this case, once the intrinsic and extrinsic matrices are acquired, the mapping from the image coordinate system to the BEV coordinate system is fixed, and the mapping relation can be solidified using a lookup table. Whenever a conversion from the image coordinate system to the BEV coordinate system is required, the mapping relation is read from the lookup table, and the image features are carried to the designated positions in BEV space according to it. This is usually implemented on a CPU or GPU through the grid_sample sampling operator.
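A sketch of this lookup-table idea, assuming a simple row layout for the table (the patent does not specify one): the pixel-to-BEV mapping is precomputed once and then applied to any feature map by pure gathering, with no per-frame geometry:

```python
# Precompute a pixel -> BEV-cell lookup table, then scatter features with it.
import numpy as np

def build_lut(h_img, w_img, pix_to_cell):
    """pix_to_cell(u, v) -> (row, col) in the BEV grid, or None if unmapped."""
    lut = []
    for v in range(h_img):
        for u in range(w_img):
            cell = pix_to_cell(u, v)
            if cell is not None:
                lut.append((v, u, cell[0], cell[1]))
    return np.asarray(lut)  # each row: img_row, img_col, bev_row, bev_col

def apply_lut(features, lut, bev_shape):
    """Gather features (H, W, C) into a BEV grid (Hb, Wb, C)."""
    bev = np.zeros(bev_shape + (features.shape[-1],))
    bev[lut[:, 2], lut[:, 3]] = features[lut[:, 0], lut[:, 1]]
    return bev

# Toy mapping: a 2x2 image dropped into the top-left of a 4x4 BEV grid.
lut = build_lut(2, 2, lambda u, v: (v, u))
feats = np.arange(8, dtype=float).reshape(2, 2, 2)
bev = apply_lut(feats, lut, (4, 4))
print(bev[1, 1, 1])  # pixel (v=1, u=1), channel 1 -> 7.0
```

The table is built once at calibration time; each frame only pays for the gather, which is the work grid_sample performs on real hardware.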
Based on any of the above embodiments, the image groups are simultaneously converted into the same bird's-eye view BEV space to obtain a second number of initial BEV images, and the initial BEV images are aggregated to obtain the BEV image; that is, step 120 specifically includes:
Step 124, fusing the operators in the viewing angle conversion process to obtain a viewing angle conversion operator, wherein the viewing angle conversion operator is used for simultaneously converting the image groups into the same BEV space to obtain a second number of initial BEV images, and for aggregating the initial BEV images to obtain the BEV image;
Step 125, based on the viewing angle conversion operator, reading the images at the plurality of viewing angles, performing the viewing angle conversion and saving the BEV image.
Specifically, the operators involved in the calculation process of the existing algorithm are all memory-access-intensive operators: each time, the corresponding data is fetched, a simple calculation is performed, and the result is written back. This calculation pattern is very unfriendly to large-scale data and suffers from poor data locality, insufficient data reuse and insufficient hardware utilization. If a program has poor data locality, i.e., it frequently accesses discontiguous memory addresses, a large number of cache misses result, increasing memory-access latency. If the data is not sufficiently reused, every calculation requires re-reading the data from memory, causing unnecessary memory accesses and increasing the pressure on memory bandwidth. Modern processors have multi-level cache systems and SIMD instruction sets to improve data-processing capability; if the compute-to-memory-access ratio is too low, these hardware resources are underutilized, hurting performance.
In the process of converting the view angles of the image features, multiple steps are needed, namely, firstly converting the points of all the images into BEV space, and then aggregating the points. The operators involved in these processes include sampling operators, multiplication operators, summation operators, and division operators.
Specifically: the sampling operator performs the grid_sample operation that carries points into BEV space; the multiplication operator sets all values obtained from redundant points to zero, while normally mapped points keep their original values, yielding the initial BEV image; the summation operator sums the feature values at the same position in BEV space; and the division operator averages the features, achieving the aggregation of the initial BEV images into the BEV image.
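These four operators can be contrasted with their fused counterpart in a short sketch (not the patent's implementation); both paths compute the same masked average, which is what makes fusion safe:

```python
# Four separate memory-bound passes vs. one fused pass over the same data.
import numpy as np

def separate_passes(sampled, valid_mask):
    """sampled: (N, H, W) initial BEV samples; valid_mask: (N, H, W) 0/1."""
    masked = sampled * valid_mask              # multiplication operator
    total = masked.sum(axis=0)                 # summation operator
    count = np.maximum(valid_mask.sum(axis=0), 1)
    return total / count                       # division operator

def fused_pass(sampled, valid_mask):
    """The same arithmetic in a single traversal of the data."""
    h, w = sampled.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            s = c = 0.0
            for n in range(sampled.shape[0]):
                s += sampled[n, i, j] * valid_mask[n, i, j]
                c += valid_mask[n, i, j]
            out[i, j] = s / max(c, 1.0)
    return out

rng = np.random.default_rng(0)
x = rng.random((3, 4, 4))
m = (rng.random((3, 4, 4)) > 0.5).astype(float)
print(np.allclose(separate_passes(x, m), fused_pass(x, m)))  # True
```

The separate version reads and writes whole intermediate arrays between operators; the fused version touches each element once, trading those round trips to memory for arithmetic kept in registers.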
A data access must be performed before and after each of the operations described above, yet each operation itself is computationally trivial, which results in a large amount of redundant memory access. Meanwhile, the compute units finish processing the fetched data quickly, so they sit idle waiting for data most of the time, and the computing-power utilization of the hardware is very low.
A memory access consumes tens to hundreds of times as many clock cycles as a compute operation, so the time spent on memory access and on computation can only be balanced by performing multiple computations on each piece of fetched data. This forms a pipeline in which both the memory-access units and the compute units stay busy, improving hardware utilization. To this end, in this embodiment all operators in the image-feature perspective-conversion process are fused, and a single perspective-conversion operator is generated after the fusion.
When the computation is executed, only the data required by the perspective-conversion operator needs to be read; the conversion is then computed, and the result is saved once all computation is complete. The fused operator eliminates multiple memory-access operations, cutting the most time-consuming part of the computation, memory access, by 3/4, and thus reduces the computation time.
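Under the same illustrative data layout (a sketch, not the patented kernel), fusing the four operators collapses the pipeline into a single pass in which each input feature is read once and no intermediate arrays are materialized:

```python
import numpy as np

def view_conversion_fused(feats, bev_idx, num_cells):
    """Fused perspective-conversion operator (sketch): sampling, masking,
    summation, and averaging are performed in a single pass, so each input
    feature is read once and each BEV cell is written once."""
    acc = np.zeros((num_cells, feats.shape[1]), feats.dtype)
    cnt = np.zeros(num_cells, feats.dtype)
    for p, cell in enumerate(bev_idx):
        if cell < 0:             # masking step folded into a branch
            continue
        acc[cell] += feats[p]    # sampling + summation fused, no intermediate array
        cnt[cell] += 1
    return acc / np.maximum(cnt, 1)[:, None]   # division folded into the same pass
```

Because masking, summation, and averaging happen inside one loop body, each fetched feature participates in several computations before the next memory access, raising the compute-to-memory-access ratio described above.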
According to the method provided by this embodiment of the invention, the operators in the perspective-conversion process are fused into a single perspective-conversion operator; the fused operator eliminates multiple memory-access operations and reduces computation time.
Based on any of the above embodiments, the image groups are simultaneously converted to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and the initial BEV images are aggregated to obtain the BEV image; that is, step 120 specifically includes:
step 126, based on a vectorization processor, simultaneously converting the image groups, by vectorized instructions, to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image.
In particular, the basic principle of parallel computing acceleration is to reduce the overall computing time by decomposing one large task into multiple small tasks and letting these small tasks execute simultaneously on different processing units.
The means for accelerating parallel computation mainly comprise two layers: hardware and software. Different hardware involves different acceleration methods and principles; hardware schemes for accelerating parallel computation include multi-core processors, graphics processing units (Graphics Processing Unit, GPU), field-programmable gate arrays (Field Programmable Gate Array, FPGA), and vector processors.
A multi-core processor exploits the multiple processing cores of a modern CPU, each of which can execute tasks independently to achieve parallel processing. A GPU provides a large number of parallel processing units (CUDA cores or stream processors) that are particularly suitable for large-scale data-parallel processing. An FPGA is a semi-custom circuit that can be programmed for the needs of a specific application, enabling deep optimization and parallelization of a particular algorithm. A vector processor can execute an operation on a whole vector of data in a single instruction cycle. At the software level there are means such as parallel-algorithm design: different parallel algorithms often differ in computational cost, which affects time consumption, and a parallel algorithm must also be adapted to the hardware so that the hardware's full computing power is exploited and the algorithm approaches its theoretical running time.
In the end-side environment, multi-core processors, GPUs, and FPGAs all have problems that limit the practical application of these parallel methods on such hardware. A multi-core processor offers only a small speed-up: the speed-up ratio of parallel computation depends on the number of cores, and end-side devices have few cores, so significant acceleration is hard to obtain. A GPU has high power consumption and cost, and the power budget typical of end-side environments greatly restricts its use there. An FPGA involves high development complexity, long development cycles, and high cost. Moreover, existing algorithms are not optimized for end-side devices and are still implemented around the hardware characteristics of GPUs and CPUs; when such an algorithm is deployed to an end-side device, inference is therefore far more time-consuming than on the server side.
In this implementation, a vectorization processor is used to parallelize and accelerate the algorithm: it allows the processor to perform the same operation on multiple data items at the same time, greatly reducing the running time of the algorithm.
Deploying a vectorization processor on the end side (i.e., on devices such as smartphones and Internet-of-Things devices) has many advantages, in particular high performance and a high energy-efficiency ratio. The vectorization processor computes with vector instructions. Vectorized instructions use special vector registers to store data; such a register holds multiple data elements and supports performing an operation on all elements at once. Taking addition as an example, a vectorized instruction can be expressed in the form of formula (6): the elements of vector A and vector B are added pair-wise and stored in the corresponding positions of vector C. This is equivalent to executing n scalar instructions simultaneously.
C = A + B (6)
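As a rough illustration of formula (6), with NumPy's bulk array operations standing in for the hardware vector unit (an actual vectorization processor would issue SIMD instructions over vector registers):

```python
import numpy as np

# Formula (6): C = A + B applied element-wise. A vectorizing back end
# performs the n additions in bulk instead of issuing n scalar adds.
A = np.arange(8, dtype=np.float32)
B = np.full(8, 10.0, dtype=np.float32)
C = A + B                      # one vectorized operation over all 8 lanes

# scalar reference: one add per element, n instructions in total
C_scalar = np.array([A[i] + B[i] for i in range(8)], dtype=np.float32)
assert np.array_equal(C, C_scalar)
```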
In the image-feature perspective-conversion process, the grid_sample sampling operation, the multiplication operation, and the division operation are each computed independently on single elements, while the summation operation is an aggregation; all of these operations are well suited to acceleration with vectorized instructions. They can therefore each be vectorized and accelerated, and then fused, completing the vectorization and acceleration of the entire computation and reducing the running time of the algorithm several-fold.
Based on any one of the foregoing embodiments, there is provided a viewing angle conversion method, including:
S1, acquiring a first number of images under a plurality of view angles, and determining mutually independent image groups in spatial distribution from the images based on the shooting view angles of the images.
S2, simultaneously converting the image groups to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, the second number being smaller than the first number, thereby reducing the computational cost of the algorithm. This specifically includes:
extracting image features of the images in each image group; determining a mapping relation from the image coordinate system to the BEV coordinate system based on the internal and external parameters of the cameras; and simultaneously mapping the image features to the same BEV space based on the mapping relation to obtain the initial BEV images.
S3, fusing the operators of the perspective-conversion process to obtain a perspective-conversion operator, and, based on the perspective-conversion operator, reading the images under the plurality of view angles, performing the perspective conversion, and saving the BEV image, thereby eliminating redundant data-access operations.
S4, based on a vectorization processor, simultaneously converting the image groups, by vectorized instructions, to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, the vectorization processor being deployed on the end-side device, thereby reducing the running time of the algorithm.
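The coordinate mapping determined in step S2 can be sketched under a pinhole-camera assumption, with the BEV grid taken on the ground plane z = 0. The function name, the NaN convention for invalid points, and the specific K, R, t values used are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np

def bev_to_image_mapping(K, R, t, bev_xy):
    """For each BEV ground point (x, y, 0) in world coordinates, compute the
    source pixel (u, v) it maps from, using camera intrinsics K and
    extrinsics (R, t). Points behind the camera are marked invalid (NaN)."""
    P = np.hstack([bev_xy, np.zeros((len(bev_xy), 1))])  # lift (x, y) to (x, y, 0)
    cam = R @ P.T + t.reshape(3, 1)                      # world -> camera frame
    uvw = K @ cam                                        # camera -> homogeneous pixel
    uv = uvw[:2] / uvw[2]                                # perspective divide
    uv[:, cam[2] <= 0] = np.nan                          # behind camera: invalid
    return uv.T
```

Because K, R, and t are fixed per camera, such a mapping can typically be precomputed once and reused for every frame, so only the sampling and aggregation remain per-frame work.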
According to the method provided by the embodiment of the invention, first, based on an analysis of the computation process that converts surround-view multi-camera input to the bird's-eye view, a more efficient parallel-computation algorithm is designed, optimizing the computational cost. Second, multiple operators in the computation are fused; the data-reuse rate is improved by exploiting data locality, the compute-to-memory-access ratio is raised at the hardware level, and the time spent on memory access is reduced. Finally, the vector processing units of a vector processor parallelize the computation so that multiple subtasks of the algorithm execute simultaneously, reducing the algorithm's running time. This scheme improves inference efficiency on the end-side chip, reduces the overall inference time of the solution, and accelerates the evolution and deployment of the algorithm.
The viewing angle conversion device provided by the present invention will be described below, and the viewing angle conversion device described below and the viewing angle conversion method described above may be referred to correspondingly to each other.
Based on any of the above embodiments, fig. 5 is a schematic structural diagram of a viewing angle conversion device according to the present invention, and as shown in fig. 5, the viewing angle conversion device includes:
An image acquisition unit 510, configured to acquire a first number of images at a plurality of view angles, and determine, from the images, image groups that are spatially distributed and independent from each other;
The view angle converting unit 520 is configured to simultaneously convert the image groups to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and to aggregate the initial BEV images to obtain the BEV image, where the second number is smaller than the first number.
According to the device provided by the embodiment of the invention, the image groups which are independent in space distribution are determined from the images under a plurality of view angles, and the image groups are simultaneously converted into the BEV space of the same aerial view angle, so that the calculation redundancy is greatly reduced, the calculation amount is reduced, the overall time consumption of a visual conversion task is reduced, and the conversion efficiency is improved.
Based on any of the above embodiments, the image acquisition unit is specifically configured to:
Based on the shooting view angle of the image, mutually independent image groups in spatial distribution are determined from the image.
Based on any of the above embodiments, the viewing angle conversion unit is specifically configured to:
extracting image characteristics of each image in the image group;
Determining a mapping relation from an image coordinate system to a BEV coordinate system based on internal and external parameters of the camera;
And simultaneously mapping the image features to the same BEV space based on the mapping relation to obtain the initial BEV image.
Based on any of the above embodiments, the viewing angle conversion unit is specifically configured to:
Fusing the operators of the perspective-conversion process to obtain a perspective-conversion operator, wherein the perspective-conversion operator is used for simultaneously converting the image groups to the same BEV space to obtain a second number of initial BEV images, and for aggregating the initial BEV images to obtain the BEV image;
Based on the perspective conversion operator, reading images at the plurality of perspectives, performing perspective conversion and saving the BEV image.
Based on any of the above embodiments, the viewing angle conversion unit is specifically configured to:
Fusing a sampling operator, a multiplication operator, a summation operator and a division operator in each operator to obtain the view angle conversion operator;
The sampling operator and the multiplication operator are used for converting the image group into the same BEV space to obtain an initial BEV image, and the summation operator and the division operator are used for aggregating the initial BEV image to obtain the BEV image.
Based on any of the above embodiments, the viewing angle conversion unit is specifically configured to:
based on a vectorization processor, simultaneously converting the image groups, by vectorized instructions, to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image.
Based on any of the above embodiments, the vectoring processor is deployed on an end-side device.
Fig. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 6, the electronic device may include a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a perspective-conversion method that includes: acquiring a first number of images under a plurality of view angles, and determining, from the images, image groups that are mutually independent in spatial distribution; and simultaneously converting the image groups to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, the second number being smaller than the first number.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of performing the perspective-conversion method provided by the methods described above, the method comprising: acquiring a first number of images under a plurality of view angles, and determining, from the images, image groups that are mutually independent in spatial distribution; and simultaneously converting the image groups to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, the second number being smaller than the first number.
In yet another aspect, the present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the perspective-conversion method provided by the above methods, the method comprising: acquiring a first number of images under a plurality of view angles, and determining, from the images, image groups that are mutually independent in spatial distribution; and simultaneously converting the image groups to the same bird's-eye view BEV space to obtain a second number of initial BEV images, and aggregating the initial BEV images to obtain the BEV image, the second number being smaller than the first number.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (10)

CN202411921734.3A2024-12-252024-12-25Viewing angle conversion method, device, electronic equipment and storage mediumActiveCN119359533B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202411921734.3ACN119359533B (en)2024-12-252024-12-25Viewing angle conversion method, device, electronic equipment and storage medium


Publications (2)

Publication NumberPublication Date
CN119359533A CN119359533A (en)2025-01-24
CN119359533Btrue CN119359533B (en)2025-06-03

Family

ID=94303076

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202411921734.3AActiveCN119359533B (en)2024-12-252024-12-25Viewing angle conversion method, device, electronic equipment and storage medium

Country Status (1)

CountryLink
CN (1)CN119359533B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN117576660A (en)*2023-11-152024-02-20中山大学·深圳 A low-latency multi-vehicle bird's-eye view perception method and device based on state estimation
CN118864229A (en)*2024-06-242024-10-29中汽创智科技有限公司 Vehicle bird's-eye view generation method, device, electronic device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
JP5971223B2 (en)*2013-10-162016-08-17株式会社デンソー Composite image generator
CN112101066B (en)*2019-06-172024-03-08商汤集团有限公司Target detection method and device, intelligent driving method and device and storage medium
CN113408454B (en)*2021-06-292024-02-06上海高德威智能交通系统有限公司Traffic target detection method, device, electronic equipment and detection system
CN117197433A (en)*2023-09-072023-12-08科大讯飞股份有限公司Target detection method, device, electronic equipment and storage medium
CN117557974A (en)*2023-09-142024-02-13武汉极目智能技术有限公司Environment perception method, device, equipment and medium for multi-view BEV (back-to-back) view angles




Legal Events

DateCodeTitleDescription
PB01Publication
SE01Entry into force of request for substantive examination
GR01Patent grant
