
Obstacle determination method, device, equipment and readable storage medium

Info

Publication number
CN116824548A
Authority
CN
China
Prior art keywords
image
frame
frame image
features
gray level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310970823.6A
Other languages
Chinese (zh)
Inventor
徐荣荣
段维维
于尧
张雨生
吴腾
贾少清
李向利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chery New Energy Automobile Co Ltd
Original Assignee
Chery New Energy Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chery New Energy Automobile Co Ltd
Priority to CN202310970823.6A
Publication of CN116824548A
Legal status: Pending (current)

Abstract

The application discloses an obstacle determination method, device and equipment, and a readable storage medium, belonging to the technical field of computers. The method comprises the following steps: acquiring a first frame image and a second frame image that describe the environment where a target vehicle is located; extracting spatial features from the first frame image and the second frame image through a self-encoder to obtain the spatial features of the first frame image and the spatial features of the second frame image; extracting temporal features from the spatial features of the first frame image and the spatial features of the second frame image through a long-short-time memory model to obtain the temporal features of the first frame image and the temporal features of the second frame image; determining an image error between the first frame image and the second frame image based on the temporal features of the first frame image and the temporal features of the second frame image; and if the image error is greater than an error threshold, determining that an obstacle is present. The method compares different images of the environment where the vehicle is located globally at the image level, and determines that an obstacle exists in the environment when the difference in image content is large.

Description

Obstacle determination method, device, equipment and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method, a device, equipment and a readable storage medium for determining an obstacle.
Background
With the development of computer technology, the variety and number of vehicles keep increasing, so more and more people are able to drive. While a driving object is operating a vehicle, blind spots in the field of view or lapses in attention make traffic accidents increasingly frequent, posing a serious threat to life and property. Based on this, how to determine whether there is an obstacle in the environment where the vehicle is located, so as to reduce the frequency of traffic accidents, has become a problem to be solved.
Disclosure of Invention
The application provides an obstacle determining method, an obstacle determining device, obstacle determining equipment and a readable storage medium, which can be used for solving the problems in the related art.
In one aspect, there is provided a method of determining an obstacle, the method comprising:
acquiring a first frame image and a second frame image, wherein the first frame image and the second frame image are different images for describing the environment of a target vehicle;
extracting spatial features of the first frame image through a self-encoder to obtain the spatial features of the first frame image, wherein the spatial features of the first frame image are used for describing each first image block in the first frame image;
Extracting the time characteristics of the space characteristics of the first frame image through a long-short-time memory model to obtain the time characteristics of the first frame image, wherein the time characteristics of the first frame image are used for describing the association relation among the first image blocks;
extracting spatial features of the second frame image through the self-encoder to obtain the spatial features of the second frame image, wherein the spatial features of the second frame image are used for describing each second image block in the second frame image;
extracting the time characteristics of the space characteristics of the second frame image through the long-short-time memory model to obtain the time characteristics of the second frame image, wherein the time characteristics of the second frame image are used for describing the association relation among the second image blocks;
determining an image error between the first frame image and the second frame image based on the temporal characteristics of the first frame image and the temporal characteristics of the second frame image;
and under the condition that the image error is larger than an error threshold value, determining that an obstacle exists in the environment where the target vehicle is located.
In another aspect, there is provided an obstacle determining device, the device including:
The acquisition module is used for acquiring a first frame image and a second frame image, wherein the first frame image and the second frame image are different images for describing the environment of the target vehicle;
the spatial feature extraction module is used for extracting spatial features of the first frame image through the self-encoder to obtain the spatial features of the first frame image, wherein the spatial features of the first frame image are used for describing each first image block in the first frame image;
the time feature extraction module is used for extracting the time features of the space features of the first frame images through the long-short-time memory model to obtain the time features of the first frame images, wherein the time features of the first frame images are used for describing the association relation among the first image blocks;
the spatial feature extraction module is further configured to perform spatial feature extraction on the second frame image through the self-encoder, so as to obtain spatial features of the second frame image, where the spatial features of the second frame image are used for describing each second image block in the second frame image;
the time feature extraction module is further configured to perform time feature extraction on the spatial features of the second frame image through the long-short-time memory model, so as to obtain time features of the second frame image, where the time features of the second frame image are used for describing association relationships between the second image blocks;
A determining module configured to determine an image error between the first frame image and the second frame image based on the temporal feature of the first frame image and the temporal feature of the second frame image;
the determining module is further configured to determine that an obstacle exists in an environment where the target vehicle is located when the image error is greater than an error threshold.
In another aspect, there is provided an electronic device including a processor and a memory, the memory storing at least one computer program loaded and executed by the processor to cause the electronic device to implement any one of the above-described obstacle determining methods.
In another aspect, there is provided a computer readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause an electronic device to implement any one of the above-described obstacle determining methods.
In another aspect, there is also provided a computer program, the computer program being at least one, the at least one computer program being loaded and executed by a processor to cause an electronic device to implement any one of the obstacle determining methods described above.
In another aspect, there is also provided a computer program product having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to cause an electronic device to implement any of the above-described obstacle determining methods.
The technical scheme provided by the application has at least the following beneficial effects:
According to the technical scheme provided by the application, spatial feature extraction is performed on the first frame image and the second frame image, so that each image is divided into image blocks on the spatial level and each image block is described by spatial features. Temporal feature extraction is then performed on the spatial features of the first frame image and on the spatial features of the second frame image, so that the image blocks are associated with one another and the association relations among them are described by temporal features; the temporal features can therefore describe the image content from the global view of the image and have stronger characterization capability. Then, the image error between the first frame image and the second frame image is determined based on the temporal features of the first frame image and the temporal features of the second frame image, and when the image error is greater than the error threshold it is determined that an obstacle exists in the environment where the target vehicle is located. In this way, different images of the environment where the target vehicle is located are compared globally at the image level, and when the difference in image content is large it is determined that an obstacle exists in the environment, which improves the efficiency and accuracy of obstacle determination.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an implementation environment of an obstacle determining method according to an embodiment of the present application;
fig. 2 is a flowchart of a method for determining an obstacle according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a self-encoder according to an embodiment of the present application;
FIG. 4 is a flow chart of detecting an obstacle according to an embodiment of the present application;
fig. 5 is a schematic structural view of an obstacle determining device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a vehicle-mounted terminal according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation environment of an obstacle determining method according to an embodiment of the present application, where the implementation environment includes a vehicle-mounted terminal 11 as shown in fig. 1. The obstacle determining method in the embodiment of the present application may be performed by the in-vehicle terminal 11. The in-vehicle terminal 11 may be a smart phone, a desktop computer, a tablet computer, a laptop portable computer, an intelligent in-vehicle device, an intelligent voice interaction device, or the like. The number of the in-vehicle terminals 11 is not limited, and may be one or more.
Alternatively, the implementation environment may further include a server, and the obstacle determining method in the embodiment of the present application may be executed by the server, or may be executed by the in-vehicle terminal 11 and the server together. The server may be a server, or a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in the embodiment of the present application. The server may be communicatively connected to the in-vehicle terminal 11 via a wired network or a wireless network. The server may have functions of data processing, data storage, data transceiving, etc., and is not limited in the embodiment of the present application. The number of servers is not limited and may be one or more.
With the development of computer technology, the variety and number of vehicles are increasing, so that more and more people have the ability to drive the vehicles. When the obstacle exists in the environment where the vehicle is located, the situation that the vehicle collides with the obstacle can occur, and huge life and property threats are brought to people. Based on this, how to determine whether there is an obstacle in the environment in which the vehicle is located becomes a problem to be solved.
The embodiment of the application provides an obstacle determining method which can be applied to the implementation environment. Taking the flowchart of the obstacle determining method provided by the embodiment of the application shown in fig. 2 as an example, for convenience of description, the in-vehicle terminal 11 or the server that performs the obstacle determining method in the embodiment of the application is referred to as an electronic device, and the method may be performed by the electronic device. As shown in fig. 2, the method includes the following steps.
In step 201, a first frame image and a second frame image are acquired, the first frame image and the second frame image being different images for describing an environment in which a target vehicle is located.
In the embodiment of the application, the target vehicle is any vehicle, the target vehicle is provided with the image acquisition device, and the environment in which the target vehicle is positioned is acquired in real time or periodically or randomly through the image acquisition device, so that an environment image for describing the environment is obtained. The embodiment of the application does not limit the image acquisition device, and the image acquisition device can be a camera, a video camera and the like by way of example.
The environmental image is multi-frame, and the first frame image and the second frame image can be screened from the multi-frame environmental image. It will be appreciated that there are a variety of screening methods. For example, in the implementation A1, any two frame ambient images are respectively taken as the first frame image and the second frame image. Or in the implementation mode A2, for any two adjacent environmental images in the multi-frame environmental images, determining an image error between the two environmental images based on pixel values of each pixel point in the two environmental images, and if the image error is greater than a set threshold, determining that the two environmental images are the first frame image and the second frame image respectively. In this way, the first frame image and the second frame image with larger content difference can be screened out, which is beneficial to determining suddenly appearing obstacles. Alternatively, according to the implementation A3 shown below, the first frame image and the second frame image are screened from the multi-frame environmental image.
In implementation A3, step 201 includes steps 2011 to 2013.
In step 2011, an environmental video acquired by a camera of the target vehicle is acquired, where the environmental video includes multiple frames of environmental images.
In the embodiment of the application, the image acquisition device includes a camera, and the camera captures the environment where the target vehicle is located in real time to obtain a captured video. An environmental video with a video duration of T, or containing T image frames, may be extracted from the captured video; the environmental video includes multiple frame images, and each frame image is an environmental image.
And 2012, carrying out normalization processing on the environment images of each frame to obtain normalized images of each frame.
In the embodiment of the application, any normalization algorithm can be adopted to normalize any frame of environment image to obtain a normalized environment image, and the normalized environment image is also called a normalized image. Among them, normalization algorithms include, but are not limited to, L1 normalization algorithm, L2 normalization algorithm, gray scale normalization algorithm, and the like.
For the L1 normalization algorithm, the sum of the pixel values of each pixel in the environmental image of any frame may be calculated, and the pixel value of any pixel is divided by the sum of the pixel values of each pixel to obtain the pixel value of any pixel after normalization. In this way, the normalized pixel value of each pixel is determined, thereby obtaining a normalized image. That is, the pixel value of each pixel in the normalized image is the pixel value normalized for each pixel.
For the L2 normalization algorithm, the square root of the sum of the squared pixel values of the pixels in the environmental image of any frame may be calculated, and the pixel value of any pixel is divided by this square root to obtain the normalized pixel value of that pixel. In this way, the normalized pixel value of each pixel is determined, thereby obtaining a normalized image. That is, the pixel value of each pixel in the normalized image is the normalized pixel value of that pixel.
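As an illustration of the two normalization algorithms above, the following is a minimal NumPy sketch; the function names are illustrative and not part of the application.

```python
import numpy as np

def l1_normalize(image: np.ndarray) -> np.ndarray:
    # Divide every pixel value by the sum of all pixel values (L1 norm).
    image = image.astype(np.float64)
    total = image.sum()
    return image / total if total != 0 else image

def l2_normalize(image: np.ndarray) -> np.ndarray:
    # Divide every pixel value by the square root of the sum of squared pixel values (L2 norm).
    image = image.astype(np.float64)
    norm = np.sqrt((image ** 2).sum())
    return image / norm if norm != 0 else image
```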
For the gradation normalization algorithm, normalization processing can be performed in accordance with steps 20121 to 20123 mentioned below. That is, step 2012 includes steps 20121 to 20123.
And step 20121, performing gray level conversion on the environment images of each frame to obtain gray level images of each frame.
In the embodiment of the present application, any frame of environment image is a color image, and the pixel value of any pixel point in the color image includes a Red (R) channel value, a Green (G) channel value, and a Blue (B) channel value. Gray level conversion can be performed on the environment image based on the three channel values of each pixel point in any frame of environment image, so as to obtain a gray level image.
Alternatively, gradation conversion is performed using a maximum value method. That is, for any one pixel point in any one frame of the environment image, the largest channel value is determined from the three channel values of the pixel point, and the channel value is used as the gray value of the pixel point in the gray image, so that the gray image is obtained, and the gray image comprises the gray values of the pixel points. Alternatively, gradation conversion is performed by an average method or a weighted average method. That is, for any one pixel point in any one frame of environment image, the average calculation or weighted average calculation is performed on three channel values of the pixel point, and the calculated channel value is used as the gray value of the pixel point in the gray image, so that the gray image is obtained.
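A minimal sketch of the three gray level conversion methods mentioned above; the weighted-average coefficients shown are the common BT.601 weights and are an assumption, since the application does not specify them.

```python
import numpy as np

def to_gray(image_rgb: np.ndarray, method: str = "max") -> np.ndarray:
    # image_rgb: H x W x 3 array holding the R, G and B channel values.
    r = image_rgb[..., 0].astype(np.float64)
    g = image_rgb[..., 1].astype(np.float64)
    b = image_rgb[..., 2].astype(np.float64)
    if method == "max":       # maximum value method
        return np.maximum(np.maximum(r, g), b)
    if method == "average":   # average method
        return (r + g + b) / 3.0
    # weighted average method (assumed BT.601 weights)
    return 0.299 * r + 0.587 * g + 0.114 * b
```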
Step 20122, determining an average image based on the gray-scale images of each frame, wherein the gray-scale value of any pixel point on the average image is obtained by averaging the gray-scale values of any pixel point in the gray-scale images of each frame.
In the embodiment of the application, any two frames of gray images include the same number of pixel points arranged in the same way; on this basis, each frame of gray image includes the same pixel points, and the gray values of the same pixel point in different gray images may be the same or different. For example, each frame of gray image includes N x N pixel points, which can be regarded as N rows and N columns of pixel points, and each frame of gray image includes the pixel point in the i-th row and the j-th column, where i and j are positive integers less than or equal to N.
For any pixel point, the gray value of the pixel point in each frame gray image is calculated in an average mode, and the calculated gray value is used as the gray value of the pixel point in the average image, so that an average image is obtained.
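A minimal sketch of step 20122, assuming the gray level images are NumPy arrays of identical shape; the function name is illustrative.

```python
import numpy as np

def average_image(gray_frames):
    # gray_frames: sequence of H x W gray level images of identical shape.
    # The gray value of each pixel in the average image is the mean of that
    # pixel's gray values over all frames.
    return np.mean(np.stack(gray_frames, axis=0).astype(np.float64), axis=0)
```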
And step 20123, carrying out normalization processing on each frame of gray level image based on the average image to obtain each frame of normalized image.
In the embodiment of the application, the normalization processing can be performed on any frame of gray level image based on the average image, so as to obtain the gray level image after the normalization processing, and the gray level image after the normalization processing is also called as a normalization image.
Illustratively, step 20123 includes: determining an average gray value of the average image and a gray value variance of the average image based on the gray values of all pixel points in the average image; for any frame of gray level image, determining the average gray value of that gray level image and the gray value variance of that gray level image based on the gray values of the pixel points in that gray level image; and performing normalization processing on that gray level image based on the average gray value of the average image, the gray value variance of the average image, the average gray value of that gray level image, and the gray value variance of that gray level image, to obtain a normalized image.
In the embodiment of the application, the gray value of each pixel point in the average image can be subjected to average calculation to obtain the average gray value of the average image. Then, for any pixel point in the average image, the square of the difference between the gray value of the pixel point in the average image and the average gray value of the average image is calculated, and for convenience of description, the square is referred to as the square corresponding to the pixel point in the average image. Dividing the sum of squares corresponding to all the pixel points in the average image by the number of the pixel points in the average image to obtain the gray value variance of the average image.
Similarly, for any frame of gray image, the gray value of each pixel point in the gray image can be calculated averagely, so as to obtain the average gray value of the gray image. Then, for any pixel in the gray image, the square of the difference between the gray value of the pixel in the gray image and the average gray value of the gray image is calculated, and for convenience of description, the square is referred to as the square corresponding to the pixel in the gray image. Dividing the sum of squares corresponding to each pixel point in the gray image by the number of the pixel points in the gray image to obtain the gray value variance of the gray image.
Next, a normalized image is determined according to the following formula (1).
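One standard mean and variance normalization that is consistent with the symbol definitions below is shown here as an assumed reconstruction, since the original formula image is not reproduced in this text:

N(i, j) = M0 + sqrt(V0 / V) · (I(i, j) − M)    (1)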
Wherein N(i, j) represents the gray value of the pixel point in the i-th row and j-th column of the normalized image, and i and j are positive integers. M0 denotes the average gray value of the average image. V0 denotes the gray value variance of the average image. I(i, j) represents the gray value of the pixel point in the i-th row and j-th column of the gray level image of any frame. M denotes the average gray value of that gray level image. V denotes the gray value variance of that gray level image.
According to the mode of the formula (1), the gray value of each pixel point in the normalized image corresponding to any frame gray image can be determined, and the normalized image corresponding to the frame gray image is determined. According to the method, the normalized image corresponding to each frame of gray level image can be determined, and a plurality of frames of normalized images are obtained.
And step 2013, screening the first frame image and the second frame image from the normalized images of each frame.
In the embodiment of the application, any two frames of normalized images in each frame of normalized image can be respectively used as a first frame image and a second frame image. Or for any two adjacent normalized images in each normalized image, determining an image error between the two normalized images based on the gray value of each pixel point in the two normalized images, and if the image error is greater than a set threshold, determining that the two normalized images are the first frame image and the second frame image respectively. In this way, the first frame image and the second frame image with larger content difference can be screened out, which is beneficial to determining suddenly appearing obstacles.
Through carrying out gray conversion and normalization processing on each frame of environment image, the contrast of the image is enhanced, so that when the first frame image and the second frame image are screened from each frame of normalization image, the accuracy of the first frame image and the second frame image can be improved, the number of images is reduced, and the determination efficiency of the obstacle is improved.
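A minimal sketch of screening the first frame image and the second frame image from adjacent normalized images, as described in step 2013; the error measure (mean absolute difference) and the function name are assumptions of this sketch.

```python
import numpy as np

def screen_frame_pair(normalized_frames, set_threshold):
    # Walk over adjacent normalized images and return the first pair whose
    # pixel-level image error exceeds the set threshold.
    for prev, curr in zip(normalized_frames, normalized_frames[1:]):
        error = np.abs(curr.astype(np.float64) - prev.astype(np.float64)).mean()
        if error > set_threshold:
            return prev, curr  # first frame image, second frame image
    return None  # no pair with a sufficiently large content difference
```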
Step 202, extracting spatial features of the first frame image by the self-encoder to obtain the spatial features of the first frame image, where the spatial features of the first frame image are used to describe each first image block in the first frame image.
In an embodiment of the present application, the first frame image includes a plurality of first image blocks. Any one of the first image blocks comprises a plurality of continuous pixels in the first frame image, and any two adjacent first image blocks can comprise at least one same pixel or not. Spatial features of the first frame image may be obtained by spatial feature extraction of the first frame image from the encoder to describe the respective first image blocks by the spatial features.
The embodiment of the application does not limit the structure, the size, the parameters and the like of the self-encoder. Illustratively, the self-encoder may be a Tensorflow model that uses mainly tf.nn.conv2d functions (a convolution function) for spatial feature extraction, the model comprising an input layer, a hidden layer, and an output layer. Referring to fig. 3, fig. 3 is a schematic structural diagram of a self-encoder according to an embodiment of the present application, where the self-encoder includes an input layer, a hidden layer and an output layer. The input layer is used for receiving input data and transmitting the input data to the hidden layer, the hidden layer is used for carrying out convolution processing on the input data, and the output layer is used for mapping the characteristics output by the hidden layer into the spatial characteristics of the first frame image.
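A minimal sketch of such a self-encoder built with TensorFlow/Keras, with an input layer, one hidden convolution layer and an output layer; the 64 x 64 input resolution, filter counts and kernel sizes are illustrative assumptions.

```python
import tensorflow as tf

def build_encoder(input_shape=(64, 64, 1)):
    inputs = tf.keras.Input(shape=input_shape)                      # input layer
    hidden = tf.keras.layers.Conv2D(16, 3, padding="same",
                                    activation="relu")(inputs)      # hidden layer (convolution)
    outputs = tf.keras.layers.Conv2D(8, 3, padding="same",
                                     activation="relu")(hidden)     # output layer: spatial features
    return tf.keras.Model(inputs, outputs)
```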
In one possible implementation, step 202 includes steps 2021 to 2022.
Step 2021, for any one of the first image blocks in the first frame image, extracting features of any one of the first image blocks by the self-encoder to obtain first features of any one of the first image blocks.
In the embodiment of the application, the value of any pixel point in the first frame image is a gray value or a pixel value. For any pixel point in the first image block, the self-encoder determines an updated value of that pixel point based on its own value and the values of the surrounding pixel points. The first image block includes at least one such pixel point, and the first feature of the first image block comprises the updated values of these pixel points.
Optionally, for any pixel in the first image block, a weighted sum calculation or a weighted average calculation is performed on the value of the pixel and the values of the pixels around the pixel by the self-encoder, and the calculation result or the sum of the calculation result and the bias term is used as the updated value of the pixel. A first feature of the first image block is determined using the updated value of the at least one pixel. In this way, the first characteristics of the respective first image blocks can be determined.
The self-encoder can also be viewed as a moving window that slides over the image. The moving window has a corresponding set of correlation coefficients or weights and a corresponding moving step. The moving window is placed on the first frame image, the area of the first frame image covered by the moving window is a first image block, and the first feature of that first image block is determined based on the values of the pixel points in the block and the correlation coefficients or weights of the moving window. The moving window then continues to move over the first frame image according to the moving step, and the first feature of each first image block covered by the moving window is determined, until the moving window moves out of the first frame image. In this way, the first features of the respective first image blocks are obtained.
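A minimal NumPy sketch of the moving-window view described above; the window size, stride, single weight matrix and bias term are assumptions of this sketch.

```python
import numpy as np

def sliding_window_features(frame, weights, stride=1, bias=0.0):
    # frame: H x W values (gray values or pixel values); weights: k x k window weights.
    k = weights.shape[0]
    h, w = frame.shape
    features = []
    for top in range(0, h - k + 1, stride):
        for left in range(0, w - k + 1, stride):
            block = frame[top:top + k, left:left + k]                # one first image block
            features.append(float((block * weights).sum()) + bias)  # weighted sum plus bias term
    return np.array(features)                                        # first features of the blocks
```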
Step 2022 determines spatial features of the first frame image based on the first features of the respective first image blocks.
That is, the spatial features of the first frame image include the first features of the respective first image blocks. Alternatively, the first image blocks included in the first frame image are denoted as x1, x2, x3, …, xn; after feature extraction is performed on each first image block by the self-encoder, the spatial features of the first frame image are obtained, and they include the first features h1, h2, h3, …, hn of the respective first image blocks.
And 203, extracting the time characteristics of the space characteristics of the first frame image through a long-short-time memory model to obtain the time characteristics of the first frame image, wherein the time characteristics of the first frame image are used for describing the association relation among the first image blocks.
In an embodiment of the present application, the spatial features of the first frame image include first features of respective first image blocks. The first features of each first image block can be subjected to time feature extraction through a Long-short-time memory (Long-Short Term Memory, LSTM) model to obtain the time features of the first frame image, so that the association relationship among each first image block can be described through the time features, and the time features can accurately describe the first frame image from the global angle.
The embodiment of the application does not limit the structure, size, parameters and the like of the long-short-time memory model. Illustratively, the long-short-time memory model includes a plurality of convolution layers. The first features h1, h2, h3, …, hn of the respective first image blocks are used as the input of the long-short-time memory model, and h1, h2, h3, …, hn are convolved by the plurality of convolution layers to extract deeper feature information. Optionally, the convolution processing is performed by a convolution function constructed from a first function and a second function. The convolution function may be expressed as (f * g)(x) = ∫ f(m) · g(x − m) dm, where f is the first function, g is the second function, * is the convolution symbol, · is the dot product symbol, and ∫ f(m) · g(x − m) dm denotes the integral of f(m) · g(x − m) over m. Convolution is the core of the smoothing or sharpening computation and can repeatedly extract deeper feature information.
In one possible implementation, step 203 includes steps 2031 to 2033.
In step 2031, for the first of the first image blocks, feature extraction is performed on the first feature of that first image block by the long-short-time memory model, so as to obtain the second feature of the first first image.
As mentioned above, the first features h1, h2, h3, …, hn of the respective first image blocks are used as the input of the long-short-time memory model. For the first feature h1 of the first first image block, h1 is convolved by the long-short-time memory model, and deeper feature information is extracted to obtain the second feature of the first first image. The convolution processing may be performed on h1 by the convolution function mentioned above, which is not described in detail here.
Step 2032, for any one of the first image blocks except the first one, extracting features of the first feature of any one of the first image blocks and the second feature of the previous one of the first image blocks by using the long-short-time memory model to obtain the second feature of any one of the first images.
After the second feature of the first first image is obtained, that second feature and the first feature h2 of the second first image block may be spliced to obtain the splicing feature corresponding to the second first image block. The splicing feature corresponding to the second first image block is convolved by the long-short-time memory model, and deeper feature information is extracted through the convolution processing to obtain the second feature of the second first image. The convolution processing may be performed on the splicing feature corresponding to the second first image block by the convolution function mentioned above, which is not described here.
Similarly, after the second feature of the second first image is obtained, that second feature and the first feature h3 of the third first image block may be spliced to obtain the splicing feature corresponding to the third first image block. The splicing feature corresponding to the third first image block is convolved by the long-short-time memory model, and deeper feature information is extracted through the convolution processing to obtain the second feature of the third first image. The convolution processing may be performed on the splicing feature corresponding to the third first image block by the convolution function mentioned above, which is not described here.
By analogy, for any first image block other than the first one, the first feature of that first image block is first spliced with the second feature of the previous first image to obtain the splicing feature corresponding to that first image block. Then, the splicing feature corresponding to that first image block is convolved by the long-short-time memory model, and deeper feature information is extracted through the convolution processing to obtain the second feature of that first image.
In step 2033, a temporal feature of the first frame image is determined based on the second feature of the last first image.
By way of steps 2031 to 2032, the second features of the first to last first images can be obtained. The second feature of the last first image may be determined as the temporal feature of the first frame image, or the second features of the first to last first images may be determined as the temporal feature of the first frame image, or the fusion result obtained by fusing the second features of the first to last first images may be determined as the temporal feature of the first frame image.
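A minimal sketch of steps 2031 to 2033: each first feature is spliced with the second feature obtained for the previous first image block and passed through the convolution layers of the long-short-time memory model. The concatenation axis and the single callable `conv_layers` wrapping those layers are assumptions of this sketch.

```python
import tensorflow as tf

def temporal_feature(first_features, conv_layers):
    # first_features: list of tensors h1, ..., hn (one per first image block).
    # conv_layers: a callable wrapping the convolution layers of the model.
    second = conv_layers(first_features[0])              # second feature of the first first image
    for h_i in first_features[1:]:
        spliced = tf.concat([h_i, second], axis=-1)      # splice with the previous second feature
        second = conv_layers(spliced)                    # convolve to get the next second feature
    return second                                        # taken here as the temporal feature
```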
Step 204, extracting spatial features of the second frame image by the self-encoder to obtain spatial features of the second frame image, where the spatial features of the second frame image are used to describe each second image block in the second frame image.
In an embodiment of the present application, the second frame image includes a plurality of second image blocks. Any one of the second image blocks comprises a plurality of continuous pixels in the second frame image, and any two adjacent second image blocks can comprise at least one same pixel or not. Spatial features of the second frame image may be obtained by spatial feature extraction of the second frame image from the encoder to describe the respective second image blocks by the spatial features. The implementation manner of step 204 may be described in step 202, and the implementation principles of the two are similar, which is not described herein.
And 205, extracting the time characteristics of the space characteristics of the second frame image through a long-short-time memory model to obtain the time characteristics of the second frame image, wherein the time characteristics of the second frame image are used for describing the association relation among the second image blocks.
In an embodiment of the present application, the spatial features of the second frame image include first features of respective second image blocks. The first features of each second image block can be subjected to time feature extraction through a Long-short-time memory (Long-Short Term Memory, LSTM) model to obtain the time features of the second frame image, so that the association relationship among each second image block can be described through the time features, and the time features can accurately describe the second frame image from the global angle. The implementation manner of step 205 may be described in step 203, and the implementation principles of the two are similar, which is not described herein.
Step 206, determining an image error between the first frame image and the second frame image based on the temporal characteristics of the first frame image and the temporal characteristics of the second frame image.
In the embodiment of the application, the feature distance between the time feature of the first frame image and the time feature of the second frame image can be calculated according to a distance algorithm, and the feature distance is used as an image error between the first frame image and the second frame image. The embodiment of the application does not limit the calculation mode of the distance algorithm, and the distance algorithm comprises a cross entropy algorithm, a relative entropy algorithm and the like by way of example.
By calculating the feature distance between the time feature of the first frame image and the time feature of the second frame image, the image error between the first frame image and the second frame image is calculated on the feature level, so that the image error can globally reflect the content difference of the first frame image and the second frame image, and the accuracy of the image error is improved.
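A minimal sketch of one such distance algorithm (relative entropy, i.e. KL divergence) between the two temporal features; flattening the features and normalizing their absolute values into probability distributions is an assumption of this sketch.

```python
import numpy as np

def feature_distance(t1, t2, eps=1e-12):
    # Relative entropy between the two temporal features, used as the image error.
    p = np.abs(np.asarray(t1, dtype=np.float64)).flatten() + eps
    q = np.abs(np.asarray(t2, dtype=np.float64)).flatten() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```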
In one possible implementation, step 206 includes steps 2061 to 2063.
In step 2061, the temporal feature of the first frame image is decoded to obtain a first reconstructed image.
In the embodiment of the application, the decoder can decode the temporal feature of the first frame image. The decoder comprises a deconvolution layer, and deconvolution processing is performed on the temporal feature of the first frame image through the deconvolution layer, so as to reconstruct an image based on the temporal feature of the first frame image and obtain a first reconstructed image. The embodiment of the application does not limit the structure, size, parameters and the like of the decoder; illustratively, the decoder is a Tensorflow model that mainly uses the tf.nn.conv2d_transpose function (a deconvolution function) for the deconvolution processing.
The deconvolution process corresponds to an automatic filling process plus a convolution process. In general, an automatic filling (Padding) process is performed on the temporal feature of the first frame image to obtain the filled temporal feature of the first frame image, and then a convolution process is performed on the filled temporal feature to obtain the first reconstructed image. In the embodiment of the application, the automatic filling process pads the temporal feature of the first frame image with zeros, and by means of the automatic filling process and the convolution process the size of the first reconstructed image can be made consistent with the size of the first frame image.
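A minimal sketch of such a decoder built with TensorFlow/Keras, where transposed convolutions (deconvolution with implicit padding) restore the temporal feature map to the original image size; the feature-map shape, filter counts, strides and activations are illustrative assumptions.

```python
import tensorflow as tf

def build_decoder(feature_shape=(16, 16, 8)):
    inputs = tf.keras.Input(shape=feature_shape)
    x = tf.keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                                        activation="relu")(inputs)       # deconvolution (upsampling)
    outputs = tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                                              activation="sigmoid")(x)   # 64 x 64 x 1 reconstructed image
    return tf.keras.Model(inputs, outputs)
```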
In step 2062, the temporal feature of the second frame image is decoded to obtain a second reconstructed image.
In the embodiment of the application, the decoder can decode the time characteristic of the second frame image. The implementation principle of step 2062 is similar to that of step 2061, and the description of step 2061 will be omitted here.
Step 2063, determining an image error between the first frame image and the second frame image based on the first reconstructed image and the second reconstructed image.
In the embodiment of the application, the size of the first reconstructed image is consistent with the size of the first frame image, the size of the second reconstructed image is consistent with the size of the second frame image, and the size of the first frame image is consistent with the size of the second frame image. Based on this, the size of the first reconstructed image is consistent with the size of the second reconstructed image. That is, the first reconstructed image and the second reconstructed image include the same number of pixel points arranged in the same way, and the values of the same pixel point in the first reconstructed image and the second reconstructed image may be the same or different. In brief, if the first and second reconstructed images each include N x N pixel points, regarded as N rows and N columns of pixel points, then the first and second reconstructed images each include the pixel point in the i-th row and j-th column, where i and j are positive integers less than or equal to N.
And subtracting the value of the pixel point in the first reconstructed image from the value of the pixel point in the second reconstructed image for any pixel point to obtain a difference value corresponding to the pixel point. And calculating the sum, average value, variance, standard deviation and the like of the differences corresponding to the pixel points, and taking the calculation result as an image error between the first frame image and the second frame image.
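A minimal sketch of the pixel-level image error described above; taking the mean of the absolute per-pixel differences is one of the listed aggregation choices (sum, average, variance, standard deviation) and is used here as an assumption.

```python
import numpy as np

def reconstruction_error(recon_first, recon_second):
    # Per-pixel difference between the two reconstructed images of identical size,
    # aggregated into a single image error value.
    diff = recon_first.astype(np.float64) - recon_second.astype(np.float64)
    return float(np.abs(diff).mean())
```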
By decoding the temporal features of the first frame image into a first reconstructed image and decoding the temporal features of the second frame image into a second reconstructed image, it is achieved that image errors are calculated on a pixel level based on the first reconstructed image and the second reconstructed image. The time characteristics of the first frame image can describe the first frame image globally, so that the first reconstructed image determined based on the time characteristics of the first frame image can also reflect the content of the first frame image, and similarly, the second reconstructed image can also reflect the content of the second frame image, so that the image error can intuitively represent the content difference between the first frame image and the second frame image, and the accuracy of the image error is improved.
In step 207, in the case where the image error is greater than the error threshold, it is determined that there is an obstacle in the environment where the target vehicle is located.
The embodiment of the application does not limit the determination mode of the error threshold, and the error threshold is a numerical value set according to manual experience or a numerical value obtained through experimental verification by way of example.
If the image error between the first frame image and the second frame image is greater than the error threshold, it indicates that the content difference between the first frame image and the second frame image is large and new image content has appeared. Since the first frame image and the second frame image both describe the environment where the target vehicle is located, the appearance of new image content is equivalent to the appearance of an obstacle in that environment. That is, if the image error between the first frame image and the second frame image is greater than the error threshold, it is determined that there is an obstacle in the environment where the target vehicle is located. Conversely, if the image error between the first frame image and the second frame image is not greater than the error threshold, it indicates that no new image content has appeared, and it may be determined that no new obstacle has appeared in the environment where the target vehicle is located.
When it is determined that an obstacle has appeared in the environment where the target vehicle is located, the distance between the target vehicle and the obstacle can be acquired. If the distance is greater than a threshold value, prompt information is sent out to remind the driver to pay attention to the obstacle in the environment where the target vehicle is located; the prompt information can be a sound or an image. If the distance is smaller than or equal to the threshold value, the target vehicle is braked automatically, so as to avoid a collision between the target vehicle and the obstacle and protect the life and property safety of the driving object.
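A minimal sketch of the decision logic in step 207 and the follow-up handling described above; the function name and the string results are illustrative assumptions.

```python
def handle_detection(image_error, error_threshold, distance, distance_threshold):
    # Report an obstacle when the image error exceeds the error threshold,
    # then warn the driver or brake automatically depending on the distance.
    if image_error <= error_threshold:
        return "no new obstacle"
    if distance > distance_threshold:
        return "prompt driver"        # sound or image prompt
    return "automatic braking"        # avoid collision with the obstacle
```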
In view of the foregoing, please refer to fig. 4, fig. 4 is a flowchart of detecting an obstacle according to an embodiment of the present application. In the embodiment of the application, the environmental video can be preprocessed to obtain the first frame image and the second frame image. The environmental video includes multiple frames of environmental images, gray level conversion is performed on each frame of environmental image, normalization processing is performed on each frame of environmental image to obtain normalized images corresponding to each frame of environmental image, and then, a first frame of image and a second frame of image are screened from the normalized images corresponding to each frame of environmental image, and the implementation manner of this part of content can be seen in the description of step 201, which is not repeated here.
And extracting spatial features of the first frame image through a self-encoder, extracting temporal features of the first frame image through a long-short-term memory model, obtaining temporal features of the first frame image, and then decoding the temporal features of the first frame image to obtain a first reconstructed image. Similarly, spatial feature extraction is performed on the second frame image through the self-encoder to obtain spatial features of the second frame image, temporal feature extraction is performed on the spatial features of the second frame image through the long-short-term memory model to obtain temporal features of the second frame image, and then decoding processing is performed on the temporal features of the second frame image to obtain a second reconstructed image. This part of the content can be seen from the descriptions of steps 202 to 205, and will not be described here again.
Then, an image error between the first reconstructed image and the second reconstructed image is calculated. And determining an obstacle detection result based on the image error, wherein if the image error is greater than an error threshold, the obstacle detection result is determined to be an obstacle. This part of the content can be seen from the description of step 206 to step 207, and will not be described in detail here.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant region. For example, the environmental video, the first frame image, the second frame image, and the like, which are referred to in the present application, are acquired with sufficient authorization.
The method performs spatial feature extraction on the first frame image and the second frame image, so that each image is divided into image blocks on the spatial level and each image block is described by spatial features. Temporal feature extraction is then performed on the spatial features of the first frame image and on the spatial features of the second frame image, so that the image blocks are associated with one another and the association relations among them are described by temporal features; the temporal features can therefore describe the image content from the global view of the image and have stronger characterization capability. Then, the image error between the first frame image and the second frame image is determined based on the temporal features of the first frame image and the temporal features of the second frame image, and when the image error is greater than the error threshold it is determined that an obstacle exists in the environment where the target vehicle is located. In this way, different images of the environment where the target vehicle is located are compared globally at the image level, and when the difference in image content is large it is determined that an obstacle exists in the environment, which improves the efficiency and accuracy of obstacle determination.
Fig. 5 is a schematic structural diagram of an obstacle determining device according to an embodiment of the present application, where, as shown in fig. 5, the device includes:
an acquisition module 501, configured to acquire a first frame image and a second frame image, where the first frame image and the second frame image are different images for describing an environment in which a target vehicle is located;
the spatial feature extraction module 502 is configured to perform spatial feature extraction on a first frame image by using a self-encoder to obtain spatial features of the first frame image, where the spatial features of the first frame image are used to describe each first image block in the first frame image;
the time feature extraction module 503 is configured to perform time feature extraction on spatial features of the first frame image through a long-short-time memory model, so as to obtain time features of the first frame image, where the time features of the first frame image are used to describe association relationships between the first image blocks;
the spatial feature extraction module 502 is further configured to perform spatial feature extraction on the second frame image by using the self-encoder, so as to obtain spatial features of the second frame image, where the spatial features of the second frame image are used to describe each second image block in the second frame image;
the time feature extraction module 503 is further configured to perform time feature extraction on spatial features of the second frame image through the long-short-time memory model, so as to obtain time features of the second frame image, where the time features of the second frame image are used to describe association relationships between the second image blocks;
A determining module 504, configured to determine an image error between the first frame image and the second frame image based on the temporal feature of the first frame image and the temporal feature of the second frame image;
the determining module 504 is further configured to determine that an obstacle exists in an environment where the target vehicle is located if the image error is greater than the error threshold.
In one possible implementation, the obtaining module 501 is configured to obtain an environmental video collected by a camera of the target vehicle, where the environmental video includes multiple frames of environmental images; carrying out normalization processing on each frame of environment image to obtain each frame of normalized image; and screening the first frame image and the second frame image from the normalized images of each frame.
In a possible implementation manner, the obtaining module 501 is configured to perform gray level conversion on each frame of environmental image to obtain each frame of gray level image; based on each frame of gray level image, determining an average image, wherein the gray level value of any pixel point on the average image is obtained by averaging the gray level value of any pixel point in each frame of gray level image; and carrying out normalization processing on each frame of gray level image based on the average image to obtain each frame of normalized image.
In one possible implementation, the obtaining module 501 is configured to determine an average gray value of the average image and a gray value variance of the average image based on the gray values of the pixel points in the average image; for any frame of gray level image, determine the average gray value of that gray level image and the gray value variance of that gray level image based on the gray values of the pixel points in that gray level image; and perform normalization processing on that gray level image based on the average gray value of the average image, the gray value variance of the average image, the average gray value of that gray level image, and the gray value variance of that gray level image, to obtain a normalized image.
In one possible implementation, the first frame image includes a plurality of first image blocks; a spatial feature extraction module 502, configured to perform feature extraction on any one of the first image blocks in the first frame image by using the self-encoder to obtain a first feature of any one of the first image blocks; spatial features of the first frame image are determined based on the first features of the respective first image blocks.
In one possible implementation, the spatial features of the first frame image include first features of a plurality of first image blocks; the temporal feature extraction module 503 is configured to perform feature extraction on the first feature of the first first image block by the long-short-time memory model to obtain the second feature of the first first image; for any first image block other than the first one, perform feature extraction on the first feature of that first image block and the second feature of the previous first image block by the long-short-time memory model to obtain the second feature of that first image; and determine the temporal feature of the first frame image based on the second feature of the last first image.
In a possible implementation manner, the determining module 504 is configured to decode the temporal feature of the first frame image to obtain a first reconstructed image; decoding the time characteristics of the second frame image to obtain a second reconstructed image; an image error between the first frame image and the second frame image is determined based on the first reconstructed image and the second reconstructed image.
By performing spatial feature extraction on the first frame image and the second frame image, the device divides each image into image blocks at the spatial level and describes each image block by its spatial feature. Temporal feature extraction is then performed on the spatial features of the first frame image and the spatial features of the second frame image, so that the image blocks are associated with one another and the relationships among them are captured by the temporal features; the temporal features therefore describe the image content from the image as a whole and have stronger characterization capability. The image error between the first frame image and the second frame image is then determined based on the temporal features of the two frames, and an obstacle is determined to exist in the environment of the target vehicle when the image error is greater than the error threshold. Different images of the environment of the target vehicle are thus compared globally at the image level, and an obstacle is determined to exist when the image contents differ greatly, which improves the efficiency and accuracy of obstacle determination.
It should be understood that the apparatus provided in fig. 5 is illustrated only with the above division of functional modules when implementing its functions; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiments provided above belong to the same concept; the specific implementation process of the apparatus is detailed in the method embodiments and is not repeated here.
Fig. 6 shows a block diagram of a vehicle-mounted terminal 600 according to an exemplary embodiment of the present application. The vehicle-mounted terminal 600 includes: a processor 601 and a memory 602.
Processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one computer program for execution by processor 601 to implement the obstacle determination method provided by the method embodiments of the application.
In some embodiments, the vehicle-mounted terminal 600 may further include: a peripheral interface 603, and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 603 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 604, a display 605, a camera assembly 606, audio circuitry 607, and a power supply 608.
Peripheral interface 603 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 604 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may also include NFC (Near Field Communication) related circuits, which is not limited in the present application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, the display 605 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, disposed on the front panel of the in-vehicle terminal 600; in other embodiments, there may be at least two displays 605, disposed on different surfaces of the in-vehicle terminal 600 or in a folded design; in still other embodiments, the display 605 may be a flexible display disposed on a curved surface or a folded surface of the in-vehicle terminal 600. The display 605 may even be arranged in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing, or inputting the electric signals to the radio frequency circuit 604 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the in-vehicle terminal 600. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 607 may also include a headphone jack.
The power supply 608 is used to power the various components in the in-vehicle terminal 600. The power supply 608 may be an alternating current power supply, a direct current power supply, a disposable battery, or a rechargeable battery. When the power supply 608 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the in-vehicle terminal 600 further includes one or more sensors 609. The one or more sensors 609 include, but are not limited to: acceleration sensor 611, gyroscope sensor 612, pressure sensor 613, optical sensor 614, and proximity sensor 615.
The acceleration sensor 611 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the in-vehicle terminal 600. For example, the acceleration sensor 611 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 601 may control the display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 611. The acceleration sensor 611 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 612 may detect a body direction and a rotation angle of the in-vehicle terminal 600, and the gyro sensor 612 may collect a 3D motion of the user on the in-vehicle terminal 600 in cooperation with the acceleration sensor 611. The processor 601 may implement the following functions based on the data collected by the gyro sensor 612: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed at a side frame of the in-vehicle terminal 600 and/or at a lower layer of the display screen 605. When the pressure sensor 613 is disposed at a side frame of the in-vehicle terminal 600, a grip signal of the user to the in-vehicle terminal 600 may be detected, and the processor 601 performs a left-right hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 605. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 614 is used to collect ambient light intensity. In one embodiment, processor 601 may control the display brightness of display 605 based on the intensity of ambient light collected by optical sensor 614. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 605 is turned up; when the ambient light intensity is low, the display brightness of the display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 614.
The proximity sensor 615, also called a distance sensor, is typically provided on the front panel of the in-vehicle terminal 600. The proximity sensor 615 is used to collect a distance between a user and the front surface of the in-vehicle terminal 600. In one embodiment, when the proximity sensor 615 detects that the distance between the user and the front surface of the in-vehicle terminal 600 gradually decreases, the processor 601 controls the display screen 605 to switch from the bright screen state to the off screen state; when the proximity sensor 615 detects that the distance between the user and the front surface of the in-vehicle terminal 600 gradually increases, the processor 601 controls the display screen 605 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is not limiting and that the in-vehicle terminal 600 may include more or less components than illustrated, or may combine certain components, or may employ a different arrangement of components.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application. The server 700 may vary considerably in configuration or performance, and may include one or more processors 701 (for example, CPUs) and one or more memories 702. The one or more memories 702 store at least one computer program, which is loaded and executed by the one or more processors 701 to implement the obstacle determination method provided by the foregoing method embodiments. Of course, the server 700 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause an electronic device to implement any of the above-described obstacle determining methods.
Alternatively, the above-mentioned computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program is also provided, the computer program being loaded and executed by a processor to cause an electronic device to implement any of the above-described obstacle determining methods.
In an exemplary embodiment, there is also provided a computer program product in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to cause an electronic device to implement any of the above-described obstacle determining methods.
It should be understood that references herein to "a plurality" are to two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit the present application, any modifications, equivalent substitutions, improvements, etc. that fall within the principles of the present application should be included in the scope of the present application.

Claims (10)

Priority application: CN202310970823.6A, filed 2023-08-01 — Obstacle determination method, device, equipment and readable storage medium (status: Pending)
Publication: CN116824548A, published 2023-09-29
Family ID: 88112919
Country status: CN — CN116824548A (en)


Cited By (2):
- CN117557962A (en)* — priority 2023-11-23, published 2024-02-13 — 深圳市大数据研究院 — Abnormal video detection method and system based on graph neural network
- CN117557962B (en)* — priority 2023-11-23, granted 2024-09-27 — 深圳市大数据研究院 — Abnormal video detection method and system based on graph neural network
(* Cited by examiner, † Cited by third party)


Legal Events:
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
