CN111104893B - Target detection method, target detection device, computer equipment and storage medium - Google Patents

Target detection method, target detection device, computer equipment and storage medium

Info

Publication number
CN111104893B
Authority
CN
China
Prior art keywords
target
target object
image
parallax
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911304904.2A
Other languages
Chinese (zh)
Other versions
CN111104893A (en)
Inventor
崔迪潇
江志浩
徐生良
陈安
龚伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhijia Usa
Suzhou Zhijia Technology Co Ltd
Original Assignee
Zhijia Usa
Suzhou Zhijia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhijia Usa, Suzhou Zhijia Technology Co Ltd
Priority to CN201911304904.2A
Publication of CN111104893A
Application granted
Publication of CN111104893B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a target detection method, a target detection device, computer equipment and a storage medium, and belongs to the technical field of automatic driving. The embodiment of the invention determines the target area and the road area in a vehicle environment image and performs object segmentation on the target area, thereby obtaining target semantic information comprising the object type and the initial contour of the target object and describing the target object comprehensively from multiple angles. The contour of the target object is then accurately positioned according to the parallax image and the target semantic information, which greatly improves the accuracy of the contour. Finally, the spatial position of the target object is accurately determined according to the road area and the accurately positioned contour.

Description

Target detection method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of automatic driving technologies, and in particular, to a target detection method and apparatus, a computer device, and a storage medium.
Background
Automatic driving technology senses the vehicle's surroundings, makes driving decisions and plans, and performs driving operations automatically in place of a human driver. During automatic driving, obstacles and other objects in the surrounding environment must be detected in real time based on images, point cloud data, and other sensor data of the surroundings, so as to ensure the safe driving of the vehicle.
In the related art, the target detection process may include: the vehicle-mounted terminal obtains an image of the surrounding environment and marks the target objects in the image, such as vehicles, traffic signs and pedestrians, with 2D detection frames, such as rectangular frames; the position coordinates of each target object in the vehicle coordinate system are then determined by combining the three-dimensional shape of the target object in the laser point cloud data of the surrounding environment, so that the surrounding objects are projected one by one into the vehicle coordinate system.
In this target detection process, the 2D detection frame is used to locate the target object. However, a 2D detection frame can only roughly estimate the approximate position of the target object, so the final detection result is inaccurate and the accuracy of the target detection process is poor.
Disclosure of Invention
The embodiment of the invention provides a target detection method, a target detection device, computer equipment and a storage medium, which can solve the problem of poor accuracy of a target detection process in the related art. The technical scheme is as follows:
in one aspect, a target detection method is provided, and the method includes:
determining a target area and a road area in a vehicle environment image based on a parallax image of the vehicle environment image during the running of a vehicle, wherein the target area comprises a target object;
carrying out object segmentation on the target area to obtain target semantic information of a target object, wherein the target semantic information comprises an object type of the target object and an initial contour of the target object;
determining the outline of the target object according to the parallax image and the target semantic information of the target object;
determining a spatial position of the target object in a vehicle environment according to the road area and the contour of the target object.
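The four steps above can be sketched as a minimal pipeline. All function names and the trivial placeholder implementations below are illustrative assumptions for orientation only, not the patent's method:

```python
import numpy as np

def determine_regions(disparity):
    """Step 1: split the image into a target region and a road region
    (placeholder: top half = target rows, bottom half = road rows)."""
    h = disparity.shape[0]
    return {"target": (0, h // 2), "road": (h // 2, h)}

def segment_objects(disparity, target_rows):
    """Step 2: object segmentation -> semantic info (object type + initial
    contour). Placeholder: one object covering pixels with disparity > 0."""
    r0, r1 = target_rows
    return {"type": "vehicle", "mask": disparity[r0:r1] > 0}

def refine_contour(disparity, semantic):
    """Step 3: refine the initial contour using the disparity values
    (placeholder: returned unchanged)."""
    return semantic["mask"]

def locate(road_rows, contour_mask):
    """Step 4: spatial position from the road region and refined contour
    (placeholder: pixel centroid of the contour mask)."""
    ys, xs = np.nonzero(contour_mask)
    return (xs.mean(), ys.mean()) if len(xs) else None

disparity = np.zeros((8, 8)); disparity[1:3, 2:5] = 10.0
regions = determine_regions(disparity)
sem = segment_objects(disparity, regions["target"])
pos = locate(regions["road"], refine_contour(disparity, sem))
print(pos)  # (3.0, 1.5): centroid of the detected object's pixels
```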
In one possible implementation, the determining the contour of the target object according to the parallax image and the target semantic information of the target object includes:
adjusting the initial contour of the target object based on the boundary pixel points of the initial contour of the target object corresponding to the parallax image;
constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image;
and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the contour of the target object on the horizontal plane.
In one possible implementation manner, the adjusting the initial contour of the target object based on the boundary pixel point corresponding to the initial contour of the target object in the parallax image includes:
determining a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image;
and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
In a possible implementation manner, the adjusting, according to a change degree of parallax values of a plurality of neighborhood pixel points of each boundary pixel point, a plurality of boundary pixel points corresponding to an initial contour of the target object includes:
when the change degree of the parallax values of a plurality of neighborhood pixels of the boundary pixel meets a target mutation condition, the boundary pixel is reserved;
and when the change degree of the parallax values of the plurality of neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with a pixel meeting the target mutation condition in the plurality of neighborhood pixels.
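The retain-or-replace rule above can be sketched as follows. Modeling the "target mutation condition" as a horizontal disparity jump exceeding a threshold, and the small neighbor-search order, are our assumptions:

```python
import numpy as np

def refine_boundary(disparity, boundary_pts, jump=2.0):
    """Keep a boundary pixel if disparity changes sharply across it (our
    stand-in for the 'target mutation condition'); otherwise replace it with
    the nearest horizontal neighbor at which the change is sharp."""
    h, w = disparity.shape
    out = []
    for (v, u) in boundary_pts:
        def abrupt(uu):
            left = disparity[v, max(uu - 1, 0)]
            right = disparity[v, min(uu + 1, w - 1)]
            return abs(right - left) >= jump
        if abrupt(u):
            out.append((v, u))                 # condition met: retain
        else:
            for du in (1, -1, 2, -2):          # search nearby columns
                uu = min(max(u + du, 0), w - 1)
                if abrupt(uu):
                    out.append((v, uu))        # replace with abrupt neighbor
                    break
            else:
                out.append((v, u))             # no abrupt neighbor: keep
    return out

# Object spans columns 3..6 with disparity 10 against a 0 background:
# (1, 3) sits on the true edge; (1, 5) is interior and gets moved to (1, 6).
d = np.zeros((3, 10)); d[:, 3:7] = 10.0
print(refine_boundary(d, [(1, 3), (1, 5)]))  # [(1, 3), (1, 6)]
```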
In one possible implementation, the determining, based on the parallax image of the vehicle environment image, the target area and the road area in the vehicle environment image during the driving of the vehicle includes:
determining a parallax image of the vehicle environment image based on at least two frames of vehicle environment images during the running of the vehicle;
projecting the parallax image along the vertical coordinate of the image coordinate system of the parallax image, and determining a vertical parallax image of the vehicle environment image, wherein the gray value of pixel points in the vertical parallax image is used for indicating the parallax distribution of each row of pixel points in the parallax image;
and determining a target area and a road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image.
In one possible implementation manner, the determining, according to the gray-scale value of each pixel in the longitudinal parallax image and the parallax value of each pixel in the parallax image, a target region and a road region in the vehicle environment image includes:
determining a road surface straight line in an image coordinate system of the longitudinal parallax image according to the gray value of each pixel point in the longitudinal parallax image;
determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image;
determining a parallax value of a target area in the vehicle environment image according to the gray value of each row of pixel points in the longitudinal parallax image;
determining a longitudinal coordinate range of the target area above the road area according to the parallax value of the target area in the longitudinal parallax image;
and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
In a possible implementation manner, the performing object segmentation on the target region to obtain target semantic information of the target object includes:
identifying an object region of at least one target object within the target region;
and determining the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
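The segmentation step above yields, per object, a type label and an initial contour. As a hedged illustration, the initial contour of a binary segmentation mask can be taken as the mask pixels touching the outside; the 4-neighbor rule here is our assumption:

```python
import numpy as np

def mask_boundary(mask):
    """Initial contour of a segmented object: pixels of the mask with at
    least one 4-neighbor outside the mask (a NumPy-only erosion difference)."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

m = np.zeros((5, 5), dtype=bool); m[1:4, 1:4] = True   # 3x3 object
b = mask_boundary(m)
print(int(b.sum()))  # 8: every pixel of the 3x3 block except its center
```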
In one possible implementation, the determining a spatial position of the target object in a vehicle environment according to the road region and the contour of the target object comprises:
determining a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object;
determining the plane size of the target object based on the position coordinates of the minimum bounding rectangle in a vehicle coordinate system;
determining a relative height of the target object on the road surface based on a road surface height of the road region.
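The minimum-bounding-rectangle step above can be sketched with a brute-force orientation sweep, a coarse stand-in for an exact rotating-calipers computation; the 0.5-degree angular resolution is an arbitrary choice:

```python
import numpy as np

def min_area_rect(points):
    """Approximate minimum-area bounding rectangle of 2-D points: sweep
    candidate orientations, bound the rotated points with an axis-aligned
    box, and keep the smallest box found."""
    pts = np.asarray(points, dtype=float)
    best_area, best_wh = np.inf, None
    for theta in np.linspace(0.0, np.pi / 2, 181):   # 0.5-degree steps
        c, s = np.cos(theta), np.sin(theta)
        rot = pts @ np.array([[c, -s], [s, c]])       # rotate into candidate frame
        w = np.ptp(rot[:, 0])                         # extent along rotated x
        h = np.ptp(rot[:, 1])                         # extent along rotated y
        if w * h < best_area:
            best_area, best_wh = w * h, (w, h)
    return best_area, best_wh

# A 4x2 axis-aligned rectangle is its own minimum bounding rectangle.
area, (w, h) = min_area_rect([(0, 0), (4, 0), (4, 2), (0, 2)])
print(round(area, 3))  # 8.0
```

The resulting width and height give the plane size of the target object once the contour coordinates are expressed in the vehicle coordinate system.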
In another aspect, an object detecting apparatus is provided, the apparatus including:
a determining module, configured to determine a target area and a road area in a vehicle environment image based on a parallax image of the vehicle environment image during the driving of a vehicle, wherein the target area comprises a target object;
the segmentation module is used for carrying out object segmentation on the target area to obtain target semantic information of a target object, wherein the target semantic information comprises an object type of the target object and an initial contour of the target object;
the determining module is further configured to determine a contour of the target object according to the parallax image and target semantic information of the target object;
the determining module is further configured to determine a spatial position of the target object in the vehicle environment according to the road area and the contour of the target object.
In a possible implementation manner, the determining module is further configured to adjust the initial contour of the target object based on a boundary pixel point of the initial contour of the target object corresponding to the parallax image; constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image; and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the contour of the target object on the horizontal plane.
In a possible implementation manner, the determining module is further configured to determine a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image; and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
In a possible implementation manner, the determining module is further configured to, when a disparity value variation degree of a plurality of neighborhood pixels of the boundary pixel meets a target mutation condition, retain the boundary pixel; and when the change degree of the parallax values of the plurality of neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with a pixel meeting the target mutation condition in the plurality of neighborhood pixels.
In a possible implementation manner, the determining module is further configured to determine a parallax image of the vehicle environment image based on at least two frames of vehicle environment images during the vehicle driving; projecting the parallax image along the vertical coordinate of the image coordinate system of the parallax image, and determining a vertical parallax image of the vehicle environment image, wherein the gray value of pixel points in the vertical parallax image is used for indicating the parallax distribution of each row of pixel points in the parallax image; and determining a target area and a road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image.
In a possible implementation manner, the determining module is further configured to determine a road surface straight line in an image coordinate system of the longitudinal parallax image according to a gray value of each pixel point in the longitudinal parallax image; determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image; determining a parallax value of a target area in the vehicle environment image according to the gray value of each row of pixel points in the longitudinal parallax image; determining a longitudinal coordinate range of the target area above the road area according to the parallax value of the target area in the longitudinal parallax image; and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
In a possible implementation manner, the segmentation module is further configured to identify an object region of at least one target object within the target region; and determine the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
In a possible implementation manner, the determining module is further configured to determine a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object; determining the plane size of the target object based on the position coordinates of the minimum bounding rectangle in a vehicle coordinate system; determining a relative height of the target object on the road surface based on a road surface height of the road region.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction is stored, the instruction being loaded and executed by the processor to implement the operations performed by the object detection method as described above.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the object detection method as described above.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the target area and the road area in the vehicle environment image are determined, and object segmentation is performed on the target area to obtain target semantic information comprising the object type and the initial contour of the target object, so that the target object is described comprehensively from multiple angles. The contour of the target object is then accurately positioned according to the parallax image and the target semantic information, which greatly improves the accuracy of the contour. Finally, the spatial position of the target object is accurately determined according to the road area and the accurately positioned contour.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a target detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an object detection framework provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target detection process according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention. The execution subject of the embodiment of the invention is a computer device, such as a server or a terminal. Referring to fig. 1, the method includes:
101. determining a target area and a road area in a vehicle environment image based on a parallax image of the vehicle environment image during the running of a vehicle, wherein the target area comprises a target object;
102. carrying out object segmentation on the target area to obtain target semantic information of a target object, wherein the target semantic information comprises an object type of the target object and an initial contour of the target object;
103. determining the outline of the target object according to the parallax image and the target semantic information of the target object;
104. the spatial position of the target object in the vehicle environment is determined from the road region and the contour of the target object.
In one possible implementation, the determining the contour of the target object according to the parallax image and the target semantic information of the target object includes:
adjusting the initial contour of the target object based on the boundary pixel points of the initial contour of the target object corresponding to the parallax image;
constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image;
and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the outline of the target object on the horizontal plane.
In a possible implementation manner, the adjusting the initial contour of the target object based on the boundary pixel point corresponding to the initial contour of the target object in the parallax image includes:
determining a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image;
and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
In a possible implementation manner, the adjusting, according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point, the plurality of boundary pixel points corresponding to the initial contour of the target object includes:
when the change degree of the parallax values of a plurality of neighborhood pixels of the boundary pixel meets a target mutation condition, the boundary pixel is reserved;
and when the change degree of the parallax values of the plurality of neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with a pixel meeting the target mutation condition in the plurality of neighborhood pixels.
In one possible implementation, the determining the target area and the road area in the vehicle environment image based on the parallax image of the vehicle environment image during the driving of the vehicle includes:
determining a parallax image of the vehicle environment image based on at least two frames of vehicle environment images during the running of the vehicle;
projecting the parallax image along the vertical coordinate of the image coordinate system of the parallax image, and determining a vertical parallax image of the vehicle environment image, wherein the gray value of pixel points in the vertical parallax image is used for indicating the parallax distribution of each row of pixel points in the parallax image;
and determining a target area and a road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image.
In one possible implementation manner, the determining the target area and the road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image includes:
determining a road surface straight line in an image coordinate system of the longitudinal parallax image according to the gray value of each pixel point in the longitudinal parallax image;
determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image;
determining a parallax value of a target area in the vehicle environment image according to the gray value of each row of pixel points in the longitudinal parallax image;
in the longitudinal parallax image, according to the parallax value of the target area, determining the longitudinal coordinate range of the target area above the road area;
and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
In a possible implementation manner, the performing object segmentation on the target area to obtain target semantic information of the target object includes:
identifying an object area of at least one object in the target area;
and determining the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
In one possible implementation, the determining the spatial position of the target object in the vehicle environment based on the road region and the contour of the target object comprises:
determining a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object;
determining the plane size of the target object based on the position coordinates of the minimum circumscribed rectangle in a vehicle coordinate system;
based on the road surface height of the road region, a relative height of the target object on the road surface is determined.
In the embodiment of the invention, the target region and the road region in the vehicle environment image are determined, and object segmentation is performed on the target region to obtain target semantic information comprising the object type and the initial contour of the target object, so that the target object is described comprehensively from multiple angles. The contour of the target object is then accurately positioned according to the parallax image and the target semantic information, which greatly improves the accuracy of the contour. Finally, the spatial position of the target object is accurately determined according to the road region and the accurately positioned contour.
Fig. 2 is a flowchart of a target detection method according to an embodiment of the present invention. The execution subject of the embodiment of the present invention is a computer device, which is a server or a terminal; in the embodiment of the present invention, only a terminal is taken as an example for description. For example, the terminal may be a vehicle-mounted terminal or a personal computer. Referring to fig. 2, the method includes:
201. the terminal determines a target area and a road area in the vehicle environment image based on the parallax image of the vehicle environment image during the driving of the vehicle.
Wherein the target area comprises a target object; in the embodiment of the invention, during the driving process of the vehicle, the terminal can acquire the vehicle environment image of the surrounding environment of the vehicle in real time and detect the target object in the surrounding environment based on the vehicle environment image, wherein the target object can be any object with a certain three-dimensional shape, such as an adjacent vehicle of the vehicle, surrounding road signs, trees at two sides of a road, pedestrians crossing the road, and the like.
In this step, the terminal may obtain a parallax image of the vehicle environment image according to two frames of vehicle environment images, where the parallax image includes a parallax value of each pixel point in the vehicle environment image, and the terminal may segment the vehicle environment image according to the parallax value of the pixel point in the parallax image to obtain a target region and a road region. In one possible embodiment, the step may comprise: the terminal determines a parallax image of the vehicle environment image based on at least two frames of vehicle environment images in the driving process of the vehicle; the terminal projects the parallax image along the vertical coordinate of the image coordinate system of the parallax image, and determines a vertical parallax image of the vehicle environment image, wherein the gray value of the pixel points in the vertical parallax image is used for indicating the parallax distribution of each row of pixel points in the parallax image; and the terminal determines a target area and a road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image.
The gray value of each pixel point in the parallax image may be the parallax value of that pixel point. The image coordinate system of the parallax image may be a UOV rectangular coordinate system, where U is the horizontal axis and V is the vertical axis; the coordinates (u, v) in the parallax image denote the pixel in the u-th column and the v-th row, and the gray value of that pixel may be its parallax value d. The terminal projects the gray values, that is, the parallax values, along the V-axis vertical coordinate to obtain a longitudinal parallax image, namely a V-parallax image. The image coordinate system of the V-parallax image may be a dOV rectangular coordinate system, where d is the horizontal axis and V is the vertical axis; d represents a parallax value, a pixel point (d1, v) in the longitudinal parallax image denotes the pixel in the d1-th column and the v-th row, and its gray value represents the number of pixels in the v-th row of the parallax image whose parallax value is d1. For example, if the gray value of the pixel (2, 3) is 4, then 4 pixels in the 3rd row of the parallax image have a parallax value of 2.
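The V-parallax construction described above, where entry (d, v) counts the pixels of row v whose parallax value is d, can be sketched as follows; the integer binning of disparities is our simplification:

```python
import numpy as np

def v_disparity(disp, d_max):
    """Longitudinal (V-)parallax image: vmap[v, d] = number of pixels in
    row v of the disparity map whose integer disparity equals d."""
    h = disp.shape[0]
    vmap = np.zeros((h, d_max + 1), dtype=np.int32)   # rows: v, cols: d
    for v in range(h):
        vals, counts = np.unique(disp[v].astype(int), return_counts=True)
        vmap[v, vals] = counts
    return vmap

# The worked example from the text: 4 pixels in row 3 have disparity 2,
# so the V-parallax entry at (d=2, v=3) has gray value 4.
disp = np.zeros((5, 6), dtype=int); disp[3, 1:5] = 2
vmap = v_disparity(disp, d_max=4)
print(vmap[3, 2])  # 4
```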
In a possible implementation manner, the terminal may determine the road region according to a variation of a gray value of a pixel point in the longitudinal parallax image, and the process may include: the terminal determines a road surface straight line in an image coordinate system of the longitudinal parallax image according to the gray value of each pixel point in the longitudinal parallax image; and the terminal determines a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image.
In one possible example, the terminal may binarize the longitudinal parallax image by retaining only the point with the maximum gray value in each row, and then perform a Hough transform on the binary image to determine the straight line representing the road surface in the rectangular coordinate system of the V-parallax image, given by the following formula one:
the formula I is as follows: (d) kd + b;
wherein d represents the abscissa in dOV rectangular coordinate system, and f (d) represents the ordinate in dOV rectangular coordinate system.
The terminal detects each pixel point in the parallax image according to the road surface straight line, and determines a road plane in the parallax image based on the following formula II:
The formula II is as follows: Δ(u, v) = d(u, v) - f(d);
a pixel point (u, v) is judged to be a road surface point when Δ(u, v) < ε, and a non-road point otherwise;
wherein d(u, v) represents the parallax value of the pixel point (u, v) in the parallax image, and ε is the decision threshold. Pixel points whose Δ(u, v) is smaller than the decision threshold belong to the road plane or to background such as the sky, while pixel points whose Δ(u, v) is not smaller than the decision threshold are non-road points, for example pixel points on obstacles such as vehicles and signs on the road surface. In this way, the terminal detects drivable areas such as roads in the vehicle environment image.
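The road-surface test built from formula I and formula II can be sketched as follows. Inverting the road line to get a per-row expected road disparity d_road(v) = (v - b)/k is our reading of the text, and all names and the synthetic scene are illustrative:

```python
import numpy as np

def road_mask(disp, k, b, eps=0.5):
    """Classify pixels using the road line f(d) = k*d + b fitted in the
    V-parallax plane: for row v the expected road disparity is
    d_road(v) = (v - b)/k, and a pixel is a road point when
    |d(u, v) - d_road(v)| < eps (the decision threshold)."""
    h = disp.shape[0]
    v = np.arange(h, dtype=float)[:, None]   # column vector of row indices
    d_road = (v - b) / k                     # expected road disparity per row
    return np.abs(disp - d_road) < eps

# Synthetic ground plane: disparity grows linearly with row (k=1, b=0),
# plus one obstacle pixel with a much larger disparity.
h, w = 6, 4
disp = np.tile(np.arange(h, dtype=float)[:, None], (1, w))
disp[2, 1] = 9.0                             # obstacle pixel
m = road_mask(disp, k=1.0, b=0.0)
print(int(m.sum()))  # 23: every pixel but the obstacle is classified as road
```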
In a possible implementation manner, the terminal may first determine, according to the gray value variation of pixel points in the longitudinal parallax image, the parallax value and the longitudinal coordinate range of the target region, and then determine, in the parallax image, the transverse coordinate range of the target region, so as to obtain the target region. The process may include: the terminal determines the parallax value of a target region in the vehicle environment image according to the gray value of each column of pixel points in the longitudinal parallax image; the terminal determines a longitudinal coordinate range of the target region above the road region according to the parallax value of the target region in the longitudinal parallax image; and the terminal determines the transverse coordinate range of the target region in the parallax image according to the longitudinal coordinate range and the parallax value of the target region. In one possible example, the terminal may sum the gray values of all pixel points contained in each column of the V parallax image to obtain a mapping relationship d-s(d) between each column and its summation result. The terminal may establish a rectangular coordinate system with d as the horizontal coordinate and s(d) as the vertical coordinate, and obtain, based on the mapping relationship, a plurality of maximum value points d_i. Because the surface of a significant obstacle on the road, such as a vehicle or a pedestrian, is nearly perpendicular to the ground, the straight line corresponding to the target object in the V disparity map is approximately perpendicular to the road surface line, so the terminal can use the maximum value points d_i to determine the corresponding abscissa of the target object in the V disparity map.
In the V parallax image, starting from the point (d_i, f(d_i)), the terminal searches, in the region above the road surface area, for a vertical line segment perpendicular to the road surface line. The pixel points on this vertical line segment are the pixel points corresponding to the target object in the V parallax image, and the segment can be recorded as a triple (v_ui, v_di, d_i), whose elements are respectively the upper and lower boundaries of the target object along the longitudinal coordinate axis and the average parallax of the target object. After the terminal determines the longitudinal coordinate range of the target object in the V parallax image, the terminal may search in the parallax image along the transverse direction within that longitudinal coordinate range, to determine, in the area whose longitudinal coordinate interval is (v_ui, v_di), the pixel points whose parallax values satisfy a target condition; the target condition may be that the difference between the parallax value and the average parallax value d_i of the maximum value point is smaller than a target threshold. In this way the terminal determines the pixel points whose parallax in the transverse direction is close to d_i, thereby determining the region position of the target object from the transverse direction and the longitudinal direction respectively.
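The column-sum search for maximum value points d_i and the vertical-segment search above the road line might be sketched as follows; the helper names `find_target_columns` and `vertical_extent` and the thresholds are assumptions, not the patent's exact procedure.

```python
import numpy as np

def find_target_columns(v_disparity, min_sum=30):
    """Locate candidate obstacle disparities d_i as local maxima of the
    per-column sum s(d) of the V-disparity image (the d -> s(d) map)."""
    s = v_disparity.sum(axis=0)
    peaks = []
    for d in range(1, len(s) - 1):
        if s[d] >= min_sum and s[d] >= s[d - 1] and s[d] >= s[d + 1]:
            peaks.append(d)
    return peaks

def vertical_extent(v_disparity, d_i, road_v, thresh=10):
    """Scan column d_i above the road row road_v = f(d_i) for the
    near-vertical obstacle segment; return (v_up, v_down), the target's
    longitudinal coordinate range, or None if nothing is found."""
    col = v_disparity[:, d_i]
    vs = [v for v in range(int(road_v)) if col[v] >= thresh]
    return (min(vs), max(vs)) if vs else None

# Synthetic V-disparity: a near-vertical obstacle segment at d_i = 15
vd = np.zeros((80, 40), dtype=np.int32)
vd[10:40, 15] = 20
peaks = find_target_columns(vd)
v_up, v_down = vertical_extent(vd, peaks[0], road_v=40)
```

The peak column recovers d_i = 15 and the scan recovers the segment's row extent, i.e. the object's longitudinal range above the road line.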
202. And the terminal performs object segmentation on the target area to obtain target semantic information of the target object.
Wherein the target semantic information includes an object class of the target object and an initial contour of the target object. In one possible implementation, the terminal may first acquire the approximate region positions of a plurality of target objects by using a target detection method, and then segment each target object from those approximate region positions by using semantic segmentation, instance segmentation, or the like. The process may include: the terminal identifies an object region of at least one target object, where the object region is located within the target region; and the terminal determines the object class of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region. In one possible example, the terminal may implement object segmentation using an instance segmentation algorithm. For example, the terminal segments the target region using the Mask R-CNN (Mask Region-based Convolutional Neural Network) algorithm.
In a possible implementation manner, the terminal may directly segment the target region in the vehicle environment image by using an instance segmentation algorithm, so as to obtain the target semantic information of the target object. Alternatively, the terminal may directly segment the whole vehicle environment image by using the instance segmentation algorithm, and adjust the segmentation result based on the target region obtained in step 201 to obtain the target semantic information of the target object. The embodiment of the present invention does not specifically limit this.
It should be noted that the Mask R-CNN algorithm is based on deep learning, and its backbone network uses the classical 50-layer deep residual network ResNet-50. The Mask R-CNN algorithm can effectively detect the target object in a single RGB image and perform instance segmentation, obtaining the class of the target object together with a pixel-by-pixel segmentation. Because it can accurately and efficiently segment the initial contour of the target object and identify the object class, the target object can be described more accurately at the semantic level, which in turn improves the accuracy of subsequently determining the contour of the target object.
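Running Mask R-CNN itself requires a trained network, but the step from one of its per-instance binary masks to the "initial contour" used in the following steps can be illustrated with a minimal boundary-tracing sketch: a pixel is on the contour if it is foreground and has a background (or out-of-image) 4-neighbour. The helper name is an assumption.

```python
import numpy as np

def mask_to_contour(mask):
    """Extract the initial contour of a target object from a binary
    instance-segmentation mask (e.g. one Mask R-CNN output mask)."""
    h, w = mask.shape
    contour = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # boundary pixel: a 4-neighbour is background or off-image
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny, nx]:
                    contour.append((y, x))
                    break
    return contour

# A 5x5 square instance mask inside a 7x7 image
m = np.zeros((7, 7), dtype=bool)
m[1:6, 1:6] = True
c = mask_to_contour(m)
```

For the square mask the contour is exactly its 16 perimeter pixels; interior pixels are excluded.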
203. And the terminal determines the outline of the target object according to the parallax image and the target semantic information of the target object.
In the embodiment of the invention, the terminal can further optimize the contour of the target object into a more accurate one based on the projection of the three-dimensional solid model of the target object on the horizontal plane. This step may be accomplished by the following steps 2031-2033.
2031. And the terminal adjusts the initial contour of the target object based on the boundary pixel point of the initial contour of the target object corresponding to the parallax image.
In a possible implementation manner, because the parallax value changes sharply from just inside the contour edge of the target object to just outside it in the parallax image, the terminal can correct the initial contour according to the parallax change of the boundary pixel points corresponding to the initial contour in the parallax image. The process may include: the terminal determines a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image; the terminal then adjusts these boundary pixel points according to the degree of change of the parallax values of the neighborhood pixel points of each boundary pixel point. When the degree of change of the parallax values of the neighborhood pixel points of a boundary pixel point satisfies a target mutation condition, the terminal retains the boundary pixel point; when it does not satisfy the target mutation condition, the terminal replaces the boundary pixel point with a pixel point, among the neighborhood pixel points, that satisfies the target mutation condition. The target mutation condition may be that the difference between the maximum and minimum parallax values of the neighborhood pixel points is greater than a target threshold.
It should be noted that the terminal can locate the initial contour in the parallax image according to the positions of the pixel points of the initial contour in the vehicle environment image, so that the contours of the different object classes obtained by object segmentation can be extracted independently for further optimization. Specifically, the terminal searches in the neighborhood of each segmented boundary pixel point based on the target mutation condition, and takes the pixel point in the neighborhood where the parallax value mutates as a boundary pixel point of the target object, thereby further correcting the contour of the target object and improving the accuracy and precision of the initial contour.
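The retain-or-replace rule of step 2031 can be sketched as below, assuming a square neighbourhood, "max − min > threshold" as the target mutation condition, and an assumed first-found tie-break when replacing a boundary pixel; all names are illustrative.

```python
import numpy as np

def patch_jump(disparity, v, u, r=1):
    """Max - min disparity spread in the (2r+1)x(2r+1) neighbourhood of (v, u)."""
    h, w = disparity.shape
    patch = disparity[max(0, v - r):min(h, v + r + 1),
                      max(0, u - r):min(w, u + r + 1)]
    return patch.max() - patch.min()

def adjust_boundary(disparity, boundary_pts, thresh=5.0, r=1):
    """Keep a boundary pixel whose neighbourhood disparity jump exceeds
    thresh (the assumed mutation condition); otherwise replace it with
    the first neighbour that satisfies the condition, or keep it as-is."""
    h, w = disparity.shape
    out = []
    for v, u in boundary_pts:
        if patch_jump(disparity, v, u, r) > thresh:
            out.append((v, u))
            continue
        for dv in (-1, 0, 1):
            for du in (-1, 0, 1):
                nv, nu = v + dv, u + du
                if (0 <= nv < h and 0 <= nu < w
                        and patch_jump(disparity, nv, nu, r) > thresh):
                    out.append((nv, nu))
                    break
            else:
                continue
            break
        else:
            out.append((v, u))  # no better neighbour found: keep original
    return out

# Disparity with a sharp object edge between columns 4 and 5
disp = np.full((10, 10), 20.0)
disp[:, 5:] = 5.0
refined = adjust_boundary(disp, [(5, 4), (5, 3)])
```

The pixel already sitting on the parallax jump is retained, while the mislocated one is snapped onto a neighbour that does straddle the jump.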
2032. And the terminal constructs a three-dimensional model of the target object based on the adjusted initial contour and the calibration parameters of the image acquisition equipment.
Wherein, the image acquisition equipment is used for acquiring the vehicle environment image.
The terminal can construct a corresponding point set of the target object in a three-dimensional vehicle coordinate system. The vehicle coordinate system may be an XYZ rectangular space coordinate system, wherein the positive X-axis direction may be a horizontal rightward direction perpendicular to the vehicle traveling direction, the positive Y-axis direction may be the vehicle traveling direction, and the positive Z-axis direction may be a vertical upward direction. The terminal obtains the area position of the target object in the vehicle environment image according to the adjusted initial contour, and solves the three-dimensional space coordinate of the target object in the vehicle coordinate system based on the parallax value and the camera calibration parameter corresponding to the pixel point in the area position, so that a space model for simulating the target object is constructed.
In one possible example, the terminal may also determine the position of the road plane in three-dimensional space based on the disparity of the road plane. For example, the terminal may determine a three-dimensional space coordinate of each pixel point in the road plane in the three-dimensional vehicle coordinate system according to the parallax value of each pixel point in the road plane and the camera calibration parameter, and further, the terminal may determine the road surface height of the road area based on the three-dimensional space coordinate of the road area corresponding to the three-dimensional vehicle coordinate system.
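The back-projection in step 2032 can be sketched with the standard pinhole stereo model, an assumed concrete form of the patent's "camera calibration parameters": depth Z = fB/d, then X and Y from the pixel offsets against the principal point. Names and values are illustrative.

```python
import numpy as np

def pixels_to_points(pixels, disparity, f, B, cx, cy):
    """Back-project (u, v) pixels into 3-D camera coordinates:
        Z = f * B / d,  X = (u - cx) * Z / f,  Y = (v - cy) * Z / f
    f: focal length in pixels, B: stereo baseline in metres,
    (cx, cy): principal point."""
    pts = []
    for u, v in pixels:
        d = disparity[v, u]
        if d <= 0:
            continue  # skip invalid disparity
        Z = f * B / d
        pts.append(((u - cx) * Z / f, (v - cy) * Z / f, Z))
    return np.array(pts)

# One pixel 70 px right of and 70 px below the principal point, d = 35
disp = np.full((480, 640), 35.0)
pts = pixels_to_points([(390, 310)], disp, f=700.0, B=0.5, cx=320.0, cy=240.0)
```

With f = 700 px and B = 0.5 m, disparity 35 gives a depth of 10 m and lateral/vertical offsets of 1 m each; a mapping from this camera frame to the patent's XYZ vehicle frame would then apply the extrinsic calibration.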
2033. And the terminal adjusts the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the outline of the target object on the horizontal plane.
The projection region is the region obtained by projecting the three-dimensional stereo model onto the horizontal plane, that is, the XOY plane. The depth of the three-dimensional stereo model refers to its length in the vehicle traveling direction, that is, along the Y axis. In the parallax image, pixel values at the contour boundary of the target object may be confused with those of the background and the road surface; in this step, the terminal further optimizes the contour of the target object in the Y-axis direction.
In a possible implementation manner, the terminal may find the pixel points possibly confused between the outline of the target object and the surrounding environment based on the coordinates of the target object in the Y-axis direction. In one possible example, the terminal determines the average of the Y coordinates of the plurality of pixel points included in the target object, and retains, according to this average, the target pixel points whose Y coordinates belong to a target range; for example, the terminal may retain only the pixel points whose Y coordinates belong to the interval [m − 3σ, m + 3σ], where m represents the average of the Y coordinates of the plurality of pixel points and σ is the standard deviation of those Y coordinates. Then, the terminal converts the target pixel points and the region where they are located into a 0-1 matrix and performs a convolution operation on the matrix with a target convolution kernel; for example, the target convolution kernel may be an all-"1" kernel of a certain size, and the terminal may retain only the coordinates of the target pixel points that are "1" before the convolution and whose value after the convolution is larger than a target threshold. The terminal then converts the matrix back, turning the retained target pixel points into a point set on the projected XOY plane, thereby further correcting the projection area.
It should be noted that the terminal can correct the pixel points which may be confused between the contour of the target object and the surrounding environment by using the Y coordinate in the Y axis direction, and delete the abnormal points which are expressed as isolated outliers in the point cloud of the target object through the Y coordinate screening process and the convolution process in the target range, thereby further optimizing the contour of the target object in the Y axis direction and improving the accuracy of target detection.
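The two-stage clean-up of step 2033 — the [m − 3σ, m + 3σ] screen on Y coordinates followed by the all-ones convolution that drops isolated cells — might look like this; the grid resolution, kernel size, and thresholds are illustrative assumptions.

```python
import numpy as np

def prune_projection(points, kernel=3, min_hits=3):
    """Clean an object's XOY projection:
    1) keep only points whose Y lies in [m - 3s, m + 3s]
       (m, s: mean and std of Y over the object's points);
    2) rasterise the survivors into a 0-1 occupancy grid and drop
       isolated cells: a cell survives only if the all-ones
       kernel x kernel window around it sums to at least min_hits."""
    y = points[:, 1]
    m, s = y.mean(), y.std()
    kept = points[np.abs(y - m) <= 3 * s]
    xi = np.round(kept[:, 0]).astype(int)
    yi = np.round(kept[:, 1]).astype(int)
    x0, y0 = xi.min(), yi.min()
    grid = np.zeros((yi.max() - y0 + 1, xi.max() - x0 + 1), dtype=int)
    grid[yi - y0, xi - x0] = 1
    r = kernel // 2
    pad = np.pad(grid, r)  # zero-pad so edge windows stay in bounds
    out = []
    for p, gx, gy in zip(kept, xi - x0, yi - y0):
        window = pad[gy:gy + kernel, gx:gx + kernel]
        if window.sum() >= min_hits:
            out.append(p)
    return np.array(out)

# A dense 3x3 cluster plus one isolated outlier at x = 10
cluster = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
pts_xy = np.vstack([cluster, [[10.0, 1.0]]])
clean = prune_projection(pts_xy)
```

The outlier survives the 3σ screen (its Y is typical) but is removed by the convolution stage because its grid cell has no occupied neighbours.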
204. The terminal determines the spatial position of the target object in the vehicle environment according to the road area and the contour of the target object.
In this step, the terminal may determine the position of the target object in the surrounding environment relative to the road surface based on the road surface height, for example, 3 meters above the road surface ahead, so as to give the effective spatial position of the target object more directly. The terminal can determine the minimum circumscribed rectangle of the target object in the horizontal plane based on the outline of the target object; the terminal determines the plane size of the target object based on the position coordinates of the minimum circumscribed rectangle in the vehicle coordinate system; and the terminal determines the relative height of the target object on the road surface based on the road surface height of the road region. When determining the minimum circumscribed rectangle of the target object, the terminal can include all pixel points of the target object in the rectangle, taking the smallest rectangle containing all the pixel points as the detection frame of the target object, thereby determining the two-dimensional plane position of the target object on the horizontal plane. After the terminal determines the minimum circumscribed rectangle, the size of the rectangle can be further adjusted based on the actual size of the target object in the real physical world. In addition, the terminal may determine the height of the target object according to its extent in the vertical direction and the road surface height; for example, the terminal may determine the height of the target object according to the maximum and minimum coordinate values of the target object in the Z-axis direction, and determine the relative height of the target object with respect to the road surface according to the height coordinate of the road surface in the Z-axis direction.
The terminal can also determine, according to the relative height and the minimum circumscribed rectangle, the accurate three-dimensional space coordinates of the target object in the vehicle coordinate system. Further, the terminal can project the target object onto a two-dimensional plane based on these three-dimensional space coordinates, for example onto the two-dimensional horizontal plane. Of course, the terminal can also display the accurate projection on this two-dimensional horizontal plane on the vehicle-mounted terminal screen for the user to browse, which, for example, facilitates verification by testers during automatic driving tests of the vehicle.
For example, taking as the target object a vehicle adjacent to the own vehicle, the terminal finds the pixel points corresponding to the adjacent vehicle in the parallax image, calculates the corresponding three-dimensional coordinate points of these pixel points in the vehicle coordinate system by combining the camera calibration parameters, and then projects the three-dimensional coordinate points representing the adjacent vehicle onto the XOY plane. Generally, the terminal can derive the location and orientation of the adjacent vehicle from the distribution of points in the projection area. The terminal then further optimizes the projection area of the target object in the Y-axis direction and, after the isolated points are removed, detects the minimum circumscribed rectangle of the projection area to obtain the position of the adjacent vehicle on the ground. The terminal determines the height of the adjacent vehicle relative to the road according to the maximum and minimum coordinate values of the adjacent vehicle in the Z-axis direction in the vehicle coordinate system and the road surface height.
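A minimal sketch of step 204's measurements: here an axis-aligned bounding rectangle of the XOY projection stands in for the patent's minimum circumscribed rectangle (which would need a rotated-rectangle computation such as cv2.minAreaRect), and the relative height comes from the Z extent and the road height. Names and data are illustrative.

```python
import numpy as np

def bounding_box_and_height(points, road_z=0.0):
    """From an object's cleaned point cloud in the vehicle frame
    (columns X, Y, Z), return an axis-aligned XOY bounding rectangle,
    its plane size, and the object's height relative to the road."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rect = (x.min(), y.min(), x.max(), y.max())          # plane position
    plane_size = (x.max() - x.min(), y.max() - y.min())  # length, width
    rel_height = z.max() - road_z                        # height above road
    return rect, plane_size, rel_height

pts3d = np.array([[1.0, 2.0, 0.5],
                  [3.0, 6.0, 2.5],
                  [2.0, 4.0, 1.0]])
rect, size, rel_h = bounding_box_and_height(pts3d, road_z=0.5)
```

The rectangle could then be grown or shrunk toward the object's known physical size, as the text above describes.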
In order to describe the flow of the embodiment of the present invention more clearly, the overall flow of the above steps 201-204 is described below with the frame diagram shown in fig. 3 and the target detection flow diagram shown in fig. 4. As shown in fig. 3, the terminal may be configured with a rough segmentation module for road regions and target regions, configured to segment the target region and the road region in the vehicle environment image according to the V parallax image, so as to implement rough segmentation of different functional regions; the terminal may also be configured with a three-dimensional target detection module fused with semantic information, which is used to perform semantic instance segmentation on the vehicle environment image and to further correct and optimize the initial contour obtained by segmentation. As shown in fig. 4, the terminal acquires a parallax image, performs object segmentation on the target region to obtain target semantic information, corrects the initial contour based on the parallax image and the target semantic information, constructs a three-dimensional model of the target object based on the camera's intrinsic and extrinsic parameters and the corrected initial contour, projects the three-dimensional model onto the horizontal plane, and again performs the outlier-removal optimization on the projection area based on the Y coordinate in the Y-axis direction, thereby obtaining a contour with higher accuracy.
The terminal determines the minimum circumscribed rectangle of the target object, and after the length and width of the rectangle are adjusted based on the actual size, the terminal determines the accurate three-dimensional space coordinates of the target object in the vehicle coordinate system based on the road height and the accurate projection on the horizontal plane; of course, the terminal can further perform two-dimensional projection of the target object.
According to the method provided by the embodiment of the invention, the target region and the road region in the vehicle environment image are determined, and the target region is subjected to object segmentation, so that the target semantic information comprising the object type and the initial contour of the target object is obtained, the target object is comprehensively described from multiple angles, then the contour of the target object is further accurately positioned according to the parallax image and the target semantic information, the accuracy of the contour is greatly improved, and finally the spatial position of the target object is accurately determined according to the road region and the accurately positioned contour.
Fig. 5 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention. Referring to fig. 5, the apparatus includes:
a determiningmodule 501, configured to determine a target area and a road area in a vehicle environment image based on a parallax image of the vehicle environment image during vehicle driving, where the target area includes a target object;
asegmentation module 502, configured to perform object segmentation on the target region to obtain target semantic information of the target object, where the target semantic information includes an object class of the target object and an initial contour of the target object;
the determiningmodule 501 is further configured to determine a contour of the target object according to the parallax image and the target semantic information of the target object;
the determiningmodule 501 is further configured to determine a spatial position of the target object in the vehicle environment according to the road area and the contour of the target object.
In a possible implementation manner, the determiningmodule 501 is further configured to adjust the initial contour of the target object based on a boundary pixel point of the initial contour of the target object corresponding to the parallax image; constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image; and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the outline of the target object on the horizontal plane.
In a possible implementation manner, the determiningmodule 501 is further configured to determine a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image; and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
In a possible implementation manner, the determiningmodule 501 is further configured to, when the parallax variation degrees of the plurality of neighborhood pixels of the boundary pixel satisfy a target mutation condition, retain the boundary pixel; and when the change degree of the parallax values of the plurality of neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with a pixel meeting the target mutation condition in the plurality of neighborhood pixels.
In a possible implementation manner, the determiningmodule 501 is further configured to determine a parallax image of the vehicle environment image based on at least two frames of vehicle environment images during the vehicle driving; projecting the parallax image along the vertical coordinate of the image coordinate system of the parallax image, and determining a longitudinal parallax image of the vehicle environment image, wherein the gray value of pixel points in the longitudinal parallax image is used for indicating the parallax distribution of each row of pixel points in the parallax image; and determining a target area and a road area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image.
In a possible implementation manner, the determiningmodule 501 is further configured to determine a road surface straight line in an image coordinate system of the longitudinal parallax image according to a gray value of each pixel in the longitudinal parallax image; determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image; determining a parallax value of a target area in the vehicle environment image according to the gray value of each row of pixel points in the longitudinal parallax image; in the longitudinal parallax image, determining a longitudinal coordinate range of the target area above the road area according to the parallax value of the target area; and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
In a possible implementation manner, thesegmentation module 502 is further configured to identify an object region of at least one target object in the target region; and determining the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
In a possible implementation manner, the determiningmodule 501 is further configured to determine a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object; determining the plane size of the target object based on the position coordinates of the minimum circumscribed rectangle in a vehicle coordinate system; based on the road surface height of the road region, a relative height of the target object on the road surface is determined.
In the embodiment of the invention, the target region and the road region in the vehicle environment image are determined, and the target region is subjected to object segmentation, so that the target semantic information comprising the object type and the initial contour of the target object is obtained, the target object is comprehensively described from multiple angles, then the contour of the target object is further accurately positioned according to the parallax image and the target semantic information, the accuracy of the contour is greatly improved, and finally the spatial position of the target object is accurately determined according to the road region and the accurately positioned contour.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: in the object detection apparatus provided in the foregoing embodiment, only the division of the functional modules is illustrated in the foregoing, and in practical applications, the functions may be distributed by different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the target detection apparatus and the target detection method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal 600 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
In general, the terminal 600 includes: aprocessor 601 and amemory 602.
Processor 601 may include one or more processing cores, such as 4-core processors, 8-core processors, and so forth. Theprocessor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Theprocessor 601 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, theprocessor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments,processor 601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 602 may include one or more computer-readable storage media, which may be non-transitory.Memory 602 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium inmemory 602 is used to store at least one instruction for execution byprocessor 601 to implement the target detection method provided by the method embodiments herein.
In some embodiments, the terminal 600 may further optionally include: aperipheral interface 603 and at least one peripheral. Theprocessor 601,memory 602, andperipheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to theperipheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of aradio frequency circuit 604, atouch screen display 605, acamera 606, anaudio circuit 607, apositioning component 608, and apower supply 609.
Theperipheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to theprocessor 601 and thememory 602. In some embodiments, theprocessor 601,memory 602, andperipheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of theprocessor 601, thememory 602, and theperipheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
TheRadio Frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. Theradio frequency circuitry 604 communicates with communication networks and other communication devices via electromagnetic signals. Therf circuit 604 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, theradio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Theradio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, therf circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
Thedisplay 605 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. When thedisplay screen 605 is a touch display screen, thedisplay screen 605 also has the ability to capture touch signals on or over the surface of thedisplay screen 605. The touch signal may be input to theprocessor 601 as a control signal for processing. At this point, thedisplay 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, thedisplay 605 may be one, providing the front panel of the terminal 600; in other embodiments, thedisplay 605 may be at least two, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, thedisplay 605 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 600. Even more, thedisplay 605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. TheDisplay 605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, the number of rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 606 may also include a flash lamp. The flash lamp may be a monochrome-temperature flash lamp or a dual-color-temperature flash lamp. A dual-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 601 for processing or to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 600. The microphone may also be an array microphone or an omnidirectional acquisition microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert an electrical signal into sound waves audible to humans, but also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 is used for determining the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be an alternating current power supply, a direct current power supply, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, the terminal 600 also includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 may detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 601 may control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used for collecting motion data of a game or a user.
The gyroscope sensor 612 may detect the body direction and rotation angle of the terminal 600, and the gyroscope sensor 612 may cooperate with the acceleration sensor 611 to collect the 3D motion of the user on the terminal 600. According to the data collected by the gyroscope sensor 612, the processor 601 may implement the following functions: motion sensing (such as changing the UI according to a tilting operation of the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or on a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, a holding signal of the user on the terminal 600 can be detected, and the processor 601 performs left-hand/right-hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls an operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability control comprises at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used for collecting a fingerprint of a user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon identifying the user's identity as a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or a vendor logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or the vendor logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, the processor 601 may control the display brightness of the touch display screen 605 based on the ambient light intensity collected by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is decreased. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also known as a distance sensor, is typically provided on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front surface of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front surface of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in Fig. 6 does not constitute a limitation of the terminal 600, and the terminal may include more or fewer components than those shown, combine some components, or adopt a different arrangement of components.
Fig. 7 is a schematic structural diagram of a server 700 according to an embodiment of the present invention. The server 700 may vary greatly due to differences in configuration or performance, and may include one or more processors (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the target detection method provided by each of the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions executable by a processor in a terminal or a server to perform the object detection method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A method of object detection, the method comprising:
determining a parallax image of the vehicle environment image based on at least two frames of vehicle environment images during the running of the vehicle;
projecting the parallax image along a vertical coordinate of an image coordinate system of the parallax image, and determining a longitudinal parallax image of the vehicle environment image, wherein gray values of pixel points in the longitudinal parallax image are used for indicating the parallax distribution of each row of pixel points in the parallax image;
determining a target area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image, wherein the target area comprises a target object;
carrying out object segmentation on the target area to obtain target semantic information of a target object, wherein the target semantic information comprises an object type of the target object and an initial contour of the target object;
determining the outline of the target object according to the parallax image and the target semantic information of the target object;
determining a road surface straight line in an image coordinate system of the longitudinal parallax image according to the gray value of each pixel point in the longitudinal parallax image;
determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image;
determining a spatial position of the target object in a vehicle environment according to the road area and the contour of the target object.
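The projection step in claim 1 is essentially the classic V-disparity construction. The sketch below is purely illustrative and not the patented implementation; the function name `v_disparity` and the use of plain Python lists are assumptions made for this example.

```python
# Illustrative V-disparity ("longitudinal parallax image") construction:
# each row of the output is a histogram of the disparity values found in
# the corresponding row of the disparity image, so the road surface maps
# to an oblique straight line and obstacles map to near-vertical segments.

def v_disparity(disparity, max_disp):
    """disparity: H x W 2-D list of integer disparity values.
    Returns an H x (max_disp + 1) list whose entry [r][d] counts how
    many pixels in row r have disparity d."""
    hist = [[0] * (max_disp + 1) for _ in disparity]
    for r, row in enumerate(disparity):
        for d in row:
            if 0 <= d <= max_disp:
                hist[r][d] += 1
    return hist

# Toy 3 x 4 disparity map: the middle row is a constant-disparity object.
disp = [
    [1, 1, 2, 1],
    [5, 5, 5, 5],
    [2, 2, 3, 2],
]
vd = v_disparity(disp, 5)
print(vd[1])  # -> [0, 0, 0, 0, 0, 4]: all four pixels share disparity 5
```

The gray value of pixel (d, r) in the longitudinal parallax image then corresponds to `vd[r][d]`, which is what the claim uses to indicate the per-row disparity distribution.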
2. The method of claim 1, wherein determining the contour of the target object according to the disparity image and target semantic information of the target object comprises:
adjusting the initial contour of the target object based on the boundary pixel points of the initial contour of the target object corresponding to the parallax image;
constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image;
and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the contour of the target object on the horizontal plane.
3. The method according to claim 2, wherein the adjusting the initial contour of the target object based on the boundary pixel point corresponding to the initial contour of the target object in the parallax image comprises:
determining a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image;
and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
4. The method of claim 3, wherein the adjusting the plurality of boundary pixels corresponding to the initial contour of the target object according to the variation degree of the parallax values of the plurality of neighborhood pixels of each boundary pixel comprises:
when the change degree of the parallax values of a plurality of neighborhood pixels of the boundary pixel meets a target mutation condition, the boundary pixel is reserved;
and when the change degree of the parallax values of the plurality of neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with a pixel meeting the target mutation condition in the plurality of neighborhood pixels.
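Claims 3-4 keep a boundary pixel when the disparity in its neighborhood changes abruptly (a depth discontinuity) and otherwise snap it to a neighborhood pixel that does. A minimal one-dimensional sketch follows; the helper name `refine_boundary`, the neighborhood radius, and the jump threshold are illustrative assumptions, not values taken from the patent.

```python
# Illustrative boundary refinement using a disparity "mutation"
# (abrupt-change) condition: a boundary column is kept if the disparity
# jump at that column exceeds a threshold; otherwise the boundary moves
# to the neighborhood column with the sharpest jump.

def refine_boundary(disp_row, col, radius=2, jump=3):
    """disp_row: one row of the disparity image; col: boundary column.
    Returns the (possibly adjusted) boundary column index."""
    lo = max(col - radius, 0)
    hi = min(col + radius, len(disp_row) - 2)

    def change(c):
        # disparity jump between column c and its right neighbor
        return abs(disp_row[c + 1] - disp_row[c])

    if change(col) >= jump:   # mutation condition met: keep the pixel
        return col
    # otherwise replace it with the neighborhood pixel meeting the
    # condition best (largest disparity jump)
    return max(range(lo, hi + 1), key=change)

row = [10, 10, 10, 10, 2, 2, 2]  # depth discontinuity between cols 3 and 4
print(refine_boundary(row, 2))   # -> 3: the boundary snaps to column 3
```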
5. The method of claim 1, wherein the determining the target region in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image comprises:
determining a parallax value of a target area in the vehicle environment image according to the gray value of each row of pixel points in the longitudinal parallax image;
determining a longitudinal coordinate range of the target area above the road area according to the parallax value of the target area in the longitudinal parallax image;
and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
6. The method of claim 1, wherein the performing object segmentation on the target region to obtain target semantic information of a target object comprises:
identifying, in the target region, an object region of at least one target object;
and determining the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
7. The method of claim 1, wherein determining the spatial location of the target object in the vehicle environment based on the road region and the contour of the target object comprises:
determining a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object;
determining the plane size of the target object based on the position coordinates of the minimum bounding rectangle in a vehicle coordinate system;
determining a relative height of the target object on the road surface based on a road surface height of the road region.
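Claim 7 derives the plane size of the target object from its minimum bounding rectangle in the horizontal plane. The sketch below shows one common way to find that rectangle for a convex ground-plane contour by testing each edge direction (a rotating-calipers-style search); the function name and the convexity assumption are illustrative and not taken from the patent.

```python
import math

def min_area_rect_size(pts):
    """pts: list of (x, y) vertices of a convex contour on the ground
    plane. Returns (length, width) of the minimum-area enclosing
    rectangle, found by aligning the rectangle with each polygon edge
    in turn (sufficient for convex polygons)."""
    best = None
    n = len(pts)
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # project all vertices onto the edge-aligned axes
        us = [x * c + y * s for x, y in pts]
        vs = [-x * s + y * c for x, y in pts]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            best = (w * h, max(w, h), min(w, h))
    return best[1], best[2]

# An axis-aligned 4 x 2 rectangle: its minimum bounding rectangle is itself.
print(min_area_rect_size([(0, 0), (4, 0), (4, 2), (0, 2)]))  # -> (4.0, 2.0)
```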
8. An object detection apparatus, characterized in that the apparatus comprises:
a determining module, used for determining a parallax image of a vehicle environment image based on at least two frames of vehicle environment images in the driving process of a vehicle; projecting the parallax image along a vertical coordinate of an image coordinate system of the parallax image, and determining a longitudinal parallax image of the vehicle environment image, wherein gray values of pixel points in the longitudinal parallax image are used for indicating the parallax distribution of each row of pixel points in the parallax image; and determining a target area in the vehicle environment image according to the gray value of each pixel point in the longitudinal parallax image and the parallax value of each pixel point in the parallax image, wherein the target area comprises a target object;
the segmentation module is used for carrying out object segmentation on the target area to obtain target semantic information of a target object, wherein the target semantic information comprises an object type of the target object and an initial contour of the target object;
the determining module is further configured to determine a contour of the target object according to the parallax image and target semantic information of the target object;
the determining module is further configured to determine a road surface straight line in an image coordinate system of the longitudinal parallax image according to the gray value of each pixel point in the longitudinal parallax image; determining a road area in the vehicle environment image based on the road surface straight line and the parallax value of the pixel point in the parallax image;
the determining module is further configured to determine a spatial position of the target object in the vehicle environment according to the road area and the contour of the target object.
9. The apparatus of claim 8,
the determining module is further configured to adjust the initial contour of the target object based on a boundary pixel point of the initial contour of the target object corresponding to the parallax image; constructing a three-dimensional model of the target object based on the adjusted initial contour and calibration parameters of image acquisition equipment, wherein the image acquisition equipment is used for acquiring the vehicle environment image; and adjusting the projection area of the three-dimensional model on the horizontal plane according to the depth of the three-dimensional model of the target object to obtain the contour of the target object on the horizontal plane.
10. The apparatus of claim 9,
the determining module is further configured to determine a plurality of boundary pixel points corresponding to the initial contour of the target object in the parallax image; and adjusting the plurality of boundary pixel points corresponding to the initial contour of the target object according to the change degree of the parallax values of the plurality of neighborhood pixel points of each boundary pixel point.
11. The apparatus of claim 10,
the determining module is further configured to retain the boundary pixel points when the parallax value variation degrees of the plurality of neighborhood pixel points of the boundary pixel points satisfy a target mutation condition; and when the change degree of the parallax values of the neighborhood pixels of the boundary pixel does not meet the target mutation condition, replacing the boundary pixel with the pixel meeting the target mutation condition in the neighborhood pixels.
12. The apparatus of claim 8,
the determining module is further configured to determine a disparity value of a target area in the vehicle environment image according to a gray value of each column of pixel points in the longitudinal disparity image; determining a longitudinal coordinate range of the target area above the road area according to the parallax value of the target area in the longitudinal parallax image; and determining the transverse coordinate range of the target area in the parallax image according to the longitudinal coordinate range and the parallax value of the target area.
13. The apparatus of claim 8,
the segmentation module is further used for identifying, in the target region, an object region of at least one target object; and determining the object type of each target object and the initial contour of each target object according to the pixel values of a plurality of pixel points in the object region.
14. The apparatus of claim 8,
the determining module is further configured to determine a minimum bounding rectangle of the target object in a horizontal plane based on the contour of the target object; determining the plane size of the target object based on the position coordinates of the minimum bounding rectangle in a vehicle coordinate system; determining a relative height of the target object on the road surface based on a road surface height of the road region.
15. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the object detection method of any one of claims 1 to 7.
16. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by the object detection method of any one of claims 1 to 7.
CN201911304904.2A | 2019-12-17 (priority) | 2019-12-17 (filed) | Target detection method, target detection device, computer equipment and storage medium | Active | CN111104893B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911304904.2A (CN111104893B) | 2019-12-17 | 2019-12-17 | Target detection method, target detection device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911304904.2A (CN111104893B) | 2019-12-17 | 2019-12-17 | Target detection method, target detection device, computer equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN111104893A (en) | 2020-05-05
CN111104893B (en) | 2022-09-20

Family

ID=70422608

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911304904.2A (CN111104893B, Active) | Target detection method, target detection device, computer equipment and storage medium | 2019-12-17 | 2019-12-17

Country Status (1)

Country | Link
CN (1) | CN111104893B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112233150B (en)* | 2020-09-09 | 2024-12-31 | 北京迈格威科技有限公司 | Image processing and blurring method, device, electronic device and storage medium
CN112200172B (en)* | 2020-12-07 | 2021-02-19 | 天津天瞳威势电子科技有限公司 | Driving region detection method and device
CN112950535B (en)* | 2021-01-22 | 2024-03-22 | 北京达佳互联信息技术有限公司 | Video processing method, device, electronic equipment and storage medium
CN114120265A (en)* | 2021-10-29 | 2022-03-01 | 际络科技(上海)有限公司 | Obstacle detection method, obstacle detection device, electronic device, and storage medium
CN114219992B (en)* | 2021-12-14 | 2022-06-03 | 杭州古伽船舶科技有限公司 | Unmanned ship obstacle avoidance system based on image recognition technology
CN114219791B (en)* | 2021-12-17 | 2025-01-03 | 盛视科技股份有限公司 | Vision-based road water detection method, electronic equipment and vehicle alarm system
CN117710734A (en)* | 2023-12-13 | 2024-03-15 | 北京百度网讯科技有限公司 | Methods, devices, electronic equipment, and media for obtaining semantic data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108197590B (en)* | 2018-01-22 | 2020-11-03 | 海信集团有限公司 | Pavement detection method, device, terminal and storage medium
CN108520536B (en)* | 2018-03-27 | 2022-01-11 | 海信集团有限公司 | Disparity map generation method and device and terminal

Also Published As

Publication number | Publication date
CN111104893A (en) | 2020-05-05

Similar Documents

Publication | Title
CN111126182B (en) | Lane line detection method, lane line detection device, electronic device, and storage medium
CN111104893B (en) | Target detection method, target detection device, computer equipment and storage medium
US11205282B2 (en) | Relocalization method and apparatus in camera pose tracking process and storage medium
WO2021128777A1 (en) | Method, apparatus, device, and storage medium for detecting travelable region
US11978219B2 (en) | Method and device for determining motion information of image feature point, and task performing method and device
CN111126276B (en) | Lane line detection method, lane line detection device, computer equipment and storage medium
CN110059685A (en) | Word area detection method, apparatus and storage medium
CN110599593B (en) | Data synthesis method, device, equipment and storage medium
CN110490179B (en) | License plate recognition method and device and storage medium
CN110570460A (en) | Target tracking method and device, computer equipment and computer readable storage medium
CN110335224B (en) | Image processing method, image processing device, computer equipment and storage medium
CN111784841B (en) | Method, device, electronic equipment and medium for reconstructing three-dimensional image
CN110503159B (en) | Character recognition method, device, equipment and medium
CN111538009B (en) | Radar point marking method and device
CN112150560A (en) | Method, apparatus and computer storage medium for determining vanishing point
CN113205515A (en) | Target detection method, device and computer storage medium
CN113378705A (en) | Lane line detection method, device, equipment and storage medium
CN111444749B (en) | Method and device for identifying road surface guide mark and storage medium
CN113689484B (en) | Method and device for determining depth information, terminal and storage medium
CN111127541A (en) | Vehicle size determination method and device and storage medium
CN113920222A (en) | Method, apparatus, device and readable storage medium for obtaining map data
CN111563402B (en) | License plate recognition method, license plate recognition device, terminal and storage medium
CN115965936A (en) | Edge position marking method and equipment
CN111639639A (en) | Method, device, equipment and storage medium for detecting text area
CN112241662B (en) | Method and device for detecting drivable area

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
TA01 | Transfer of patent application right

Effective date of registration: 20200611

Address after: 215100 16/F, Lingyu Business Plaza, 66 Qinglonggang Road, High-Speed Rail New Town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant after: SUZHOU ZHIJIA TECHNOLOGY Co.,Ltd.

Applicant after: Zhijia (Cayman) Co.

Applicant after: Zhijia (USA)

Address before: 215100 16/F, Lingyu Business Plaza, 66 Qinglonggang Road, High-Speed Rail New Town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant before: SUZHOU ZHIJIA TECHNOLOGY Co.,Ltd.
TA01 | Transfer of patent application right

Effective date of registration: 20210310

Address after: 16/F, Lingyu Business Plaza, 66 Qinglonggang Road, High-Speed Rail New Town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant after: SUZHOU ZHIJIA TECHNOLOGY Co.,Ltd.

Applicant after: Zhijia (USA)

Address before: 215100 16/F, Lingyu Business Plaza, 66 Qinglonggang Road, High-Speed Rail New Town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant before: SUZHOU ZHIJIA TECHNOLOGY Co.,Ltd.

Applicant before: Zhijia (Cayman) Co.

Applicant before: Zhijia (USA)
GR01 | Patent grant
