CN109215067A - Generating a high-resolution 3-D point cloud based on CNN and CRF models - Google Patents

Generating a high-resolution 3-D point cloud based on CNN and CRF models

Info

Publication number
CN109215067A
CN109215067A (application CN201810695220.9A / CN201810695220A)
Authority
CN
China
Prior art keywords
image
depth map
depth
point cloud
lidar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810695220.9A
Other languages
Chinese (zh)
Other versions
CN109215067B (en)
Inventor
黄玉
郑先廷
朱俊
张伟德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu USA LLC
Original Assignee
Baidu USA LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/641,100 (external priority patent US10474160B2)
Priority claimed from US15/641,111 (external priority patent US10474161B2)
Priority claimed from US15/641,113 (external priority patent US10671082B2)
Application filed by Baidu USA LLC
Publication of CN109215067A
Application granted
Publication of CN109215067B
Legal status: Active
Anticipated expiration

Abstract

In one embodiment, a method or system generates a high-resolution 3-D point cloud from a low-resolution 3-D point cloud and an image captured by a camera, in order to operate an autonomous driving vehicle (ADV). The system receives a first image of the driving environment captured by the camera. The system receives a second image, which represents a first depth map of a first point cloud corresponding to the driving environment. The system determines a second depth map by applying a convolutional neural network model to the first image. The system generates a third depth map by applying a conditional random field model to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, so that the third depth map represents a second point cloud perceiving the driving environment around the ADV.

Description

Generating a high-resolution 3-D point cloud based on CNN and CRF models
Technical field
Embodiments of the present disclosure relate generally to operating autonomous driving vehicles. More specifically, embodiments of the present disclosure relate to generating high-resolution three-dimensional (3-D) point clouds based on convolutional neural network (CNN) and conditional random field (CRF) models.
Background technique
Vehicles operating in an autonomous driving mode (e.g., driverless) can relieve occupants, especially the driver, from some driving-related responsibilities. When operating in an autonomous driving mode, the vehicle can navigate to various locations using onboard sensors, allowing the vehicle to travel with minimal human interaction or, in some cases, without any passengers.
High-resolution LIDAR data are critical for the real-time 3-D scene reconstruction required by autonomous driving vehicle (ADV) applications such as object segmentation, detection, tracking, and classification. However, high-resolution LIDAR devices are usually expensive and not always available.
Summary of the invention
In one aspect of the disclosure, a computer-implemented method for operating an autonomous driving vehicle is provided. The method includes:
receiving a first image captured by a first camera, the first image capturing a part of the driving environment of the autonomous driving vehicle;
receiving a second image, the second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the part of the driving environment;
determining a second depth map by applying a convolutional neural network model to the first image; and
generating a third depth map by applying a conditional random field model to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, wherein the third depth map represents a second point cloud perceiving the driving environment around the autonomous driving vehicle.
In another aspect of the disclosure, a non-transitory machine-readable medium is provided that stores instructions which, when executed by a processor, cause the processor to perform operations, the operations including:
receiving a first image captured by a first camera, the first image capturing a part of the driving environment of the autonomous driving vehicle;
receiving a second image, the second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the part of the driving environment;
determining a second depth map by applying a convolutional neural network model to the first image; and
generating a third depth map by applying a conditional random field model to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, wherein the third depth map represents a second point cloud perceiving the driving environment around the autonomous driving vehicle.
In yet another aspect of the disclosure, a data processing system is provided, including:
a processor; and
a memory coupled to the processor to store instructions which, when executed by the processor, cause the processor to perform operations, the operations including:
receiving a first image captured by a first camera, the first image capturing a part of the driving environment of the autonomous driving vehicle;
receiving a second image, the second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the part of the driving environment;
determining a second depth map by applying a convolutional neural network model to the first image; and
generating a third depth map by applying a conditional random field model to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, wherein the third depth map represents a second point cloud perceiving the driving environment around the autonomous driving vehicle.
Brief description of the drawings
Embodiments of the present disclosure are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate similar elements.
Fig. 1 is a block diagram illustrating a networked system according to one embodiment.
Fig. 2 is a block diagram illustrating an example of an autonomous driving vehicle according to one embodiment.
Fig. 3 is a block diagram illustrating an example of a perception and planning system used with an autonomous driving vehicle according to one embodiment.
Fig. 4 is a block diagram illustrating an example of a high-resolution point cloud module used with an autonomous driving vehicle according to one embodiment.
Fig. 5A is a diagram illustrating an example ADV according to one embodiment.
Figs. 5B and 5C show top and side views of LIDAR/panoramic camera configurations used with an autonomous driving vehicle according to some embodiments.
Figs. 5D to 5F show examples of monochrome/stereo panoramic camera configurations according to some embodiments.
Figs. 6A and 6B show flowcharts of an inference mode and a training mode, respectively, according to one embodiment.
Figs. 6C and 6D show flowcharts of an inference mode and a training mode, respectively, according to one embodiment.
Figs. 7A and 7B are block diagrams illustrating examples of depth map generation according to some embodiments.
Fig. 8 is a diagram illustrating the contracting layers and expanding layers of a convolutional neural network model according to one embodiment.
Figs. 9A and 9B are block diagrams illustrating examples of high-resolution depth map generation according to some embodiments.
Fig. 10 is a flowchart illustrating a method according to one embodiment.
Figs. 11A and 11B are block diagrams illustrating examples of depth map generation according to some embodiments.
Fig. 12 is a diagram illustrating the contracting (e.g., encoder/convolutional) layers and expanding (e.g., decoder/deconvolutional) layers of a convolutional neural network model according to one embodiment.
Fig. 13 is a flowchart illustrating a method according to one embodiment.
Figs. 14A and 14B are block diagrams illustrating examples of depth map generation according to some embodiments.
Fig. 15 is a flowchart illustrating a method according to one embodiment.
Fig. 16 is a block diagram illustrating a data processing system according to one embodiment.
Detailed description
Various embodiments and aspects of the disclosure will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative of the disclosure and are not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the disclosure. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present disclosure.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in conjunction with the embodiment can be included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification do not necessarily all refer to the same embodiment.
According to some embodiments, a method or system generates a high-resolution 3-D point cloud from a low-resolution 3-D point cloud and an image captured by a camera, in order to operate an autonomous driving vehicle (ADV). Machine learning (deep learning) techniques combine a low-resolution LIDAR unit with a calibrated multi-camera system to realize a functionally equivalent high-resolution LIDAR unit for generating 3-D point clouds. The multi-camera system is designed to output a wide-angle (e.g., 360-degree) monochrome or color (e.g., red, green, and blue, or RGB) panoramic image. An end-to-end deep neural network is then trained with reliable data and, based on offline calibration parameters, applied to an input signal that includes the wide-angle monochrome or stereo panoramic image and a low-cost LIDAR depth grid projected onto that monochrome or stereo panoramic image, producing a wide-angle panoramic depth map. Finally, a high-resolution 3-D point cloud can be generated from the wide-angle panoramic depth map. The same process applies to configurations with narrower-view (e.g., limited-angle) stereo cameras and a narrower-view low-resolution LIDAR.
According to one aspect, the system receives a first image captured by a first camera, the first image capturing a part of the driving environment of the ADV. The system receives a second image representing a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the part of the driving environment. The system down-samples the second image by a predetermined scale factor until the resolution of the second image reaches a predetermined threshold. The system generates a second depth map by applying a convolutional neural network (CNN) model to the first image and the down-sampled second image, the second depth map having a higher resolution than the first depth map, so that the second depth map represents a second point cloud perceiving the driving environment around the ADV.
According to another aspect, the system receives a first image captured by a first camera, the first image capturing a part of the driving environment of the ADV. The system receives a second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the part of the driving environment. The system up-samples the second image by a predetermined scale factor to match the image scale of the first image. The system generates a second depth map by applying a convolutional neural network (CNN) model to the first image and the up-sampled second image, the second depth map having a higher resolution than the first depth map, so that the second depth map represents a second point cloud perceiving the driving environment around the ADV.
According to yet another aspect, the system receives a first image captured by a first camera, the first image capturing a part of the driving environment of the ADV. The system receives a second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the part of the driving environment. The system determines a second depth map by applying a convolutional neural network (CNN) model to the first image. The system generates a third depth map by applying a conditional random field function to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, so that the third depth map represents a second point cloud perceiving the driving environment around the ADV.
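The aspects above share one flow: project the sparse LIDAR cloud into a depth map, densify it with a CNN, optionally refine it with a CRF, and back-project the result into 3-D space. The sketch below illustrates that flow under simplifying assumptions: a plain pinhole projection is used instead of the cylindrical/spherical panoramas described later, and the helper names and the predict/refine model interfaces are placeholders rather than the patent's actual code.

import numpy as np

def project_points_to_depth_map(points, f, cx, cy, height, width):
    """Pinhole projection of Nx3 LIDAR points (camera frame) into a sparse depth map."""
    depth = np.zeros((height, width), dtype=np.float32)
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    valid = Z > 0
    u = np.round(f * X[valid] / Z[valid] + cx).astype(int)
    v = np.round(f * Y[valid] / Z[valid] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[inside], u[inside]] = Z[valid][inside]
    return depth

def back_project_depth_map(depth, f, cx, cy):
    """Inverse pinhole projection of a dense depth map back to an Nx3 point cloud."""
    v, u = np.nonzero(depth)
    Z = depth[v, u]
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.stack([X, Y, Z], axis=1)

def generate_high_res_point_cloud(rgb_image, sparse_cloud, cnn_model, crf_model, f, cx, cy):
    h, w = rgb_image.shape[:2]
    low_res_depth = project_points_to_depth_map(sparse_cloud, f, cx, cy, h, w)  # first depth map
    cnn_depth = cnn_model.predict(rgb_image)                                    # second depth map
    high_res_depth = crf_model.refine(rgb_image, low_res_depth, cnn_depth)      # third depth map
    return back_project_depth_map(high_res_depth, f, cx, cy)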
Fig. 1 is a block diagram illustrating an autonomous driving vehicle network configuration according to one embodiment of the disclosure. Referring to Fig. 1, network configuration 100 includes autonomous driving vehicle 101, which may be communicatively coupled to one or more servers 103-104 over a network 102. Although one autonomous driving vehicle is shown, multiple autonomous driving vehicles can be coupled to each other and/or to servers 103-104 over network 102. Network 102 may be any type of network, for example, a wired or wireless local area network (LAN), a wide area network (WAN) such as the Internet, a cellular network, a satellite network, or a combination thereof. Servers 103-104 may be any kind of server or server cluster, such as Web or cloud servers, application servers, backend servers, or a combination thereof. Servers 103-104 may be data analytics servers, content servers, traffic information servers, map and point of interest (MPOI) servers, location servers, etc.
An autonomous driving vehicle refers to a vehicle that can be configured in an autonomous driving mode, in which the vehicle navigates through an environment with little or no input from a driver. Such an autonomous driving vehicle can include a sensor system having one or more sensors configured to detect information about the environment in which the vehicle operates. The vehicle and its associated controller(s) use the detected information to navigate through the environment. Autonomous driving vehicle 101 can operate in a manual mode, a fully autonomous mode, or a partially autonomous mode.
In one embodiment, autonomous driving vehicle 101 includes, but is not limited to, perception and planning system 110, vehicle control system 111, wireless communication system 112, user interface system 113, and sensor system 115. Autonomous driving vehicle 101 may further include certain common components found in ordinary vehicles, such as an engine, wheels, a steering wheel, a transmission, etc., which may be controlled by vehicle control system 111 and/or perception and planning system 110 using a variety of communication signals and/or commands, for example, acceleration signals or commands, deceleration signals or commands, steering signals or commands, braking signals or commands, etc.
Components 110-115 may be communicatively coupled to each other via an interconnect, a bus, a network, or a combination thereof. For example, components 110-115 may be communicatively coupled to each other via a controller area network (CAN) bus. A CAN bus is a vehicle bus standard designed to allow microcontrollers and devices to communicate with each other in applications without a host computer. It is a message-based protocol, designed originally for multiplexed electrical wiring within automobiles, but it is also used in many other contexts.
Referring now to Fig. 2, in one embodiment, sensor system 115 includes, but is not limited to, one or more cameras 211, a global positioning system (GPS) unit 212, an inertial measurement unit (IMU) 213, a radar unit 214, and a light detection and ranging (LIDAR) unit 215. GPS unit 212 may include a transceiver operable to provide information regarding the position of the autonomous driving vehicle. IMU unit 213 may sense position and orientation changes of the autonomous driving vehicle based on inertial acceleration. Radar unit 214 may represent a system that utilizes radio signals to sense objects within the local environment of the autonomous driving vehicle. In some embodiments, in addition to sensing objects, radar unit 214 may additionally sense the speed and/or heading of the objects. LIDAR unit 215 may sense objects in the environment of the autonomous driving vehicle using lasers. Among other system components, LIDAR unit 215 can include one or more laser sources, a laser scanner, and one or more detectors. Cameras 211 may include one or more devices to capture images of the environment surrounding the autonomous driving vehicle. Cameras 211 may be still cameras and/or video cameras. A camera may be mechanically movable, for example, by mounting the camera on a rotating and/or tilting platform.
Sensor system 115 may further include other sensors, such as a sonar sensor, an infrared sensor, a steering sensor, a throttle sensor, a braking sensor, and an audio sensor (e.g., a microphone). An audio sensor may be configured to capture sound from the environment surrounding the autonomous driving vehicle. A steering sensor may be configured to sense the steering angle of the steering wheel, the wheels of the vehicle, or a combination thereof. A throttle sensor and a braking sensor sense the throttle position and braking position of the vehicle, respectively. In some situations, the throttle sensor and the braking sensor may be integrated as an integrated throttle/braking sensor.
In one embodiment, vehicle control system 111 includes, but is not limited to, steering unit 201, throttle unit 202 (also referred to as an acceleration unit), and braking unit 203. Steering unit 201 adjusts the direction or heading of the vehicle. Throttle unit 202 controls the speed of the motor or engine, which in turn controls the speed and acceleration of the vehicle. Braking unit 203 decelerates the vehicle by providing friction to slow the wheels or tires of the vehicle. Note that the components shown in Fig. 2 may be implemented in hardware, software, or a combination thereof.
Referring back to Fig. 1, wireless communication system 112 allows communication between autonomous driving vehicle 101 and external systems, such as devices, sensors, other vehicles, etc. For example, wireless communication system 112 can wirelessly communicate with one or more devices directly or via a communication network, such as with servers 103-104 over network 102. Wireless communication system 112 can use any cellular communication network or wireless local area network (WLAN), e.g., WiFi, to communicate with another component or system. Wireless communication system 112 can communicate directly with a device (e.g., a passenger's mobile device, a display device, a speaker within vehicle 101), for example, using an infrared link or Bluetooth. User interface system 113 may be part of peripheral devices implemented within vehicle 101, including, for example, a keyboard, a touch screen display device, a microphone, and a speaker.
Some or all of the functions of autonomous driving vehicle 101 may be controlled or managed by perception and planning system 110, especially when operating in an autonomous driving mode. Perception and planning system 110 includes the necessary hardware (e.g., processor(s), memory, storage) and software (e.g., operating system, planning and routing programs) to receive information from sensor system 115, control system 111, wireless communication system 112, and/or user interface system 113, process the received information, plan a route or path from a starting point to a destination point, and then drive vehicle 101 based on the planning and control information. Alternatively, perception and planning system 110 may be integrated with vehicle control system 111.
For example, a user as a passenger may specify a starting location and a destination of a trip, for example, via a user interface. Perception and planning system 110 obtains the trip-related data. For example, perception and planning system 110 may obtain location and route information from an MPOI server, which may be part of servers 103-104. The location server provides location services, and the MPOI server provides map services and the POIs of certain locations. Alternatively, such location and MPOI information may be cached locally in a persistent storage device of perception and planning system 110.
While autonomous driving vehicle 101 is moving along the route, perception and planning system 110 may also obtain real-time traffic information from a traffic information system or server (TIS). Note that servers 103-104 may be operated by a third-party entity. Alternatively, the functionalities of servers 103-104 may be integrated with perception and planning system 110. Based on the real-time traffic information, MPOI information, and location information, as well as real-time local environment data detected or sensed by sensor system 115 (e.g., obstacles, objects, nearby vehicles), perception and planning system 110 can plan an optimal route and drive vehicle 101, for example via control system 111, according to the planned route to reach the specified destination safely and efficiently.
Server 103 may be a data analytics system that performs data analytics services for a variety of clients. In one embodiment, data analytics system 103 includes data collector 121, machine learning engine 122, neural network model generator 123, and neural network/CRF models 124. Data collector 121 may collect different training data from a variety of vehicles equipped with LIDAR sensors/cameras communicatively coupled to server 103, whether autonomous driving vehicles or ordinary vehicles driven by human drivers. Examples of training data are depth/image data for image recognition functions such as object segmentation, detection, tracking, and classification. The training data may be compiled into categories and associated with ground truth labels. In another embodiment, data collector 121 may download training data sets from online archives on the World Wide Web.
Based on the training data collected by data collector 121, machine learning engine 122 may generate or train a set of neural network/CRF models 124 for a variety of purposes. For example, machine learning engine 122 may use the training data to perform end-to-end training of a CNN model as part of neural network/CRF models 124, where the training data consist of input/output pairs such as RGB images with low-resolution 3-D point clouds as inputs and 3-D high-resolution point clouds as outputs.
A CNN is a type of feed-forward artificial neural network (ANN) in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation. A deep CNN is a CNN with multiple internal layers. An "internal layer" of a neural network refers to a layer between the input layer and the output layer of the neural network.
An ANN is a computational approach based on a large collection of neural units or neurons, loosely modeling a biological brain with large clusters of neurons connected by axons. Each neuron is connected with many other neurons, and the axons or links can, through learning or training, enhance or inhibit their effect on the activation state of the connected neurons. Each individual neuron may have a function that combines the values of all its inputs. There may be a threshold or limiting function on each connection and on the unit itself, such that a signal must surpass the limit before it propagates to other neurons. These systems are self-learning and trained rather than explicitly programmed.
"Training" a CNN involves iteratively feeding inputs into the input layer of the CNN and comparing the desired output with the actual output at the output layer to compute error terms. These error terms are used to adjust the weights and biases in the hidden layers of the CNN so that, the next time around, the output values will be closer to the "correct" values. The distribution of inputs at each layer can slow down training (i.e., a lower training rate is required for convergence) and requires careful parameter initialization, i.e., setting the initial weights and biases of the internal-layer activations to specific ranges so that training converges. "Convergence" refers to the point at which the error terms reach a minimum.
Once the CNN model is trained, it can be uploaded onto an ADV, such as ADV 101, to generate high-resolution 3-D point clouds in real time. A high-resolution 3-D point cloud can be generated in real time by inferring a depth map from an optical image captured by a camera and a low-resolution 3-D point cloud captured by a low-cost RADAR and/or LIDAR unit. Note that neural network/CRF models 124 are not limited to convolutional neural network and conditional random field (CRF) models, but may include radial basis function network models, recurrent neural network models, Kohonen self-organizing network models, and so on. Neural network/CRF models 124 may include different deep CNN models, such as LeNet, AlexNet, ZFNet, GoogLeNet, VGGNet, or combinations thereof. In addition, normalization layers may be introduced after activation layers to reduce training time and increase the convergence rate. Furthermore, dropout layers may be introduced at random nodes to remove those nodes' contributions to the activation layer and thereby prevent overfitting of the training data.
Fig. 3 is a block diagram illustrating an example of a perception and planning system used with an autonomous driving vehicle according to one embodiment. System 300 may be implemented as part of autonomous driving vehicle 101 of Fig. 1, including, but not limited to, perception and planning system 110, control system 111, and sensor system 115. Referring to Fig. 3, perception and planning system 110 includes, but is not limited to, localization module 301, perception module 302, prediction module 303, decision module 304, planning module 305, control module 306, and high-resolution point cloud module 307.
Some or all of modules 301-307 may be implemented in software, hardware, or a combination thereof. For example, these modules may be installed in persistent storage device 352, loaded into memory 351, and executed by one or more processors (not shown). Note that some or all of these modules may be communicatively coupled to, or integrated with, some or all modules of vehicle control system 111 of Fig. 2. Some of modules 301-307 may be integrated together as an integrated module.
Localization module 301 determines the current location of autonomous driving vehicle 300 (for example, using GPS unit 212). Localization module 301 (also referred to as a map and route module) manages any data related to a trip or route of a user. A user may log in and specify a starting location and a destination of a trip, for example, via a user interface. Localization module 301 communicates with other components of autonomous driving vehicle 300, such as map and route information 311, to obtain the trip-related data. For example, localization module 301 may obtain location and route information from a location server and a map and POI (MPOI) server. A location server provides location services, and an MPOI server provides map services and the POIs of certain locations, which may be cached as part of map and route information 311. While autonomous driving vehicle 300 is moving along the route, localization module 301 may also obtain real-time traffic information from a traffic information system or server.
Based on the sensor data provided by sensor system 115 and the localization information obtained by localization module 301, perception module 302 determines a perception of the surrounding environment. The perception information may represent what an ordinary driver would perceive around the vehicle the driver is driving. The perception can include, for example, a lane configuration in the form of objects (e.g., straight or curved lanes), traffic light signals, a relative position of another vehicle, a pedestrian, a building, a crosswalk, or other traffic-related signs (e.g., stop signs, yield signs), etc.
Perception module 302 may include a computer vision system or the functionalities of a computer vision system to process and analyze images captured by one or more cameras in order to identify objects and/or features in the environment of the autonomous driving vehicle. The objects can include traffic signals, roadway boundaries, other vehicles, pedestrians, and/or obstacles, etc. The computer vision system may use an object recognition algorithm, video tracking, and other computer vision techniques. In some embodiments, the computer vision system can map the environment, track objects, and estimate the speed of objects, etc. Perception module 302 can also detect objects based on other sensor data provided by other sensors such as a radar and/or LIDAR.
For each of the objects, prediction module 303 predicts how the object will behave under the circumstances. The prediction is performed based on the perception data perceiving the driving environment at that point in time, in view of a set of map/route information 311 and traffic rules 312. For example, if the object is a vehicle in an opposing direction and the current driving environment includes an intersection, prediction module 303 predicts whether the vehicle is likely to move straight ahead or to make a turn. If the perception data indicate that the intersection has no traffic light, prediction module 303 may predict that the vehicle may have to come to a complete stop before entering the intersection. If the perception data indicate that the vehicle is currently in a left-turn-only lane or a right-turn-only lane, prediction module 303 may predict that the vehicle will more likely turn left or turn right, respectively.
For each of the objects, decision module 304 makes a decision regarding how to handle the object. For example, for a particular object (e.g., another vehicle in a crossing route), as well as its metadata describing the object (e.g., speed, direction, turning angle), decision module 304 decides how to encounter the object (e.g., overtake, yield, stop, pass). Decision module 304 may make such decisions according to a set of rules, such as traffic or driving rules 312, which may be stored in persistent storage device 352.
Based on a decision for each of the perceived objects, planning module 305 plans a path or route for the autonomous driving vehicle, as well as driving parameters (e.g., distance, speed, and/or turning angle). That is, for a given object, decision module 304 decides what to do with the object, while planning module 305 determines how to do it. For example, for a given object, decision module 304 may decide to pass the object, while planning module 305 may determine whether to pass on the left side or the right side of the object. Planning and control data are generated by planning module 305, including information describing how vehicle 300 would move in a next moving cycle (e.g., the next route/path segment). For example, the planning and control data may instruct vehicle 300 to move 10 meters at a speed of 30 miles per hour (mph), then change to the right lane at a speed of 25 mph.
Based on the planning and control data, control module 306 controls and drives the autonomous driving vehicle by sending proper commands or signals to vehicle control system 111, according to the route or path defined by the planning and control data. The planning and control data include sufficient information to drive the vehicle from a first point to a second point of a route or path using appropriate vehicle settings or driving parameters (e.g., throttle, braking, and steering commands) at different points in time along the path or route.
In one embodiment, the planning phase is performed in a number of planning cycles, also referred to as command cycles, such as in every time interval of 100 milliseconds (ms). For each of the planning cycles or command cycles, one or more control commands are issued based on the planning and control data. That is, for every 100 ms, planning module 305 plans the next route segment or path segment, for example, including a target position and the time required for the ADV to reach the target position. Alternatively, planning module 305 may further specify the speed, direction, and/or steering angle, etc. In one embodiment, planning module 305 plans a route segment or path segment for the next predetermined period of time, such as 5 seconds. For each planning cycle, planning module 305 plans the target position for the current cycle (e.g., the next 5 seconds) based on the target position planned in the previous cycle. Control module 306 then generates one or more control commands (e.g., throttle, brake, steering control commands) based on the planning and control data of the current cycle.
Note that decision module 304 and planning module 305 may be integrated as an integrated module. Decision module 304/planning module 305 may include a navigation system or the functionalities of a navigation system to determine a driving path for the autonomous driving vehicle. For example, the navigation system may determine a series of speeds and directional headings to effect movement of the autonomous driving vehicle along a path that substantially avoids perceived obstacles while generally advancing the autonomous driving vehicle along a roadway-based path leading to an ultimate destination. The destination may be set according to user inputs via user interface system 113. The navigation system may update the driving path dynamically while the autonomous driving vehicle is in operation. The navigation system can incorporate data from a GPS system and one or more maps to determine the driving path for the autonomous driving vehicle.
Decision module 304/planning module 305 may further include a collision avoidance system or the functionalities of a collision avoidance system to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of the autonomous driving vehicle. For example, the collision avoidance system may effect changes in the navigation of the autonomous driving vehicle by operating one or more subsystems in control system 111 to undertake swerving maneuvers, turning maneuvers, braking maneuvers, etc. The collision avoidance system may automatically determine feasible obstacle avoidance maneuvers on the basis of surrounding traffic patterns, road conditions, etc. The collision avoidance system may be configured such that a swerving maneuver is not undertaken when other sensor systems detect vehicles, construction barriers, etc. in the adjacent region into which the autonomous driving vehicle would swerve. The collision avoidance system may automatically select the maneuver that is both available and maximizes the safety of the occupants of the autonomous driving vehicle. The collision avoidance system may select an avoidance maneuver predicted to cause the least amount of acceleration in the passenger cabin of the autonomous driving vehicle.
High-resolution point cloud module 307 generates high-resolution 3-D point clouds based on images captured by cameras and low-resolution 3-D point clouds captured by a radar and/or LIDAR unit. The high-resolution 3-D point clouds can be used by perception module 302 to perceive the driving environment of the ADV. Such images/3-D point clouds can be collected by sensor system 115. Point cloud module 307 can apply one or more CNN models (as part of neural network/CRF models 313) to the camera-captured images and the low-resolution LIDAR data to generate a higher-resolution LIDAR point cloud. Note that point cloud module 307 and perception module 302 may be integrated as an integrated module.
Fig. 4 is a block diagram illustrating an example of a high-resolution point cloud generator used with an autonomous driving vehicle according to one embodiment. High-resolution point cloud module 307 includes up-sampling and/or inpainting module 401, down-sampling module 402, panorama module 403, conditional random field (CRF) module 404, and high-resolution depth map module 405. Up-sampling and/or inpainting module 401 can up-sample an input image, i.e., increase the image size by a factor. The inpainting module can apply an inpainting algorithm to restore or reconstruct lost or deteriorated parts of an image, such as dark spots introduced into a depth map by dark objects. Down-sampling module 402 can down-sample an image, i.e., decrease the image size by a factor. Panorama module 403 can convert images with relatively narrow view angles into a panoramic image with a wider view angle (e.g., a 360-degree view), or vice versa. For example, panorama module 403 can first map the overlapping fields of view of the perspective images onto cylindrical or spherical coordinates. The mapped images are then blended and/or stitched together. Here, the stitched image exhibits a wider horizontal field of view with a limited vertical field of view for cylindrical coordinates, or up to a 180-degree vertical field of view for spherical coordinates. A panorama in this projection is meant to be viewed as though the image were wrapped into a cylinder/sphere and viewed from within; when viewed on a 2-D plane, horizontal lines appear curved while vertical lines remain vertical. CRF module 404 can apply a CRF (e.g., optimization) model to the output of the CNN model and the low-resolution depth map to further refine the depth map estimate. Finally, high-resolution depth map module 405 applies a CNN model to RGB image/LIDAR depth image inputs to generate a high-resolution LIDAR depth image.
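A minimal sketch of the scaling helpers in modules 401 and 402, assuming OpenCV is available; the interpolation choices are illustrative and the inpainting step is omitted.

import cv2

def downsample(image, factor=2):
    """Shrink height and width by `factor` (e.g., factor 2 yields 1/4 of the pixels)."""
    return cv2.resize(image, None, fx=1.0 / factor, fy=1.0 / factor,
                      interpolation=cv2.INTER_AREA)

def upsample(depth_map, factor=2):
    """Enlarge a sparse depth map by `factor`; nearest-neighbor avoids smearing valid depths."""
    return cv2.resize(depth_map, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_NEAREST)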
Some or all of modules 401-405 may be implemented in software, hardware, or a combination thereof. For example, these modules may be installed in persistent storage device 352, loaded into memory 351, and executed by one or more processors (not shown). Note that some or all of these modules may be communicatively coupled to, or integrated with, some or all modules of vehicle control system 111 of Fig. 2. Some of modules 401-405 may be integrated together as an integrated module. For example, up-sampling module 401 and down-sampling module 402 may be integrated with high-resolution depth map module 405.
Fig. 5A is a diagram illustrating an example ADV according to one embodiment. Referring to Fig. 5A, ADV 101 includes a roof-mounted LIDAR/panoramic camera configuration 501. In another embodiment, LIDAR/panoramic camera configuration 501 may be mounted on the hood or trunk of ADV 101, or anywhere on the ADV suitable for placing such a sensor unit.
Figs. 5B and 5C show top and side views of LIDAR/panoramic camera configurations according to some embodiments. Referring to Fig. 5B, in one embodiment, configuration 501 includes a low-definition or low-resolution LIDAR unit 502 and a stereo panoramic camera 504 (e.g., multiple cameras). In one embodiment, LIDAR unit 502 may be placed on top of camera unit 504. The units may be aligned to a similar reference point, such as a central vertical reference line (not shown), so that the LIDAR and the panoramic camera rotate about the reference line. Referring to Fig. 5C, in one embodiment, configuration 501 includes a low-resolution LIDAR unit 502 with a monochrome panoramic camera 506. Similarly, LIDAR unit 502 may be placed on top of camera unit 506, and the units may be aligned to a similar reference point, such as a central vertical reference line (not shown), so that the LIDAR and the panoramic camera rotate about the reference line. Note that a low-resolution or low-definition LIDAR unit refers to a LIDAR unit that captures a sparser 3-D point cloud, or a point cloud with fewer points, than a high-resolution LIDAR unit. A sparse 3-D point cloud contains less depth data or information than a dense 3-D point cloud. As an illustrative comparison, a LIDAR unit with 16 channels or fewer capturing a wide-angle view at 300,000 points per second can be a low-resolution unit compared with a LIDAR unit having a higher channel count (e.g., 64 channels) that captures a wide-angle view at a much higher point rate.
Fig. 5D shows top and side views of a monochrome panoramic camera configuration according to one embodiment. In one embodiment, monochrome panoramic camera configuration 506 includes six cameras placed in a hexagonal shape. The center of the hexagon can be a central reference point for determining camera focal lengths, fields of view, and view angles. Each camera and its adjacent cameras can be positioned at intervals of about 60 degrees in horizontal view angle, so as to cover a wider horizontal view angle (e.g., a full 360-degree view). In one embodiment, each of the six cameras can capture an image with a horizontal view angle of about 120 degrees, so that images captured by the adjacent cameras to the left and right overlap by about 30 degrees. The overlap can be used to blend and/or stitch the captured images together to generate a panoramic image.
After the cylindrical or spherical panoramic image (e.g., the panoramic RGB image) is generated, the 3-D point cloud can be projected onto the (2-D) cylindrical or spherical image plane so as to align with the cylindrical or spherical panoramic RGB image. For example, the 3-D points can be projected onto the 2-D cylindrical (warped) image plane as follows. Let (u, v) be the position of a pixel on the original image plane; then the pixel location on the 2-D cylinder will be (r, h), where r and h are given in terms of u, v, and the camera focal length f (see the relations sketched below). The same 3-D points can be projected onto the 2-D spherical image plane in a similar way: letting (u, v) be the position of a pixel on the original image plane, the pixel location on the 2-D sphere will be (r, h), again in terms of u, v, and the camera focal length f. In order to reconstruct a point cloud from a depth map, a reverse transformation can be performed by back-projecting the 2-D panoramic depth map into 3-D space. Triangulation can be performed based on the pixels of the panoramic surface. In one embodiment, the triangulation can be performed directly from the positions of those pixels on the camera image plane. In some embodiments, more cameras (such as three to eight cameras) can be used for panoramic camera configuration 506; the cameras may be arranged in a triangular, rectangular, pentagonal, or octagonal shape, respectively.
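For reference, the standard cylindrical and spherical warp relations consistent with the description above are sketched here in LaTeX notation; this is an assumed reconstruction, and the exact equations used in the patent may differ.

Cylindrical: r = f \arctan\left(\frac{u}{f}\right), \quad h = \frac{f\,v}{\sqrt{u^{2} + f^{2}}}

Spherical: r = f \arctan\left(\frac{u}{f}\right), \quad h = f \arctan\left(\frac{v}{\sqrt{u^{2} + f^{2}}}\right)

where (u, v) is the pixel position on the perspective image plane and f is the camera focal length.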
Figs. 5E and 5F show examples of stereo panoramic camera configurations according to some embodiments. Referring to Fig. 5E, in one embodiment, stereo panoramic camera configuration 514 (e.g., camera configuration 504 of Fig. 5B) includes twelve cameras placed in a hexagonal shape. The center of the hexagonal shape can be a central reference point for determining camera view angles, and can serve as the baseline for the stereo camera pairs when constructing a stereo panoramic image. Each stereo camera pair (a left camera and a right camera) and its adjacent stereo pairs can be spaced about 60 degrees apart.
Referring to Fig. 5F, in one embodiment, stereo panoramic camera configuration 524 (e.g., camera configuration 504 of Fig. 5B) includes two monochrome panoramic camera configurations, each with six cameras placed in a hexagonal shape. Here, the stereo pairing is not left-and-right but vertical, top-and-bottom. The captured stereo panoramic images can be projected onto a cylinder or sphere as shown above. Images captured by the stereo camera pairs are then used as inputs (together with low-resolution LIDAR images) to a high-resolution depth map module, such as high-resolution depth map module 405 of Fig. 4, to generate a high-resolution depth map or LIDAR image.
Figs. 6A and 6B show flowcharts of an inference mode and a training mode, respectively, according to one embodiment. Figs. 6C and 6D show flowcharts of an inference mode and a training mode, respectively, according to one embodiment. Figs. 6A and 6B relate to constructing a monochrome or stereo panoramic image from camera images (through image blending and/or stitching) and then fusing the panoramic image with a LIDAR image to generate a high-resolution panoramic depth/disparity map. Figs. 6C and 6D relate to fusing camera images with LIDAR images to generate high-resolution depth/disparity maps, which are then blended and/or stitched together to generate a panoramic depth map.
Referring to Fig. 6A, an inference mode according to one embodiment is described. Process 600 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 600 may be performed by a point cloud module of an autonomous driving vehicle, such as point cloud module 307 of Fig. 3. Referring to Fig. 6A, at block 601, processing logic calibrates or configures the camera system (e.g., determines a reference center for the panorama configuration, determines and/or adjusts camera focal lengths). At block 603, processing logic generates a wide-angle (e.g., 360-degree) stereo or monochrome, cylindrical or spherical panoramic image. At block 605, processing logic projects the LIDAR 3-D point cloud onto the panoramic image to generate a depth grid or depth map. At block 607, based on the depth grid and the monochrome/stereo panoramic image, processing logic performs inference using encoder-decoder network 611 (e.g., a trained CNN/CNN+CRF model) to generate a panoramic depth map. At block 609, processing logic back-projects the panoramic depth map into 3-D space to generate a high-resolution point cloud.
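A hedged sketch of the back-projection in block 609 (and block 655 below), assuming a cylindrical panorama whose pixels store the radial distance to the cylinder axis and whose columns map linearly to azimuth; the patent's exact parameterization may differ.

import numpy as np

def panorama_depth_to_cloud(depth, f):
    """depth: H x W cylindrical panorama depth map; f: camera focal length in pixels."""
    rows, cols = np.nonzero(depth)                 # pixels carrying a valid depth value
    d = depth[rows, cols]                          # assumed radial distance sqrt(X^2 + Z^2)
    theta = cols / f                               # invert r = f * theta (azimuth angle)
    x = d * np.sin(theta)
    z = d * np.cos(theta)
    y = d * (rows - depth.shape[0] / 2.0) / f      # invert h = f * Y / sqrt(X^2 + Z^2)
    return np.stack([x, y, z], axis=1)             # N x 3 point cloud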
Referring to Fig. 6B, a training mode according to one embodiment is described by process 620. Process 620 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 620 may be performed by a machine learning engine, such as machine learning engine 122 of server 103 of Fig. 1. For the training mode, training data such as high-resolution LIDAR point clouds and monochrome/stereo RGB images are collected. At block 621, depending on the image source, processing logic calibrates, or at least determines, the camera focal lengths for the camera images. At block 623, processing logic generates panoramic images based on the monochrome/stereo images. At block 625, processing logic projects the LIDAR point cloud onto the image plane and/or up-samples the LIDAR image to the scale of the RGB image. For a monochrome panorama, encoder-decoder network 627 learns to infer a high-resolution depth panorama from the low-resolution depth panorama. For a stereo panorama, encoder-decoder network 627 learns to refine the match between the stereo panorama and the low-resolution depth panorama, where the low-resolution depth panorama is obtained by projecting the low-resolution LIDAR 3-D point cloud.
Referring to Fig. 6C, an inference mode according to one embodiment is described. Process 640 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 640 may be performed by a point cloud module of an autonomous driving vehicle, such as point cloud module 307 of Fig. 3. Referring to Fig. 6C, at block 641, processing logic calibrates or configures the camera system (e.g., determines a reference center for the panorama configuration, determines and/or adjusts camera focal lengths). At block 643, processing logic pre-processes the camera views, for example, warping the camera views into stereo views or non-panoramic cylindrical/spherical views. At block 645, processing logic projects the low-resolution LIDAR 3-D point cloud onto the camera images to generate low-resolution depth grids or depth maps. At block 647, based on the depth grids and the monochrome/stereo images, processing logic performs inference using encoder 649 (e.g., a trained CNN/CNN+CRF model) to generate high-resolution depth maps. At block 653, processing logic generates a wide-angle panoramic depth map based on calibration information 651 (e.g., the calibration information from block 641). At block 655, processing logic back-projects the panoramic depth map into 3-D space to generate a high-resolution point cloud.
Referring to Fig. 6D, a training mode according to one embodiment is described by process 660. Process 660 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 660 may be performed by a machine learning engine, such as machine learning engine 122 of server 103 of Fig. 1. For the training mode, a set of well-known training data, such as high-resolution LIDAR point clouds and monochrome/stereo RGB images, is collected. At block 661, depending on the image source, processing logic calibrates, or at least determines, the camera focal lengths for the camera images. At block 663, processing logic prepares the camera images for training based on the monochrome/stereo images. At block 665, processing logic projects the LIDAR point cloud onto the image plane of the RGB image and/or up-samples the LIDAR image to the scale of the RGB image. For monochrome camera images, encoder-decoder network 667 learns to infer a high-resolution depth map from the low-resolution depth map. For stereo camera images, encoder-decoder network 667 learns to refine the match between the stereo images and the low-resolution depth map, where the low-resolution depth map is obtained by projecting the low-resolution LIDAR 3-D point cloud.
The output of encoder-decoder network 627 (e.g., the CNN model) is compared with an expected result to determine whether the difference between the output of encoder-decoder network 627 and the expected result is below a predetermined threshold. If the difference exceeds the predetermined threshold, the above process can be performed iteratively, modifying certain parameters or coefficients of the model. The process can be repeated until the difference falls below the predetermined threshold, at which point the final model is considered complete. The model is then used in an ADV in real time to generate high-resolution point clouds based on low-resolution point clouds and images captured by one or more cameras.
Figs. 7A and 7B are block diagrams illustrating examples of depth map generation according to some embodiments. Referring to Fig. 7A, in one embodiment, depth map generator 700 may include down-sampling module 402 and CNN model 701. CNN model 701 (as part of neural network/CRF models 313) may include contracting layers (or encoder/convolutional layers) 713 and expanding layers (or decoder/deconvolutional layers) 715. Fig. 7B shows depth map generator 720 of another exemplary embodiment. Depth map generators 700 and 720 may be implemented by depth map module 405 of Fig. 4.
Referring to Figs. 4 and 7B, generator 720 receives a first image captured by a first camera (e.g., camera-captured image 703), the first image capturing a part of the driving environment of the ADV. The first image may be an RGB image captured by the camera system. Generator 720 receives a second image, such as low-resolution LIDAR image 707, which represents a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the part of the driving environment. Down-sampling module 402 down-samples the second image (e.g., image 707) by a predetermined scale factor until the resolution of the second image reaches a predetermined threshold. In one embodiment, the second image is down-sampled until it is dense, that is, until the amount of overlap or "gap" between any two adjacent cloud points falls below a predetermined threshold. Generator 720 generates a second depth map (e.g., high-resolution depth map 709) by applying CNN model 701 to the first image (e.g., image 703) and the down-sampled second image, the second depth map (e.g., image 709) having a higher resolution than the first depth map (e.g., image 707), so that the second depth map (e.g., image 709) represents a second point cloud perceiving the driving environment around the ADV. Note that the term "image" generally refers to an RGB image or a LIDAR image. The terms "depth map" and "LIDAR image" refer to a 2-D image of a 3-D point cloud mapped onto a perspective image plane or a panoramic image plane. A "camera-captured image" refers to an optical image captured by a pinhole camera device.
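A small sketch of the "down-sample until dense" step, using the fraction of empty pixels as an illustrative stand-in for the patent's gap/overlap criterion and a block maximum to keep one valid return per block (a real implementation might prefer the nearest non-zero return).

import numpy as np

def downsample_until_dense(sparse_depth, ratio=2, gap_thresh=0.1, min_size=32):
    """Repeatedly shrink a sparse LIDAR depth map until most pixels carry a depth value."""
    depth = sparse_depth.astype(np.float32)
    while min(depth.shape) > min_size and np.mean(depth == 0) > gap_thresh:
        h, w = depth.shape
        h2, w2 = (h // ratio) * ratio, (w // ratio) * ratio
        blocks = depth[:h2, :w2].reshape(h2 // ratio, ratio, w2 // ratio, ratio)
        depth = blocks.max(axis=(1, 3))   # keep any valid (non-zero) return per block
    return depth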
In one embodiment, the image 703 of cameras capture and LIDAR image 707 are to distort or project to cylinderNon-panoramic image in shape or the spherical plane of delineation.In another embodiment, the image 703 of cameras capture and LIDAR figureAs 707 be panoramic picture, such as cylindrical or spherical panorama image.In another embodiment, the image 703 of cameras captureIt is fluoroscopy images with LIDAR image 707.Herein, for the camera configuration, fluoroscopy images can be from from monochrome/stereoscopic full viewsThe single camera collection or any single camera of video camera configuration generate.Monochromatic panoramic camera is configured, which canMultiple video cameras including about capturing multiple images at the same time, the configuration 506 of such as Fig. 5 C.Image will pass through panorama module(the panorama module 403 of such as Fig. 4) distorts and mixes and/or be stitched together, to generate cylindrical or spherical panorama image.
For the LIDAR configuration, LIDAR image 707 is generated by mapping the 3-D point cloud captured by the LIDAR detector from 3-D space onto a 2-D image plane. Here, the 2-D image plane of image 707 may be the same image plane as that of image 703. In another embodiment, LIDAR image 707 may be a perspective LIDAR image corresponding to camera-captured perspective image 703. Here, CNN model 701 may be applied successively to several perspective pairs of image 703 and image 707 to generate perspective LIDAR images. The generated perspective LIDAR images may then be stitched or blended by a panorama module (such as panorama module 403 of Fig. 4) to generate a panoramic LIDAR image. In another embodiment, generator 720 may include multiple CNN models, and these models may be applied simultaneously to multiple perspective pairs of image 703 and image 707 to generate multiple perspective LIDAR images for panoramic image generation.
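The mapping of a 3-D point cloud onto a 2-D image plane could be sketched as follows for a spherical (panoramic) plane; the image size, the angular ranges, and the closest-return rule are assumptions made for illustration only.

```python
import numpy as np

def project_points_to_spherical_depth(points_xyz, height=96, width=192):
    """Map a 3-D LIDAR point cloud onto a 2-D (spherical) image plane, producing
    a low-resolution depth map (sketch; sizes and ranges are assumptions)."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    depth = np.sqrt(x * x + y * y + z * z)
    azimuth = np.arctan2(y, x)                            # horizontal angle in [-pi, pi)
    elevation = np.arcsin(z / np.maximum(depth, 1e-6))    # vertical angle
    col = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    row = ((elevation - elevation.min()) /
           max(np.ptp(elevation), 1e-6) * (height - 1)).astype(int)
    depth_map = np.zeros((height, width), dtype=np.float32)
    # write farthest points first so the closest return wins when pixels collide
    order = np.argsort(-depth)
    depth_map[row[order], col[order]] = depth[order]
    return depth_map
```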
Referring to Fig. 4 and Fig. 7A, in another embodiment, generator 700 receives a third image, such as camera-captured image 705, which is captured by a second camera. The high-resolution depth map module of generator 700 (such as high-resolution depth map module 405 of Fig. 4) generates the second depth map by applying the CNN model to the first image, the third image, and the downsampled second image. Here, image 703 and image 705 may be left-right stereo images (for example, images captured by configuration 514 of Fig. 5E) or vertical top-bottom stereo images (for example, images captured by configuration 524 of Fig. 5F). Although only two camera-captured images are shown, more images captured by more cameras may also be used as inputs to the CNN model.
Fig. 8 is a diagram showing the contraction (for example, encoder/convolution) layers and expansion (for example, decoder/deconvolution) layers of a CNN model according to one embodiment. CNN model 800 receives camera image 801 and low-resolution depth image 803, and outputs high-resolution depth map 825. For purposes of illustration, a single RGB image 801 is used here. However, multiple images captured by multiple cameras, for example in a stereo configuration, may also be applied. It should be noted that in this application, an RGB image refers to a color image. Referring to Fig. 8, camera image 801 and low-resolution depth image 803 may represent image 703 and image 707 of Fig. 7B, respectively. High-resolution image 825 may represent image 709 of Fig. 7B. CNN model 800 may include different layers, such as downsampling layer 805, convolutional layers (807, 809), deconvolutional layers (811, 817), prediction layers (813, 819, 823), and concatenation layers (815, 821).
The convolutional layers (as a part of contraction layers 713 of Fig. 7) and the deconvolutional layers (as a part of expansion layers 715 of Fig. 7) may be connected in a single pipeline. Each of the convolutional or contraction layers may downsample its previous input layer, and each of the expansion or deconvolutional layers may upsample its previous input layer. The last contraction layer 713 (for example, layer 809) is connected to the first expansion layer 715 (for example, layer 811) to form the single pipeline. The prediction layers (813, 819, 823) perform single-channel depth map predictions and feed the predictions forward to the next layer.
The prediction layers help minimize the estimation error of the final CNN output by reducing the error propagated during training. A prediction layer may be implemented as a convolutional layer with the following characteristics: the output image has one output channel and the same image size as the input image. A prediction layer may, however, include an upsampling function to upsample the output image size so as to match the image size of the next layer. The concatenation layers (808, 815, 821) perform a combine function, which combines two or more images, such as the output images of deconvolutional layers, convolutional layers, and/or prediction layers. The convolutional/deconvolutional layers allow the CNN to perform image classification by finding low-level features (such as edges and curves) and building up higher-level features from them. Downsampling refers to dividing the height and/or width of an image by a factor, such as a factor of 2 (that is, the image size is reduced by a factor of four). Upsampling refers to multiplying the height and/or width of an image by a factor, such as a factor of 2 (that is, the image size is increased by a factor of four).
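A prediction layer of this kind might be sketched as a one-channel convolution followed by 2x upsampling, as below; this is a PyTorch illustration only, and the kernel size and interpolation mode are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictionLayer(nn.Module):
    """Single-channel depth prediction fed forward to the next layer: a convolution
    with one output channel, then x2 upsampling to match the next layer's size."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, x):
        pred = self.conv(x)                       # one-channel depth map prediction
        return F.interpolate(pred, scale_factor=2, mode="bilinear",
                             align_corners=False)  # upsample to the next layer's size
```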
Referring to Fig. 8, for purposes of illustration, in one embodiment, image 801 may include a monocular RGB camera image (for example, a combined 3-channel, 192 pixel x 96 pixel image) or multiple RGB images in a stereo configuration. Low-resolution depth image 803 may include a single-channel (that is, grayscale) 48 pixel x 24 pixel LIDAR image (that is, image 803 is one quarter the scale of image 801). Convolutional layer 807 may receive image 801 and downsample it by a factor of 2, outputting a 64-channel, 96 pixel x 48 pixel image. Subsequent convolutional layers may downsample the image from their corresponding inputs by a factor, such as a factor of 2.
The input LIDAR image 803 may be downsampled by downsampling layer 805 until it is dense. For example, an image is dense if there are no or only few gaps among the pixels in the output, for example, a 512-channel, 24 pixel x 12 pixel image. Concatenation layer 808 may combine the corresponding output of a convolutional layer (for example, a 512-channel, 24 pixel x 12 pixel image) with the output of downsampling layer 805 (for example, a 512-channel, 24 pixel x 12 pixel image) to generate a combined image with a higher resolution (for example, a 1024-channel, 24 pixel x 12 pixel image). It should be noted that in order to combine the downsampled camera image and the downsampled depth image or depth map, the sizes or dimensions of the two images must match. Depending on the size or dimensions of the downsampled depth image layer, the two images are combined at the corresponding convolutional layer whose size matches that of the depth image. Convolutional layer 809 may, for example, have a 1024-channel, 24 pixel x 12 pixel image as input and a 2048-channel, 12 pixel x 6 pixel image as output.
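The size-matching requirement for concatenation can be illustrated with channel-wise tensor concatenation; the channel counts below follow the example dimensions in the text, while the tensors themselves are placeholders.

```python
import torch

# Illustrative tensors shaped (batch, channels, height, width).
conv_out   = torch.randn(1, 1024, 12, 24)   # feed-forward from a convolutional layer
deconv_out = torch.randn(1, 1024, 12, 24)   # output of a deconvolutional layer
pred_out   = torch.randn(1,    1, 12, 24)   # single-channel prediction

# Concatenation is only defined when heights and widths match; channels add up.
combined = torch.cat([conv_out, deconv_out, pred_out], dim=1)
print(combined.shape)   # torch.Size([1, 2049, 12, 24])
```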
Deconvolutional layer 811 may have a 2048-channel, 12 pixel x 6 pixel image as input and a 1024-channel, 24 pixel x 12 pixel image as output. Prediction layer 813 may upsample its input by a factor of 2, and may have a 2048-channel, 12 pixel x 6 pixel image as input and a 1-channel, 24 pixel x 12 pixel image as output. Concatenation layer 815 may have three inputs with matching image sizes, such as the input of convolutional layer 809 (for example, a 1024-channel, 24 pixel x 12 pixel image), the output from prediction layer 813 (for example, a 1-channel, 24 pixel x 12 pixel image), and the output from deconvolutional layer 811 (for example, a 1024-channel, 24 pixel x 12 pixel image). Thus, concatenation layer 815 may output a 2049-channel, 24 pixel x 12 pixel image.
Deconvolutional layer 817 may have a 1024-channel, 48 pixel x 24 pixel image as input and a 512-channel, 96 pixel x 48 pixel image as output. Prediction layer 819 may upsample its previous input by a factor of 2, and may have a 1024-channel, 48 pixel x 24 pixel image as input and a 1-channel, 96 pixel x 48 pixel image as output. Concatenation layer 821 may have three inputs: the feed-forward from a convolutional layer (for example, a 64-channel, 96 pixel x 48 pixel image), the output from prediction layer 819 (for example, a 1-channel, 96 pixel x 48 pixel image), and the output from deconvolutional layer 817 (for example, a 512-channel, 96 pixel x 48 pixel image). Concatenation layer 821 then combines these inputs and outputs a 577-channel, 96 pixel x 48 pixel image. Prediction layer 823 may upsample its input by a factor of 2, and may have a 577-channel, 96 pixel x 48 pixel image as input and output a 1-channel, 96 pixel x 48 pixel depth image as output 825. It should be noted that in some embodiments, convolutional layers may be configured to feed forward at arbitrary layers. In some embodiments, pooling layers are interposed between the convolutional layers and unpooling layers are interposed between the deconvolutional layers. It should be noted that Fig. 8 shows one embodiment of a CNN model and should not be construed as limiting. For example, in some embodiments, the CNN model may include different activation functions (for example, ReLU, sigmoid, step, tanh, etc.), dropout layers, normalization layers, and so on.
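A condensed sketch of this contraction/expansion pipeline with skip connections and side predictions is shown below. It is a simplified stand-in for the Fig. 8 model: the channel counts, kernel sizes, and activation choices are assumptions, and the LIDAR input is assumed to be already densified to one quarter of the camera resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):      # contraction: halves height and width
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())

def deconv_block(c_in, c_out):    # expansion: doubles height and width
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU())

class EncoderDecoderDepthCNN(nn.Module):
    """Sketch of a Fig. 8 style model: RGB image plus a densified low-resolution
    LIDAR depth map in, single-channel high-resolution depth map out."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 64)            # RGB input -> 1/2 resolution
        self.enc2 = conv_block(64, 256)          # -> 1/4 resolution
        self.enc3 = conv_block(256 + 1, 512)     # fuse the dense LIDAR map, -> 1/8
        self.dec3 = deconv_block(512, 256)
        self.dec2 = deconv_block(256 + 256 + 1, 64)   # skip + side prediction feed in
        self.dec1 = deconv_block(64 + 64 + 1, 32)
        self.pred3 = nn.Conv2d(512, 1, 3, padding=1)          # side prediction
        self.pred2 = nn.Conv2d(256 + 256 + 1, 1, 3, padding=1)
        self.out = nn.Conv2d(32, 1, 3, padding=1)              # final depth map

    def forward(self, rgb, lidar_dense):
        # rgb: (N, 3, H, W); lidar_dense: (N, 1, H/4, W/4), H and W divisible by 8
        e1 = self.enc1(rgb)
        e2 = self.enc2(e1)
        e3 = self.enc3(torch.cat([e2, lidar_dense], dim=1))
        p3 = F.interpolate(self.pred3(e3), scale_factor=2, mode="bilinear", align_corners=False)
        d3 = self.dec3(e3)
        cat2 = torch.cat([d3, e2, p3], dim=1)   # skip connection + prediction feed-forward
        p2 = F.interpolate(self.pred2(cat2), scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(cat2)
        cat1 = torch.cat([d2, e1, p2], dim=1)
        return self.out(self.dec1(cat1))        # high-resolution depth map
```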
Fig. 9A and Fig. 9B are block diagrams showing examples of high-resolution depth map generation according to some embodiments. The panorama converter 903 and map generator 905 of Fig. 9A may correspond to the encoder-decoder network 611 and panorama generation 603 of Fig. 6A. The panorama converter 903 and map generator 905 of Fig. 9B may jointly correspond to the encoder-decoder network 649 and panorama generation 653 of Fig. 6C. High-resolution depth map generator 905 may be performed by high-resolution depth map module 405, and panorama converter 903 may be performed by panorama module 403 of Fig. 4. Referring to Fig. 9A, the input of high-resolution depth map generator 905 is coupled to the output of panorama converter 903. Here, inputs 901, such as camera-captured images 703 and 705 of Fig. 7A and low-resolution LIDAR image 707 of Fig. 7A, may be converted into panoramic images by panorama converter 903. Generator 905 receives the panoramic images and generates an output, such as a high-resolution depth map, for example LIDAR image 709 of Fig. 7A. In this configuration, the input images are combined by blending to generate panoramic images before being fed into the CNN model, which then generates the high-resolution depth map.
Referring to Fig. 9B, in one embodiment, the output of high-resolution depth map generator 905 is coupled to the input of panorama converter 903. Here, inputs 901, such as camera-captured images 703 and 705 of Fig. 7A and low-resolution LIDAR image 707 of Fig. 7A, are applied by generator 905 to the CNN model (as a part of high-resolution depth map generator 905). The output depth maps are received by panorama converter 903. Converter 903 converts the output of generator 905 into a panoramic depth map, for example, output 907. In this example, the original camera-captured images are fed into the CNN model to generate respective individual high-resolution depth maps. Each depth map is then combined by blending into a high-resolution panoramic depth map.
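The two orderings of Fig. 9A and Fig. 9B can be summarized with the following sketch; `panorama_converter` and `cnn_model` are hypothetical callables standing in for the panorama module and the CNN-based depth map generator.

```python
def panoramic_depth_blend_first(camera_images, low_res_lidar, panorama_converter, cnn_model):
    """Fig. 9A ordering (sketch): blend the inputs into panoramas first,
    then run the CNN once on the panoramic inputs."""
    pano_rgb, pano_lidar = panorama_converter(camera_images, low_res_lidar)
    return cnn_model(pano_rgb, pano_lidar)          # high-resolution panoramic depth map

def panoramic_depth_cnn_first(camera_images, low_res_lidars, panorama_converter, cnn_model):
    """Fig. 9B ordering (sketch): run the CNN per camera view, then blend the
    individual high-resolution depth maps into one panoramic depth map."""
    per_view = [cnn_model(img, lidar) for img, lidar in zip(camera_images, low_res_lidars)]
    return panorama_converter(per_view)
```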
Figure 10 is a flowchart showing a method according to one embodiment. Process 1000 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 1000 may be performed by a point cloud module of an autonomous driving vehicle, such as point cloud module 307 of Fig. 3. Referring to Fig. 10, at block 1001, the processing logic receives a first image captured by a first camera, the first image capturing a portion of the driving environment of the ADV. At block 1002, the processing logic receives a second image, the second image representing a first depth map of a first point cloud generated by a LIDAR device and corresponding to the portion of the driving environment. At block 1003, the processing logic downsamples the second image by a predetermined scale factor until the resolution of the second image reaches a predetermined threshold. At block 1004, the processing logic generates a second depth map by applying a convolutional neural network (CNN) model to the first image and the downsampled second image, the second depth map having a higher resolution than the first depth map, such that the second depth map represents a second point cloud perceiving the driving environment surrounding the ADV.
In one embodiment, the processing logic receives a third image captured by a second camera and generates the second depth map by applying the CNN model to the first image, the third image, and the downsampled second image. In one embodiment, the first image includes a cylindrical panoramic image or a spherical panoramic image. In another embodiment, the cylindrical panoramic image or spherical panoramic image is generated based on several images captured by several camera devices. In another embodiment, the processing logic reconstructs the second point cloud by projecting the second depth map into the 3-D space on which the cylindrical panoramic image or spherical panoramic image is based.
In one embodiment, the processing logic maps the downsampled second image onto the image plane of the first image. In one embodiment, the second depth map is generated by blending one or more generated depth maps, such that the second depth map is a panoramic image.
In one embodiment, the CNN model includes contraction layers and expansion layers, where each contraction layer includes an encoder to downsample a corresponding input, and the expansion layers are coupled to the contraction layers, each expansion layer including a decoder to upsample a corresponding input. In one embodiment, information of the contraction layers is fed forward to the expansion layers; for example, the output of a contraction layer is fed forward to the input of the expansion layer having a matching image size or dimension. In one embodiment, each of the expansion layers includes a prediction layer to predict a depth map for a subsequent layer.
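Blocks 1001 to 1004 could be chained as in the following data-flow sketch; the three callables are hypothetical stand-ins (a point-cloud-to-depth-map projection, a downsample-until-dense step, and the trained CNN model), not the claimed implementation.

```python
def process_1000_sketch(camera_image, lidar_point_cloud,
                        project_to_depth_map, densify, cnn_model):
    """Illustrative data flow for the method of Fig. 10."""
    low_res_depth = project_to_depth_map(lidar_point_cloud)   # block 1002: first depth map
    dense_depth = densify(low_res_depth)                      # block 1003: downsample until dense
    high_res_depth = cnn_model(camera_image, dense_depth)     # block 1004: CNN -> second depth map
    return high_res_depth                                     # represents the second point cloud
```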
Fig. 11A and Fig. 11B are block diagrams showing examples of depth map generation according to some embodiments. Referring to Fig. 11A, in one embodiment, depth map generator 1100 may include an upsampling/inpainting module 401 and a CNN model 701. CNN model 701 (as a part of neural network/CRF models 313) may include contraction layers (or encoder/convolutional layers) 713 and expansion layers (or decoder/deconvolutional layers) 715. Fig. 11B shows a depth map generator 1120 of another exemplary embodiment. Depth map generators 1100 and 1120 may be performed by the depth map module 405 of Fig. 4.
Referring to Fig. 4 and Fig. 11B, generator 1120 receives a first image captured by a first camera (for example, camera-captured image 703), the first image capturing a portion of the driving environment of the ADV. Generator 1120 receives a second image, such as low-resolution LIDAR image 707, the second image representing a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the portion of the driving environment. Upsampling/inpainting module 401 upsamples the second image (for example, image 707) by a predetermined scale factor to match the image scale of image 703. In one embodiment, an inpainting algorithm is applied to the upsampled second image to restore any missing parts of the image, for example, to repair background portions of the upsampled image. Inpainting is the process of restoring or reconstructing lost or deteriorated parts of an image. In another embodiment, the inpainting algorithm may include comparing the captured LIDAR image with a LIDAR image captured in a previous time frame. Generator 1120 generates a second depth map (for example, high-resolution depth map 709) by applying CNN model 701 to the first image (for example, image 703) and the upsampled and/or inpainted second image, where the second depth map (for example, image 709) has a higher resolution than the first depth map (for example, image 707), such that the second depth map (for example, image 709) represents a second point cloud perceiving the driving environment surrounding the ADV.
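A simple stand-in for the upsample-and-inpaint step is shown below: nearest-neighbor upsampling followed by filling each missing pixel with the value of the nearest valid pixel. The zero-means-missing convention and the SciPy-based fill are assumptions; the disclosure itself leaves the choice of inpainting algorithm open.

```python
import numpy as np
from scipy import ndimage

def upsample_and_inpaint(lidar_depth, scale=4):
    """Upsample a sparse LIDAR depth map to the camera image scale and fill
    missing pixels with the nearest valid value (illustrative sketch)."""
    # replicate each LIDAR pixel into a scale x scale block (nearest-neighbor upsampling)
    upsampled = np.kron(lidar_depth, np.ones((scale, scale), dtype=lidar_depth.dtype))
    missing = upsampled == 0.0
    # indices of the nearest valid (non-missing) pixel for every location
    nearest = ndimage.distance_transform_edt(missing, return_distances=False,
                                             return_indices=True)
    return upsampled[tuple(nearest)]
```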
In one embodiment, camera-captured image 703 and LIDAR image 707 are panoramic images, such as cylindrical or spherical panoramic images. In another embodiment, camera-captured image 703 and LIDAR image 707 are perspective images. Here, depending on the camera configuration, a perspective image may be generated from a single camera of a monocular/stereo panoramic camera configuration, or from a single camera. For a monocular panoramic camera configuration, the configuration may include multiple perspective cameras capturing multiple images at approximately the same time, such as configuration 506 of Fig. 5C. The images are blended or stitched together by a panorama module, such as panorama module 403 of Fig. 4, to generate a panoramic image.
For the LIDAR configuration, LIDAR image 707 is generated by the following steps: mapping the 3-D point cloud captured by the LIDAR detector from 3-D space, followed by converting the 3-D point cloud onto a 2-D image plane. Here, the 2-D image plane of image 707 may be the same image plane as that of image 703. In another embodiment, LIDAR image 707 may be a perspective LIDAR image corresponding to camera-captured perspective image 703. Here, CNN model 701 may be applied successively to several perspective pairs of image 703 and image 707 to generate perspective LIDAR images. The generated perspective LIDAR images may then be stitched or blended by a panorama module (such as panorama module 403 of Fig. 4) to generate a panoramic LIDAR image. In another embodiment, generator 1120 may include multiple CNN models, and these models may be applied simultaneously to multiple perspective pairs of image 703 and image 707 to generate multiple perspective LIDAR images for panoramic image generation.
Referring to Fig. 4 and Fig. 11A, in another embodiment, generator 1100 receives a third image, such as camera-captured image 705, which is captured by a second camera. Generator 1100 generates the second depth map by applying the CNN model to the first image, the third image, and the upsampled and/or inpainted second image. Here, image 703 and image 705 may be left-right stereo images (for example, images captured by configuration 514 of Fig. 5E) or vertical top-bottom stereo images (for example, images captured by configuration 524 of Fig. 5F).
Figure 12 is a diagram showing the contraction (for example, encoder/convolution) layers and expansion (for example, decoder/deconvolution) layers of a CNN model according to one embodiment. CNN model 1200 receives camera image 801 and low-resolution depth image 803, and outputs high-resolution depth map 825. Camera image 801 and low-resolution depth image 803 may be image 703 and image 707 of Fig. 11B, respectively. High-resolution depth map 825 may be image 709 of Fig. 11B. CNN model 1200 may include different layers, such as upsampling layer 1203, convolutional layers (807, 809), deconvolutional layers (811, 817), prediction layers (813, 819, 823), and concatenation layers (815, 821). Figure 12 is similar to Fig. 8 in most respects, except that the LIDAR image (for example, low-resolution depth image 803) is applied at the input layer of the CNN model, and the concatenation layer (for example, layer 808 of Fig. 8) can be omitted.
Referring to Fig. 12, for example, camera image 801 may include a monocular RGB camera image (for example, 3 channels, 192 pixels x 96 pixels). Low-resolution depth image 803 may include a single-channel (that is, grayscale) 48 pixel x 24 pixel LIDAR image (that is, image 803 is one quarter the scale of image 801). Upsampling layer 1203 upsamples image 803 by a scale factor (that is, four) to match the image scale of image 801, and outputs a one-channel, 192 pixel x 96 pixel image. Upsampling layer 1203 may include an inpainting layer, so that an inpainting algorithm can be applied to reconstruct missing pixels, where the missing pixels may be introduced by dark spots/artifacts perceived by the LIDAR detector, such as potholes, shadows, and/or weather phenomena. The upsampled/inpainted image is combined with the monocular RGB camera image (the image channels are added together) before it is received by convolutional layer 807. For example, the input image of layer 807 may be an image with 4 channels and 192 pixel x 96 pixel dimensions.
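The input-level fusion of Fig. 12 (in contrast to concatenation layer 808 of Fig. 8) can be illustrated by stacking the channels of the two inputs; the tensors below are placeholders with the example 192 x 96 size.

```python
import torch

# Illustrative tensors: a 3-channel RGB camera image and the upsampled/inpainted
# single-channel LIDAR depth map at the same 192x96 size (values are arbitrary).
rgb_image   = torch.randn(1, 3, 96, 192)
lidar_depth = torch.randn(1, 1, 96, 192)

# Channels are stacked so that convolutional layer 807 receives a single
# 4-channel, 192x96 input instead of using a separate concatenation layer.
fused_input = torch.cat([rgb_image, lidar_depth], dim=1)
print(fused_input.shape)   # torch.Size([1, 4, 96, 192])
```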
Figure 13 is a flowchart showing a method according to one embodiment. Process 1300 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 1300 may be performed by a point cloud module of an autonomous driving vehicle, such as point cloud module 307 of Fig. 3. Referring to Fig. 13, at block 1301, the processing logic receives a first image captured by a first camera, the first image capturing a portion of the driving environment of the ADV. At block 1302, the processing logic receives a second image, the second image representing a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the portion of the driving environment. At block 1303, the processing logic upsamples the second image by a predetermined scale factor to match the image scale of the first image. At block 1304, the processing logic generates a second depth map by applying a convolutional neural network (CNN) model to the first image and the upsampled second image, the second depth map having a higher resolution than the first depth map, such that the second depth map represents a second point cloud perceiving the driving environment surrounding the ADV.
In one embodiment, the processing logic receives a third image captured by a second camera and generates the second depth map by applying the CNN model to the first image, the third image, and the upsampled second image. In one embodiment, the first image includes a cylindrical panoramic image or a spherical panoramic image. In another embodiment, the cylindrical panoramic image or spherical panoramic image is generated based on several images captured by several camera devices. In another embodiment, the processing logic reconstructs the second point cloud by projecting the second depth map into the 3-D space on which the cylindrical panoramic image or spherical panoramic image is based.
In one embodiment, the processing logic maps the upsampled second image onto the image plane of the first image. In one embodiment, the second depth map is generated by blending one or more generated depth maps, such that the second depth map is a panoramic image.
In one embodiment, the CNN model includes contraction layers and expansion layers, where each contraction layer includes an encoder to downsample a corresponding input, and the expansion layers are coupled to the contraction layers, each expansion layer including a decoder to upsample a corresponding input. In one embodiment, information of the contraction layers is fed forward to the expansion layers. In one embodiment, each of the expansion layers includes a prediction layer to predict a depth map for a subsequent layer. In one embodiment, upsampling the second image includes inpainting the second image.
Fig. 14A and Fig. 14B are block diagrams showing examples of convolutional neural network models according to some embodiments. Referring to Fig. 14A, in one embodiment, depth map generator 1400 may include an upsampling module 1401 and a CNN model 701. CNN model 701 (as a part of neural network/CRF models 313) may include contraction layers (or encoder/convolutional layers) 713 and expansion layers (or decoder/deconvolutional layers) 715. Fig. 14B shows a depth map generator 1420 of another exemplary embodiment. Depth map generators 1400 and 1420 may be performed by the depth map module 405 of Fig. 4.
Referring to Fig. 4 and Fig. 14B, generator 1420 receives a first image captured by a first camera (for example, camera-captured image 703), the first image capturing a portion of the driving environment of the ADV. Generator 1420 receives a second image, such as low-resolution LIDAR image 707, the second image representing a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the portion of the driving environment. Upsampling module 1401 upsamples the second image (for example, image 707) by a predetermined scale factor to match the image scale of the output image of CNN model 701. Generator 1420 determines a second depth map (for example, the output image of CNN model 701) by applying CNN model 701 to the first image (for example, image 703). Generator 1420 generates a third depth map by applying a conditional random field (CRF) model (performed by CRF 1403, that is, CRF 404 of Fig. 4) to the first image (for example, image 703), the second image (for example, image 707), and the second depth map (for example, the output image of CNN model 701), where the third depth map has a higher resolution than the first depth map, such that the third depth map represents a second point cloud perceiving the driving environment surrounding the ADV.
An optimization model such as a CRF can be used to refine the depth/disparity estimate. According to one aspect, the end-to-end CNN model includes a CRF model, and the CRF model includes three cost terms to optimize (or minimize) a total cost function. For example, the CRF cost function may be:
CRF(x) = Σ_{i∈V} f_i(x_i) + Σ_{ij∈U} f_ij(x_ij) + Σ_{k∈W} g_k(x_k),
where x_i is the disparity value of the i-th pixel, V is the set of all pixels, U is a set of image edges, and W is the set of grid points of the LIDAR image. The first two terms (for example, f_i(x_i) and f_ij(x_ij)) may be, respectively, a stereo matching cost unary term and a smoothness pairwise term (that is, image pixel smoothness/discontinuity) with estimated contrast-sensitive edge weights.
For example, the CNN-CRF model may be arranged such that the unary term is determined based on the correlation of a stereo left RGB image and a stereo right RGB image (such as image 703 and image 705 of Fig. 14A), that is, a stereo matching cost (for example, based on the output of CNN model 701 of Fig. 14A). In an alternative, the CNN-CRF model may be arranged such that the unary term is determined based on an "information gain" of the disparity value of the i-th pixel (namely, based on the output of CNN model 701 of Fig. 14B), where the "information gain" of the disparity value of the i-th pixel has contributions from all other disparity values derived from a monocular (or monochrome) RGB image (such as image 703 of Fig. 14B) to which the model is applied.
The smoothness pairwise term may be determined based on the disparity of the estimated depth map (for example, based on the output of CNN model 701) representing the smoothness/discontinuity of any pair of pixels. An example of this cost term is defined in Knobelreiter et al., "End-to-End Training of Hybrid CNN-CRF Models for Stereo" (November 2016), the content of which is incorporated herein by reference in its entirety. In an alternative embodiment, this cost term may be the information gain defined in Cao et al., "Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks" (May 2016), the content of which is incorporated herein by reference in its entirety. The third term (for example, g(x)) may be a cost term representing the error of the estimated LIDAR image relative to the low-resolution LIDAR image (namely, based on the output of CNN model 701 of Figs. 14A-14B and the output of upsampling 1401).
In one embodiment, g(x) may be defined in terms of a predetermined threshold, such as 1.0 or 2.0, where x_i is the disparity value of the i-th pixel and d_k is the disparity value of the low-resolution LIDAR image. It should be noted that the f(x) and g(x) terms may include weight terms based on input image 703; these weight terms are applied to the input image pixel by pixel to emphasize the contrast of the image. For example, CRF 1403 may apply weight terms to f(x) and/or g(x) based on the input RGB image 703 of Figs. 14A-14B.
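The three-term cost could be sketched as follows. The unary term is left as a caller-supplied function, the pairwise term uses contrast-sensitive weights derived from the RGB image, and g(x) is written as a truncated quadratic against the low-resolution LIDAR disparities; since the exact g(x) formula is not reproduced in this text, that particular form is an assumption made for illustration.

```python
import numpy as np

def crf_energy(disparity, image, lidar_disparity, lidar_mask,
               unary_cost, lambda_pair=1.0, lambda_lidar=1.0, threshold=2.0):
    """Sketch of CRF(x) = sum_i f_i(x_i) + sum_ij f_ij(x_ij) + sum_k g_k(x_k).
    disparity: (H, W) estimate; image: (H, W, 3) RGB; lidar_disparity/lidar_mask:
    (H, W) sparse LIDAR values and a 0/1 mask of valid grid points."""
    # unary term: stereo matching / information-gain cost of the chosen disparities
    energy = unary_cost(disparity).sum()

    # pairwise smoothness term with contrast-sensitive edge weights
    for axis in (0, 1):
        d_disp = np.abs(np.diff(disparity, axis=axis))
        d_img = np.abs(np.diff(image.mean(axis=-1), axis=axis))  # grayscale contrast
        weights = np.exp(-d_img)       # smaller penalty across strong image edges
        energy += lambda_pair * (weights * d_disp).sum()

    # LIDAR consistency term g(x): penalize deviation from the sparse LIDAR points
    # (truncated quadratic with the predetermined threshold; assumed form)
    diff = disparity - lidar_disparity
    g = np.where(np.abs(diff) < threshold, diff ** 2, threshold ** 2)
    energy += lambda_lidar * (g * lidar_mask).sum()
    return energy
```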
In one embodiment, camera-captured image 703 and LIDAR image 707 are panoramic images, such as cylindrical or spherical panoramic images. In another embodiment, camera-captured image 703 and LIDAR image 707 are perspective images. The camera configuration capturing image 703 may include any camera of the camera configurations of Figs. 5D to 5F.
For the LIDAR configuration, LIDAR image 707 is generated by the following steps: mapping the 3-D point cloud captured by the LIDAR detector from 3-D space, followed by converting the 3-D point cloud onto a 2-D image plane. Here, the 2-D image plane of image 707 may be the same image plane as that of image 703. In another embodiment, LIDAR image 707 may be a perspective LIDAR image corresponding to camera-captured perspective image 703. As described earlier, CNN model 701 may be applied successively to several perspective pairs of image 703 and image 707 to generate perspective LIDAR images. In another embodiment, several CNN models may be applied simultaneously to multiple perspective pairs of image 703 and image 707 to generate multiple perspective LIDAR images for panoramic image generation.
Referring to Fig. 4 and Fig. 14A, in another embodiment, generator 1400 receives a third image, such as camera-captured image 705, which is captured by a second camera. Generator 1400 determines the second depth map by applying the CNN model to the first image and the third image. The CRF model is applied to the second depth map by CRF 1403 to generate the third depth map. Here, image 703 and image 705 may be left-right stereo images (for example, images captured by configuration 514 of Fig. 5E) or vertical top-bottom stereo images (for example, images captured by configuration 524 of Fig. 5F).
Figure 15 is a flowchart showing a method according to one embodiment. Process 1550 may be performed by processing logic, which may include software, hardware, or a combination thereof. For example, process 1550 may be performed by a point cloud module of an autonomous driving vehicle, such as point cloud module 307 of Fig. 3. Referring to Fig. 15, at block 1551, the processing logic receives a first image captured by a first camera, the first image capturing a portion of the driving environment of the ADV. At block 1552, the processing logic receives a second image, the second image representing a first depth map of a first point cloud generated by a light detection and ranging (LIDAR) device and corresponding to the portion of the driving environment. At block 1553, the processing logic determines a second depth map by applying a convolutional neural network (CNN) model to the first image. At block 1554, the processing logic generates a third depth map by applying a conditional random field function to the first image, the second image, and the second depth map, the third depth map having a higher resolution than the first depth map, such that the third depth map represents a second point cloud perceiving the driving environment surrounding the ADV.
In one embodiment, the processing logic receives a third image captured by a second camera and generates the third depth map by applying the CNN model to the first image and the third image. In one embodiment, the first image includes a cylindrical panoramic image or a spherical panoramic image. In another embodiment, the cylindrical panoramic image or spherical panoramic image is generated based on several images captured by several camera devices. In another embodiment, the processing logic reconstructs the second point cloud by projecting the third depth map into the 3-D space on which the cylindrical panoramic image or spherical panoramic image is based.
In one embodiment, the processing logic maps the third image onto the image plane of the first image. In one embodiment, the third depth map is generated by blending one or more generated depth maps, such that the third depth map is a panoramic image.
In one embodiment, the CNN model includes contraction layers and expansion layers, where each contraction layer includes an encoder to downsample a corresponding input, and the expansion layers are coupled to the contraction layers, each expansion layer including a decoder to upsample a corresponding input. In one embodiment, information of the contraction layers is fed forward to the expansion layers. In one embodiment, each of the expansion layers includes a prediction layer to predict a depth map for a subsequent layer.
It should be noted that some or all of the components shown and described above may be implemented in software, hardware, or a combination thereof. For example, such components may be implemented as software installed and stored in a persistent storage device, which may be loaded into memory and executed by a processor (not shown) to carry out the processes or operations described throughout this application. Alternatively, such components may be implemented as executable code programmed into or embedded in dedicated hardware, such as an integrated circuit (for example, an application-specific integrated circuit or ASIC), a digital signal processor (DSP), or a field programmable gate array (FPGA), which is accessible via a corresponding driver and/or operating system from an application. Furthermore, such components may be implemented as specific hardware logic in a processor or processor core, as part of an instruction set accessible by software components via one or more specific instructions.
Figure 16 is a block diagram showing an example of a data processing system that may be used with one embodiment of the disclosure. For example, system 1500 may represent any of the data processing systems described above performing any of the processes or methods described above, such as any of perception and planning system 110 or servers 103 to 104 of Fig. 1. System 1500 may include many different components. These components may be implemented as integrated circuits (ICs), portions thereof, discrete electronic devices, or other modules adapted to a circuit board (such as a motherboard or add-in card of a computer system), or as components otherwise incorporated within a chassis of a computer system.
It should also be noted that system 1500 is intended to show a high-level view of many components of a computer system. However, it is to be understood that additional components may be present in certain embodiments, and different arrangements of the components shown may occur in other embodiments. System 1500 may represent a desktop computer, laptop computer, tablet computer, server, mobile phone, media player, personal digital assistant (PDA), smartwatch, personal communicator, gaming device, network router or hub, wireless access point (AP) or repeater, set-top box, or a combination thereof. Furthermore, while only a single machine or system is illustrated, the term "machine" or "system" shall also be taken to include any collection of machines or systems that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
In one embodiment, system 1500 includes processor 1501, memory 1503, and devices 1505 to 1508 connected via a bus or an interconnect 1510. Processor 1501 may represent a single processor or multiple processors with a single processor core or multiple processor cores included therein. Processor 1501 may represent one or more general-purpose processors such as a microprocessor, a central processing unit (CPU), or the like. More particularly, processor 1501 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. Processor 1501 may also be one or more special-purpose processors, such as an application-specific integrated circuit (ASIC), a cellular or baseband processor, a field programmable gate array (FPGA), a digital signal processor (DSP), a network processor, a graphics processor, a communications processor, a cryptographic processor, a co-processor, an embedded processor, or any other type of logic capable of processing instructions.
Processor 1501, which may be a low-power multi-core processor socket such as an ultra-low-voltage processor, may act as a main processing unit and central hub for communication with the various components of the system. Such a processor can be implemented as a system on chip (SoC). Processor 1501 is configured to execute instructions for performing the operations and steps discussed herein. System 1500 may further include a graphics interface that communicates with optional graphics subsystem 1504, which may include a display controller, a graphics processor, and/or a display device.
Processor 1501 may communicate with memory 1503, which in one embodiment may be implemented via multiple memory devices to provide a given amount of system memory. Memory 1503 may include one or more volatile storage (or memory) devices, such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), or other types of storage devices. Memory 1503 may store information including sequences of instructions that are executed by processor 1501 or any other device. For example, executable code and/or data of a variety of operating systems, device drivers, firmware (for example, an input output basic system or BIOS), and/or applications can be loaded in memory 1503 and executed by processor 1501. The operating system can be any kind of operating system, such as, for example, a Robot Operating System (ROS), the Windows operating system from Microsoft, Mac OS/iOS from Apple, Android from Google, LINUX, UNIX, or other real-time or embedded operating systems.
System 1500 may further include I/O devices such as devices 1505 to 1508, including network interface device 1505, optional input device 1506, and other optional I/O devices 1507. Network interface device 1505 may include a wireless transceiver and/or a network interface card (NIC). The wireless transceiver may be a WiFi transceiver, an infrared transceiver, a Bluetooth transceiver, a WiMax transceiver, a wireless cellular telephony transceiver, a satellite transceiver (for example, a global positioning system (GPS) transceiver), or other radio frequency (RF) transceivers, or a combination thereof. The NIC may be an Ethernet card.
Input device 1506 may include a mouse, a touchpad, a touch-sensitive screen (which may be integrated with display device 1504), a pointer device such as a stylus, and/or a keyboard (for example, a physical keyboard or a virtual keyboard displayed as part of the touch-sensitive screen). For example, input device 1506 may include a touch screen controller coupled to a touch screen. The touch screen and touch screen controller can, for example, detect contact and movement or breaks thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with the touch screen.
I/O devices 1507 may include an audio device. An audio device may include a speaker and/or a microphone to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and/or telephony functions. Other I/O devices 1507 may further include universal serial bus (USB) ports, parallel ports, serial ports, printers, a network interface, a bus bridge (for example, a PCI-PCI bridge), and sensors (for example, a motion sensor such as an accelerometer, a gyroscope, a magnetometer, a light sensor, a compass, a proximity sensor, etc.), or a combination thereof. Devices 1507 may further include an imaging subsystem (for example, a camera), which may include an optical sensor, such as a charge-coupled device (CCD) or a complementary metal-oxide semiconductor (CMOS) optical sensor, utilized to facilitate camera functions, such as recording photographs and video clips. Certain sensors may be coupled to interconnect 1510 via a sensor hub (not shown), while other devices such as a keyboard or thermal sensor may be controlled by an embedded controller (not shown), depending on the specific configuration or design of system 1500.
To provide persistent storage of information such as data, applications, one or more operating systems, and so forth, a mass storage device (not shown) may also be coupled to processor 1501. In various embodiments, to enable a thinner and lighter system design as well as to improve system responsiveness, this mass storage device may be implemented via a solid state device (SSD). However, in other embodiments, the mass storage device may primarily be implemented using a hard disk drive (HDD), with a smaller amount of SSD storage acting as an SSD cache to enable non-volatile storage of context state and other such information during power-down events, so that a fast power-up can occur when system activity is restarted. In addition, a flash device may be coupled to processor 1501, for example, via a serial peripheral interface (SPI). Such a flash device may provide non-volatile storage of system software, including the BIOS and other firmware of the system.
Storage device 1508 may include a computer-accessible storage medium 1509 (also known as a machine-readable storage medium or a computer-readable medium) on which is stored one or more sets of instructions or software (for example, module, unit, and/or logic 1528) embodying any one or more of the methodologies or functions described herein. Processing module/unit/logic 1528 may represent any of the components described above, such as planning module 305, control module 306, and high-resolution point cloud module 307. Processing module/unit/logic 1528 may also reside, completely or at least partially, within memory 1503 and/or within processor 1501 during execution thereof by data processing system 1500, memory 1503, and processor 1501, which also constitute machine-accessible storage media. Processing module/unit/logic 1528 may further be transmitted or received over a network via network interface device 1505.
Computer-readable storage medium 1509 may also be used to persistently store some of the software functionalities described above. While computer-readable storage medium 1509 is shown in an exemplary embodiment to be a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (for example, a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the disclosure. The term "computer-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, or any other non-transitory machine-readable medium.
Processing module/unit/logic 1528, components, and other features described herein may be implemented as discrete hardware components or integrated in the functionality of hardware components such as ASICs, FPGAs, DSPs, or similar devices. In addition, processing module/unit/logic 1528 may be implemented as firmware or functional circuitry within hardware devices. Further, processing module/unit/logic 1528 may be implemented in any combination of hardware devices and software components.
It should be noted that while system 1500 is illustrated with various components of a data processing system, it is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not germane to embodiments of the disclosure. It will also be appreciated that network computers, handheld computers, mobile phones, servers, and/or other data processing systems, which have fewer components or perhaps more components, may also be used with embodiments of the disclosure.
Some portions of the preceding detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as those set forth in the claims below refer to the action and processes of a computer system, or a similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission, or display devices.
Embodiment of the present disclosure further relates to apparatuses for performing the operations herein.This computer program is storedIn non-transitory computer-readable medium.Machine readable media includes for the form readable with machine (for example, computer)Store any mechanism of information.For example, machine readable (for example, computer-readable) medium include machine (for example, computer) canStorage medium is read (for example, read-only memory (" ROM "), random access memory (" RAM "), magnetic disk storage medium, optical storage JieMatter, flash memory devices).
The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware (for example, circuitry, dedicated logic, etc.), software (for example, embodied on a non-transitory computer-readable medium), or a combination of both. Although the processes or methods are described above in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.
Embodiment of the present disclosure is not described with reference to any specific programming language.It should be understood that can be usedA variety of programming languages implement the introduction of embodiment of the present disclosure as described herein.
In the foregoing specification, embodiments of the disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.

Claims (20)


Applications Claiming Priority (6)

- US 15/641,100 (US10474160B2), priority date 2017-07-03, filed 2017-07-03: High resolution 3D point clouds generation from downsampled low resolution LIDAR 3D point clouds and camera images
- US 15/641,111 (US10474161B2), priority date 2017-07-03, filed 2017-07-03: High resolution 3D point clouds generation from upsampled low resolution lidar 3D point clouds and camera images
- US 15/641,113 (US10671082B2), priority date 2017-07-03, filed 2017-07-03: High resolution 3D point clouds generation based on CNN and CRF models
Publications (2)

- CN109215067A, published 2019-01-15
- CN109215067B (granted), published 2023-03-10

Family Applications (1)

- CN201810695220.9A (Active), priority date 2017-07-03: High-resolution 3-D point cloud generation based on CNN and CRF models, granted as CN109215067B (2023-03-10)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN111771229A (en)*2019-01-302020-10-13百度时代网络技术(北京)有限公司Point cloud ghost effect detection system for automatic driving vehicle
CN111862226A (en)*2019-04-302020-10-30百度(美国)有限责任公司Hardware design for camera calibration and image pre-processing in a vehicle
CN111882658A (en)*2020-07-022020-11-03电子科技大学Automatic reconstruction method for nuclear power plant kernel facility
CN111964685A (en)*2019-05-202020-11-20罗伯特·博世有限公司Method and system for creating a positioning map for a vehicle
CN112014856A (en)*2019-05-302020-12-01初速度(苏州)科技有限公司Road edge extraction method and device suitable for cross road section
CN112146680A (en)*2019-06-282020-12-29百度(美国)有限责任公司Determining vanishing points based on feature maps
CN112232104A (en)*2019-06-282021-01-15百度(美国)有限责任公司 Use relative distances to represent vanishing points
CN112750155A (en)*2021-01-152021-05-04福州大学Panoramic depth estimation method based on convolutional neural network
CN112954281A (en)*2019-12-102021-06-11通用汽车环球科技运作有限责任公司Generating a three-dimensional point cloud using a polarization camera in a vehicle equipped with a driving assistance system
CN113766343A (en)*2020-06-012021-12-07辉达公司Video synthesis using one or more neural networks
CN114509773A (en)*2020-11-162022-05-17通用汽车环球科技运作有限责任公司 Object detection in vehicles using cross-modal sensors
US11403069B2 (en)2017-07-242022-08-02Tesla, Inc.Accelerated mathematical engine
US11409692B2 (en)2017-07-242022-08-09Tesla, Inc.Vector computational unit
CN115128634A (en)*2022-07-112022-09-30北京工业大学Multi-eye 3D laser radar imaging detection method and device
US11487288B2 (en)2017-03-232022-11-01Tesla, Inc.Data synthesis for autonomous control systems
US11537811B2 (en)2018-12-042022-12-27Tesla, Inc.Enhanced object detection for autonomous vehicles based on field view
US11562231B2 (en)2018-09-032023-01-24Tesla, Inc.Neural networks for embedded devices
US11561791B2 (en)2018-02-012023-01-24Tesla, Inc.Vector computational unit receiving data elements in parallel from a last row of a computational array
US11567514B2 (en)2019-02-112023-01-31Tesla, Inc.Autonomous and user controlled vehicle summon to a target
US11610117B2 (en)2018-12-272023-03-21Tesla, Inc.System and method for adapting a neural network model on a hardware platform
US11636333B2 (en)2018-07-262023-04-25Tesla, Inc.Optimizing neural network structures for embedded systems
US11665108B2 (en)2018-10-252023-05-30Tesla, Inc.QoS manager for system on a chip communications
US11681649B2 (en)2017-07-242023-06-20Tesla, Inc.Computational array microprocessor system using non-consecutive data formatting
US11734562B2 (en)2018-06-202023-08-22Tesla, Inc.Data pipeline and deep learning system for autonomous driving
US11748620B2 (en)2019-02-012023-09-05Tesla, Inc.Generating ground truth for machine learning from time series elements
US11790664B2 (en)2019-02-192023-10-17Tesla, Inc.Estimating object properties using visual image data
US11816585B2 (en)2018-12-032023-11-14Tesla, Inc.Machine learning models operating at different frequencies for autonomous vehicles
US11841434B2 (en)2018-07-202023-12-12Tesla, Inc.Annotation cross-labeling for autonomous control systems
US11893774B2 (en)2018-10-112024-02-06Tesla, Inc.Systems and methods for training machine models with augmented data
US11893393B2 (en)2017-07-242024-02-06Tesla, Inc.Computational array microprocessor system with hardware arbiter managing memory requests
US12014553B2 (en)2019-02-012024-06-18Tesla, Inc.Predicting three-dimensional features for autonomous driving
US12307350B2 (en)2018-01-042025-05-20Tesla, Inc.Systems and methods for hardware-based pooling
US12443853B1 (en)2024-01-162025-10-14Nvidia CorporationVideo synthesis using one or more neural networks

Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN106157307A (en)*2016-06-272016-11-23浙江工商大学A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN106612427A (en)*2016-12-292017-05-03浙江工商大学Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
US20170148222A1 (en)*2014-10-312017-05-25Fyusion, Inc.Real-time mobile device capture and generation of art-styled ar/vr content

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20170148222A1 (en)*2014-10-312017-05-25Fyusion, Inc.Real-time mobile device capture and generation of art-styled ar/vr content
CN106157307A (en)*2016-06-272016-11-23浙江工商大学A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN106612427A (en)*2016-12-292017-05-03浙江工商大学Method for generating spatial-temporal consistency depth map sequence based on convolution neural network

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems
US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems
US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests
US12216610B2 (en) | 2017-07-24 | 2025-02-04 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit
US12307350B2 (en) | 2018-01-04 | 2025-05-20 | Tesla, Inc. | Systems and methods for hardware-based pooling
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array
US11797304B2 (en) | 2018-02-01 | 2023-10-24 | Tesla, Inc. | Instruction set architecture for a vector computational unit
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems
US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems
US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices
US12346816B2 (en) | 2018-09-03 | 2025-07-01 | Tesla, Inc. | Neural networks for embedded devices
US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles
US12367405B2 (en) | 2018-12-03 | 2025-07-22 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US12198396B2 (en) | 2018-12-04 | 2025-01-14 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform
CN111771229A (en) * | 2019-01-30 | 2020-10-13 | 百度时代网络技术(北京)有限公司 | Point cloud ghost effect detection system for automatic driving vehicle
CN111771229B (en) * | 2019-01-30 | 2023-11-21 | 百度时代网络技术(北京)有限公司 | Point cloud ghost effect detection system for automatic driving vehicle
US12223428B2 (en) | 2019-02-01 | 2025-02-11 | Tesla, Inc. | Generating ground truth for machine learning from time series elements
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target
US12164310B2 (en) | 2019-02-11 | 2024-12-10 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target
US12236689B2 (en) | 2019-02-19 | 2025-02-25 | Tesla, Inc. | Estimating object properties using visual image data
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data
CN111862226B (en) * | 2019-04-30 | 2024-01-16 | 百度(美国)有限责任公司 | Hardware design for camera calibration and image preprocessing in a vehicle
CN111862226A (en) * | 2019-04-30 | 2020-10-30 | 百度(美国)有限责任公司 | Hardware design for camera calibration and image pre-processing in a vehicle
CN111964685A (en) * | 2019-05-20 | 2020-11-20 | 罗伯特·博世有限公司 | Method and system for creating a positioning map for a vehicle
CN112014856B (en) * | 2019-05-30 | 2023-05-12 | 魔门塔(苏州)科技有限公司 | Road edge extraction method and device suitable for cross road section
CN112014856A (en) * | 2019-05-30 | 2020-12-01 | 初速度(苏州)科技有限公司 | Road edge extraction method and device suitable for cross road section
CN112146680A (en) * | 2019-06-28 | 2020-12-29 | 百度(美国)有限责任公司 | Determining vanishing points based on feature maps
CN112146680B (en) * | 2019-06-28 | 2024-03-29 | 百度(美国)有限责任公司 | Determining vanishing points based on feature maps
CN112232104A (en) * | 2019-06-28 | 2021-01-15 | 百度(美国)有限责任公司 | Use relative distances to represent vanishing points
CN112954281A (en) * | 2019-12-10 | 2021-06-11 | 通用汽车环球科技运作有限责任公司 | Generating a three-dimensional point cloud using a polarization camera in a vehicle equipped with a driving assistance system
CN113766343A (en) * | 2020-06-01 | 2021-12-07 | 辉达公司 | Video synthesis using one or more neural networks
CN113766343B (en) * | 2020-06-01 | 2024-04-09 | 辉达公司 | Video synthesis using one or more neural networks
US11934959B2 (en) | 2020-06-01 | 2024-03-19 | Nvidia Corporation | Video synthesis using one or more neural networks
CN111882658A (en) * | 2020-07-02 | 2020-11-03 | 电子科技大学 | Automatic reconstruction method for nuclear power plant kernel facility
CN114509773A (en) * | 2020-11-16 | 2022-05-17 | 通用汽车环球科技运作有限责任公司 | Object detection in vehicles using cross-modal sensors
CN114509773B (en) * | 2020-11-16 | 2025-09-19 | 通用汽车环球科技运作有限责任公司 | Object detection in a vehicle using a cross-modal sensor
CN112750155B (en) * | 2021-01-15 | 2022-07-01 | 福州大学 | Panoramic depth estimation method based on convolutional neural network
CN112750155A (en) * | 2021-01-15 | 2021-05-04 | 福州大学 | Panoramic depth estimation method based on convolutional neural network
CN115128634B (en) * | 2022-07-11 | 2025-03-28 | 北京工业大学 | A multi-eye 3D laser radar imaging detection method and device
CN115128634A (en) * | 2022-07-11 | 2022-09-30 | 北京工业大学 | Multi-eye 3D laser radar imaging detection method and device
US12443853B1 (en) | 2024-01-16 | 2025-10-14 | Nvidia Corporation | Video synthesis using one or more neural networks

Also Published As

Publication number | Publication date
CN109215067B (en) | 2023-03-10

Similar Documents

Publication | Publication Date | Title
CN109214986A (en) | | High-resolution 3-D point cloud is generated from the low resolution LIDAR 3-D point cloud and camera review of down-sampling
CN109214987A (en) | | High-resolution 3-D point cloud is generated from the low resolution LIDAR 3-D point cloud and camera review of up-sampling
CN109215067A (en) | | High-resolution 3-D point cloud is generated based on CNN and CRF model
US12051001B2 (en) | | Multi-task multi-sensor fusion for three-dimensional object detection
CN114723955B (en) | | Image processing method, apparatus, device and computer readable storage medium
US11989847B2 (en) | | Photorealistic image simulation with geometry-aware composition
US20230127115A1 (en) | | Three-Dimensional Object Detection
US11768292B2 (en) | | Three-dimensional object detection
US11960290B2 (en) | | Systems and methods for end-to-end trajectory prediction using radar, LIDAR, and maps
US12103554B2 (en) | | Systems and methods for autonomous vehicle systems simulation
EP3832260B1 (en) | | Real-time generation of functional road maps
US20190004535A1 (en) | | High resolution 3d point clouds generation based on cnn and crf models
JP2021089724A (en) | | 3d auto-labeling with structural and physical constraints
US20190387209A1 (en) | | Deep Virtual Stereo Odometry
WO2019099633A9 (en) | | Systems and methods for generating sparse geographic data for autonomous vehicles
US12350835B2 (en) | | Systems and methods for sensor data packet processing and spatial memory updating for robotic platforms
WO2019230339A1 (en) | | Object identification device, system for moving body, object identification method, training method of object identification model, and training device for object identification model
US10939042B1 (en) | | Simulated rolling shutter image data
US12313727B1 (en) | | Object detection using transformer based fusion of multi-modality sensor data
US12429878B1 (en) | | Systems and methods for dynamic object removal from three-dimensional data
US20250123108A1 (en) | | Systems and Methods for Application of Surface Normal Calculations to Autonomous Navigation
Wei et al. | | Dynamic SLAM Algorithm for Semantic-Driven Unmanned Platforms in Multi-factor Environments
Sharma et al. | | Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images
CN119672663A (en) | | Map element detection, vehicle control, map display method, device and medium

Legal Events

Date | Code | Title | Description
| PB01 | Publication |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| GR01 | Patent grant |
