CN117836818A - Information processing device, information processing system, model, and model generation method - Google Patents

Information processing device, information processing system, model, and model generation method

Info

Publication number
CN117836818A
CN117836818A
Authority
CN
China
Prior art keywords
image
unit
information processing
feature points
processing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280055900.9A
Other languages
Chinese (zh)
Inventor
大石圭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp
Publication of CN117836818A
Legal status: Pending

Abstract

The present invention provides an information processing device including a learner (370) that acquires feature points and feature quantities common to a plurality of images and generates a model for matching the common feature points. One of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, together with a projection image that is acquired from the imaging unit other than the one that produced that image and then projected, is input to the learner as a pair of input data.

Description

Information processing device, information processing system, model, and model generation method
Technical Field
The present disclosure relates to an information processing apparatus, an information processing system, a model, and a method of generating the model.
Background
In the near future, mobile bodies that use automated driving technology (for example, self-driving cars) and highly intelligent robots are expected to be in daily use, and such mobile bodies and robots are assumed to carry a plurality of sensors for sensing the surrounding environment. Against this background, as one of the sensor fusion techniques that use sensing data obtained from a plurality of sensors, there is a growing demand for a technique that accurately and easily aligns the information from the plurality of sensors.
Non-patent document 1: Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich, "SuperPoint: Self-Supervised Interest Point Detection and Description", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 224-236.
However, the related art for accurately and easily aligning information from a plurality of sensors is premised on matching feature points between images obtained from the same kind of sensor. It is therefore difficult for the related art to perform feature point matching with high accuracy between images obtained from different kinds of sensors.
Disclosure of Invention
Accordingly, in the present disclosure, an information processing apparatus, an information processing system, a model, and a model generation method capable of performing feature point matching between images obtained from sensors of different types with high accuracy are proposed.
According to the present disclosure, there is provided an information processing apparatus including a learner that acquires feature points and feature quantities common to a plurality of images and generates a model for matching the common feature points, wherein one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image that is acquired from the imaging unit other than the one that produced that image and then projected, are input to the learner as a pair of input data.
Further, according to the present disclosure, there is provided an information processing system including a first imaging unit, a second imaging unit, and an information processing apparatus including a learner that acquires feature points and feature quantities common to a plurality of images and generates a model for matching the common feature points, wherein one of the first image obtained from the first imaging unit and the second image obtained from the second imaging unit, and a projection image that is acquired from the imaging unit other than the one that produced that image and then projected, are input to the learner as a pair of input data.
Further, according to the present disclosure, there is provided a model that causes a computer to function so as to acquire feature points and feature quantities common to a plurality of images and to match the common feature points, the model being obtained by machine learning that uses, as a pair of input data, one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image that is acquired from the imaging unit other than the one that produced that image and then projected.
Further, according to the present disclosure, there is provided a method of generating a model that causes a computer to function so as to acquire feature points and feature quantities common to a plurality of images and to match the common feature points, wherein the model is generated by machine learning that uses, as a pair of input data, one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image that is acquired from the imaging unit other than the one that produced that image and then projected.
Drawings
Fig. 1 is an explanatory diagram illustrating a configuration example of an information processing system 10 according to an embodiment of the present disclosure.
Fig. 2 is a block diagram (part 1) showing an example of the configuration of an information processing apparatus 300 according to an embodiment of the present disclosure.
Fig. 3 is a flowchart (part 1) illustrating an example of an information processing method according to an embodiment of the present disclosure.
Fig. 4 is a flowchart (part 2) illustrating an example of an information processing method according to an embodiment of the present disclosure.
Fig. 5 is an explanatory diagram illustrating the background that led to the creation of an embodiment of the present disclosure.
Fig. 6 is a flowchart (part 3) illustrating an example of an information processing method according to an embodiment of the present disclosure.
Fig. 7 is an explanatory diagram illustrating an example of input data according to an embodiment of the present disclosure.
Fig. 8 is a block diagram (part 2) showing an example of the configuration of an information processing apparatus 300 according to an embodiment of the present disclosure.
Fig. 9 is an explanatory diagram illustrating an example of mask generation according to an embodiment of the present disclosure.
Fig. 10 is a block diagram (part 3) showing an example of the configuration of an information processing apparatus 300 according to an embodiment of the present disclosure.
Fig. 11 is an explanatory diagram (part 1) illustrating an example of ground-truth label generation according to an embodiment of the present disclosure.
Fig. 12 is an explanatory diagram (part 2) illustrating an example of ground-truth label generation according to an embodiment of the present disclosure.
Fig. 13 is a block diagram (part 4) showing an example of the configuration of an information processing apparatus 300 according to an embodiment of the present disclosure.
Fig. 14 is a block diagram showing an example of the configuration of the learning unit 370 according to an embodiment of the present disclosure.
Fig. 15 is an explanatory diagram (part 1) illustrating an example of learning according to an embodiment of the present disclosure.
Fig. 16 is an explanatory diagram (part 2) illustrating an example of learning according to an embodiment of the present disclosure.
Fig. 17 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus 300.
Fig. 18 is a block diagram showing a configuration example of a vehicle control system.
Fig. 19 is a diagram showing an example of the sensing region.
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the present specification and drawings, the same reference numerals are given to components having the same functional structures, and overlapping descriptions are omitted. In the present specification and drawings, different letters may be appended to the same reference numerals to distinguish structural elements that actually have the same or similar functional structures. However, in the case where it is not necessary to particularly distinguish each of a plurality of structural elements actually having the same or similar functional structure, only the same reference numeral is attached.
In the following description, a feature point refers to a position (coordinate information) that characterizes the shape of the subject in an image, such as a center point, a branch point, an intersection, or an end point on the contour of the subject. A feature quantity is information that numerically expresses the characteristics of a feature point, such as its shape, orientation, and spatial extent.
The description proceeds in the following order.
1. Background leading to the embodiments of the present disclosure
1.1 Overview of the information processing system
1.2 Detailed configuration of the information processing apparatus
1.3 Information processing method
1.4 Background
2. Embodiments
2.1 Information processing method
2.2 Generation of input data
2.3 Generation of ground-truth labels
2.4 Learning
3. Summary
4. Hardware configuration
5. Application example
6. Supplement
1. Background leading to the embodiments of the present disclosure
First, before describing the embodiments of the present disclosure, the background that led the present inventors to create them will be explained.
As described above, in the near future, mobile bodies that use automated driving technology (for example, self-driving cars) and highly intelligent robots are expected to be in daily use, and such mobile bodies and robots are assumed to carry a plurality of sensors for sensing the surrounding environment. Against this background, as one of the sensor fusion technologies that use sensing data obtained from a plurality of sensors, a technique for accurately and easily aligning these different kinds of sensors is increasingly demanded. First, an outline of an information processing system using such a technique will be described.
< 1.1 Overview of the information processing system >
First, an outline of an information processing system 10 according to an embodiment of the present disclosure will be described with reference to fig. 1. Fig. 1 is an explanatory diagram illustrating a configuration example of an information processing system 10 according to the present embodiment.
As shown in fig. 1, the information processing system 10 of the present embodiment includes a LiDAR (Light Detection and Ranging) sensor (first imaging unit) 100, a camera (second imaging unit) 200, and an information processing apparatus 300, which are communicably connected to one another via a network. Any communication method, wired or wireless (for example, WiFi (registered trademark) or Bluetooth (registered trademark)), can be applied, although a communication method that can maintain stable operation is preferable. The LiDAR 100, the camera 200, and the information processing apparatus 300 may also be connected to the network via a base station or the like (not shown) (for example, a mobile phone base station or an access point of a wireless LAN (Local Area Network)). The outline of each device included in the information processing system 10 of the present embodiment will be described in order below.
(LiDAR100)
The LiDAR 100 can measure the distance (relative coordinates) to a subject and determine the shape of the subject by irradiating the subject with laser light while scanning it and observing the scattered and reflected light. In this specification, an image based on the reflected light acquired by the LiDAR 100 is referred to as a reflected intensity image (first image). In the embodiment of the present disclosure, a ToF (Time of Flight) sensor (not shown) may be used instead of the LiDAR 100. The ToF sensor can measure the distance to a subject and determine the shape of the subject by irradiating the subject with pulsed light and observing the time taken for the light reflected by the subject to return.
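As an aside to the ranging principle just described (not part of the patent text itself), the distance follows directly from the round-trip time of the emitted light; a minimal sketch, assuming only the speed of light as a constant:

```python
# Illustrative sketch (not from the patent): distance from the round-trip time of light.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_round_trip(t_seconds: float) -> float:
    """Distance to the subject, given the time the emitted pulse takes to return."""
    return SPEED_OF_LIGHT * t_seconds / 2.0  # halved because the light travels out and back

print(range_from_round_trip(200e-9))  # ~30 m for a 200 ns round trip
```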
(Camera 200)
The camera 200 is an image sensor that detects light radiated from a subject and outputs an image signal in order to acquire an image of the subject. Specifically, the camera 200 is an RGB image sensor capable of color imaging, in which a plurality of pixels that detect blue, green, and red visible light are arranged in a Bayer array. In this specification, an image based on visible light acquired by the camera 200 is referred to as a visible light image (second image). In the present embodiment, a monochrome (black-and-white) image sensor may be used instead of the RGB image sensor.
(information processing apparatus 300)
The information processing apparatus 300 is constituted by a computer or the like, for example. The information processing apparatus 300 processes, for example, images acquired by the LiDAR100 and the camera 200, and outputs images and the like obtained by the processing to other devices. The information processing apparatus 300 can perform alignment (calibration) of the LiDAR100 and the camera 200. Further, details of the information processing apparatus 300 will be described later.
In the present embodiment, the information processing apparatus 300 may be constituted by a plurality of apparatuses, and is not particularly limited.
In fig. 1, the information processing system 10 of the present embodiment is shown to include one LiDAR100 and a camera 200, but the present embodiment is not limited thereto. For example, the information processing system 10 of the present embodiment may include a plurality of LiDAR100 and cameras 200. The information processing system 10 of the present embodiment may include, for example, another image sensor that observes light of a specific wavelength and generates an image, and is not particularly limited.
< 1.2 Detailed configuration of the information processing apparatus >
Next, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 2. Fig. 2 is a block diagram showing an example of the configuration of an information processing apparatus 300 according to the present embodiment. Here, the description will be focused on the function of the information processing apparatus 300 for performing alignment of the LiDAR100 and the camera 200. As described above, the information processing apparatus 300 is constituted by a computer or the like, for example. In detail, as shown in fig. 2, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 302, a visible light image acquisition unit 304, a reflected intensity image processing unit 306, a visible light image processing unit 308, a feature point acquisition unit 310, a position information acquisition unit 312, and a calibration unit 314. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 302)
The reflected intensity image acquisition unit 302 acquires data of a reflected intensity image from the LiDAR100, and outputs the acquired data to a reflected intensity image processing unit 306, which will be described later.
(visible light image acquisition section 304)
The visible light image acquisition unit 304 acquires data of a visible light image from the camera 200, and outputs the data to a visible light image processing unit 308 described later.
(reflection intensity image processing section 306)
The reflection intensity image processing unit 306 cuts out, from the reflected intensity image data supplied by the reflected intensity image acquisition unit 302, an image at a predetermined position (viewpoint) with a predetermined size and a predetermined angle of view (FOV), and generates a reflection intensity image as input data. The reflection intensity image processing unit 306 outputs the generated reflection intensity image to the feature point acquisition unit 310 described later. At this time, the reflection intensity image processing unit 306 may also perform correction of optical distortion, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment) on the image.
(visible light image processing section 308)
The visible light image processing unit 308 cuts out, from the visible light image data supplied by the visible light image acquisition unit 304, an image at a predetermined position (viewpoint) with a predetermined size and a predetermined angle of view (FOV), and generates a visible light image as input data. The visible light image processing unit 308 outputs the generated visible light image to the feature point acquisition unit 310 described later. At this time, the visible light image processing unit 308 may also perform correction of optical distortion, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment) on the image.
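The crop and adjustment performed by both image processing units can be illustrated with a short sketch; the window parameters, gain, and gamma values below are illustrative assumptions, not values given in the patent:

```python
# Hedged sketch of the crop plus gain/gamma adjustment described for the image
# processing units; window position, size, gain and gamma are illustrative only.
import numpy as np

def crop_and_adjust(image: np.ndarray, x: int, y: int, w: int, h: int,
                    gain: float = 1.0, gamma: float = 1.0) -> np.ndarray:
    """Cut out a window (viewpoint / size / FOV) and apply brightness (gain) and contrast (gamma) adjustment."""
    patch = image[y:y + h, x:x + w].astype(np.float32) / 255.0
    patch = np.clip(patch * gain, 0.0, 1.0) ** gamma  # gain first, then gamma
    return (patch * 255.0).astype(np.uint8)
```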
(feature Point acquiring section 310)
The feature point acquisition unit 310 can acquire feature points and feature quantities in a plurality of images using the model of the present embodiment described later, and can perform matching of the feature points common to the plurality of images based on the acquired feature points and feature quantities. For example, in the present embodiment described below, the feature point acquisition unit 310 may perform feature point matching between the reflection intensity image and the visible light image, and may also perform feature point matching between a plurality of reflection intensity images or between a plurality of visible light images. However, since the alignment of the LiDAR 100 and the camera 200 is described here, only the matching of feature points common to the visible light image and the reflection intensity image will be described. The feature point acquisition unit 310 outputs information on the matched feature points (coordinate information in the image, etc.) to the calibration unit 314 described later. For example, in the matching of the present embodiment, a norm is calculated as the distance between the feature quantities of the feature points, and the feature points whose distance is smallest across the plurality of images are matched. The model generation of the present embodiment will be described in detail later.
(position information acquiring section 312)
The position information acquisition unit 312 can acquire the distance to the subject and the relative position coordinates of the subject based on the time taken for the light emitted by the LiDAR 100 to be reflected by the subject and return, and can output the acquired distance and the like to the calibration unit 314 described later. In the present embodiment, the calculation of the distance and the like may instead be performed by the LiDAR 100.
(calibration part 314)
The calibration unit 314 can calibrate (correct) the spatial difference (positional relationship) and the optical difference between the LiDAR 100 and the camera 200. For example, the calibration unit 314 corrects external parameters (positional parameters) and/or internal parameters (optical parameters) of the LiDAR 100 and the camera 200, based on the difference in the positions at which the LiDAR 100 and the camera 200 are disposed (parallax, distance to the subject), the difference in their angles of view, and lens aberrations, so as to cancel the differences (deviations) in positional information between the images output from the LiDAR 100 and the camera 200. At this time, the calibration unit 314 can correct the feature points matched by the feature point acquisition unit 310 by using the position information (coordinate information on the world coordinate system or a relative coordinate system) acquired by the position information acquisition unit 312.
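As one possible illustration of the external-parameter correction described above (the patent does not prescribe a particular solver), the matched 2D camera pixels and the corresponding LiDAR 3D positions could be fed to a PnP solver such as OpenCV's solvePnP; a hedged sketch:

```python
# Hedged sketch: estimate the LiDAR-to-camera extrinsics from matched feature points.
# solvePnP is one standard choice for this step; it is an assumption, not the
# method prescribed by the patent.
import cv2
import numpy as np

def estimate_extrinsics(lidar_points_3d: np.ndarray,   # (N, 3) 3D positions from the LiDAR
                        camera_pixels_2d: np.ndarray,   # (N, 2) matched pixels in the camera image
                        K: np.ndarray,                  # (3, 3) camera intrinsic matrix
                        dist: np.ndarray):              # lens distortion coefficients
    ok, rvec, tvec = cv2.solvePnP(lidar_points_3d.astype(np.float64),
                                  camera_pixels_2d.astype(np.float64), K, dist)
    if not ok:
        raise RuntimeError("solvePnP failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the LiDAR-to-camera transform
    return R, tvec
```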
In the present embodiment, the configuration of the information processing apparatus 300 is not limited to that shown in fig. 2, and may further include, for example, a functional block not shown.
< 1.3 Information processing method >
Next, an information processing method according to an embodiment of the present disclosure will be described with reference to fig. 3 and 4. Here, the process of performing alignment of the LiDAR100 and the camera 200 performed by the information processing apparatus 300 will be described. Fig. 3 and 4 are flowcharts illustrating an example of the information processing method according to the present embodiment.
In detail, as shown in fig. 3, the information processing method of the present embodiment can mainly include a plurality of steps from step S100 to step S400. The following describes each of these steps in this embodiment in detail.
First, the information processing apparatus 300 collects one or more visible light images from the camera 200 (step S100). Next, the information processing apparatus 300 collects one or more reflected intensity images from the LiDAR (step S200).
Then, the information processing apparatus 300 acquires the feature points and feature quantities in the visible light images and reflected intensity images collected in steps S100 and S200 above, and performs matching of the feature points common to the visible light image and the reflected intensity image based on the acquired feature points and feature quantities (step S300). The information processing apparatus 300 then calibrates (corrects) the spatial difference (positional relationship) and the optical difference between the LiDAR 100 and the camera 200 (step S400). At this time, the information processing apparatus 300 can use the position information (coordinate information on the world coordinate system or a relative coordinate system) of the matched feature points to perform the correction.
In detail, step S300 of fig. 3 may mainly include a plurality of steps from step S301 to step S303 shown in fig. 4. The following describes each of these steps in detail.
First, the information processing apparatus 300 acquires feature points and feature amounts from the visible light image collected in step S100 using a model of the present embodiment described later (step S301). Next, the information processing apparatus 300 acquires feature points and feature amounts from the reflection intensity image collected in step S200 using the above model (step S302).
The information processing apparatus 300 performs matching of the feature points between the reflection intensity image and the visible light image based on the feature points and the feature amounts acquired in step S301 and step S302 described above (step S303). For example, the information processing apparatus 300 calculates norms as feature amounts of the respective feature points, and matches the feature point having the smallest distance between images as a common feature point.
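A minimal sketch of the matching in step S303, assuming the descriptors are plain NumPy arrays; the mutual-consistency check is an extra safeguard added here and is not required by the text above:

```python
# Hedged sketch of step S303: match feature points by the smallest distance (norm)
# between their feature quantities (descriptors).
import numpy as np

def match_by_nearest_descriptor(desc_a: np.ndarray, desc_b: np.ndarray):
    """desc_a: (Na, D), desc_b: (Nb, D). Returns index pairs (i, j) of mutual nearest neighbours."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)  # (Na, Nb) distance matrix
    nn_ab = dists.argmin(axis=1)  # best match in b for each descriptor of a
    nn_ba = dists.argmin(axis=0)  # best match in a for each descriptor of b
    # keep only mutual nearest neighbours to reduce false matches (extra safeguard)
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```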
The flow shown in fig. 3 and 4 is an example of the information processing according to the present embodiment, and the information processing according to the present embodiment is not limited to this.
< 1.4 background >
Next, the background that led the present inventors to create the embodiments of the present disclosure will be described with reference to fig. 5. Fig. 5 is an explanatory diagram illustrating that background.
As described above, as one of sensor fusion techniques using sensing data obtained from a plurality of sensors, a technique for accurately and easily aligning the plurality of sensors is further required. As such a technique, feature point matching can be performed between images acquired by the LiDAR100 and the camera 200.
For example, Scale-Invariant Feature Transform (SIFT) is one of the algorithms for feature point detection and feature quantity description. In SIFT, feature points are detected from the Difference of Gaussians (DoG), which approximates the Laplacian of Gaussian (LoG), computed from smoothed images obtained by Gaussian convolution, and a 128-dimensional gradient vector obtained from the pixel information around each detected feature point is described as its feature quantity. Because the feature quantities of the detected feature points can be described stably against rotation, scale change, illumination change, and the like of an image, SIFT can be used for image matching tasks such as image mosaicing, object recognition, and object detection. However, SIFT is a handcrafted method built from rule-based algorithms designed by humans, and is relatively cumbersome.
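For reference, SIFT detection and description as implemented in OpenCV looks like the following; the file name is a placeholder:

```python
# Reference only: SIFT keypoints and 128-dimensional descriptors with OpenCV,
# one common implementation of the algorithm described above.
import cv2

img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # descriptors: (number of keypoints, 128)
```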
"SuperPoint: self-supervised interest point detection and description (Self-supervision feature point detection and description) "is one of algorithms that make use of machine learning to perform feature point detection and feature quantity description. In Superpoint, for a certain image, a pair of an original image and an image on which random projection is applied is input as input data to a Deep Neural Network (DNN). Further, in the Superpoint, an algorithm (model) for matching feature points common to a plurality of images can be generated by learning feature points by a forward solution tag (teacher data) and learning feature quantities by calculating similar vectors between pixels corresponding to pairs of inter-image positions.
Such conventional techniques are premised on matching feature points between images obtained from the same type of sensor, and are robust against projections such as enlargement, reduction, and rotation. However, in feature point matching between images obtained from different sensors (different domains) such as the LiDAR 100 and the camera 200, as shown in fig. 5, the feature points (indicated by circles in the figure) that correspond between the reflected intensity image 400 and the visible light image 500 cannot be detected with high accuracy, or the feature points common to the reflected intensity image 400 and the visible light image 500 cannot be matched, and the accuracy therefore decreases.
Accordingly, the present inventors have devised embodiments of the present disclosure described below in view of such a situation. In an embodiment of the present disclosure authored by the present inventor, feature points and feature amounts common to a plurality of images (specifically, a reflected intensity image and a visible light image) obtained from different kinds of sensors are acquired through a Deep Neural Network (DNN), and a model (algorithm) for matching the common feature points is generated. In this case, DNN performs machine learning using not only a large number of reflected intensity images and visible light images but also images obtained by projecting these images as input data. According to the embodiments of the present disclosure, a model (algorithm) that can accurately and easily match feature points even with images obtained from different types of sensors can be obtained. Hereinafter, embodiments of the present disclosure devised by the present inventors will be described in detail.
2. Embodiments
< 2.1 Information processing method >
First, a description will be given of a general flow of processing for acquiring feature points and feature amounts from the reflection intensity image 400 and the visible light image 500 obtained from different sensors and generating a model (algorithm) for matching the common feature points. Although the description is made here of the case where the information processing apparatus 300 generates a model, the present embodiment may be performed by an information processing apparatus (not shown) different from the information processing apparatus 300, and is not particularly limited.
An information processing method according to an embodiment of the present disclosure and a processing method for generating a model will be described with reference to fig. 6. Fig. 6 is a flowchart illustrating an example of the information processing method according to the present embodiment. In detail, as shown in fig. 6, the information processing method of the present embodiment may mainly include a plurality of steps from step S500 to step S900. The following describes each of these steps in this embodiment in detail.
First, the information processing apparatus 300 collects one or more visible light images 500 from the camera 200 (step S500). Next, the information processing apparatus 300 collects one or more reflected intensity images 400 from the LiDAR (step S600).
Then, the information processing apparatus 300 generates a pair serving as input data using the visible light image 500 and the reflection intensity image 400 at the same viewpoint collected in step S500 and step S600 described above (step S700).
Next, the information processing apparatus 300 generates a ground-truth label (teacher data) common to the visible light image 500 and the reflection intensity image 400 (step S800).
Then, the information processing apparatus 300 performs machine learning while randomly projecting the visible light image 500 and the reflection intensity image 400 (step S900).
The following describes, in order, the generation of input data, the generation of ground-truth labels, and the learning performed in steps S700 to S900.
< 2.2 Generation of input data >
The generation of input data according to the present embodiment will be described in detail with reference to fig. 7. Fig. 7 is an explanatory diagram illustrating an example of input data of the present embodiment. In the present embodiment, as described above, in step S700, a pair of the reflected intensity image 404 and the visible light image 504, which are input data, is generated. At this time, in the present embodiment, as shown in fig. 7, from the LiDAR100 and the camera 200, a reflected intensity panoramic image (first wide area image) 402 and a visible light panoramic image (second wide area image) 502, which are images of a wide area, are used.
In detail, the information processing apparatus 300 cuts out images from each of the reflection intensity panoramic image 402 and the visible light panoramic image 502 to be at the same position (the same viewpoint), the same size, and the same view angle (FOV). In this case, the information processing apparatus 300 may correct optical distortion or the like in the image. In this way, the information processing apparatus 300 can generate the input data 704 composed of the pair of the reflection intensity image 404 and the visible light image 504. According to the present embodiment, by generating input data by clipping from a panoramic image, a large number of pairs of the reflection intensity image 404 and the visible light image 504 with less offset can be easily generated.
In some cases, the reflected intensity panoramic image 402 and the visible light panoramic image 502 contain noise caused by a moving subject (in the figure, a vehicle), or noise such as positions that lack matching consistency due to a difference in acquisition time between the two panoramic images. Therefore, in the present embodiment, in order to exclude such noise from the machine learning, a mask image 602 containing a mask that covers the noise portions of the reflected intensity panoramic image 402 and the visible light panoramic image 502 is generated. Then, in the present embodiment, the mask image 604 paired with the reflected intensity image 404 and the visible light image 504 included in the input data 704 is generated by cutting out, from the generated mask image 602, an image at the same position (same viewpoint), the same size, and the same angle of view (FOV). According to the present embodiment, by using such a mask, positions lacking matching consistency are excluded from the machine learning, so that the accuracy and efficiency of the machine learning can be further improved.
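A hedged sketch of this input data generation, assuming the panoramas and the mask image are already aligned NumPy arrays; the window parameters are illustrative:

```python
# Hedged sketch: cut the same window (same viewpoint, size and FOV) out of the
# aligned panoramas and the mask image to form one set of input data.
import numpy as np

def make_input_set(reflect_pano: np.ndarray, visible_pano: np.ndarray,
                   mask_pano: np.ndarray, x: int, y: int, w: int, h: int):
    window = np.s_[y:y + h, x:x + w]
    return {
        "reflect": reflect_pano[window],  # reflection intensity image 404
        "visible": visible_pano[window],  # visible light image 504
        "mask": mask_pano[window],        # mask image 604 (regions excluded from learning)
    }
```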
Next, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 8 and 9. Fig. 8 is a block diagram showing an example of the configuration of the information processing apparatus 300 according to the present embodiment, and fig. 9 is an explanatory diagram showing an example of the generation of a mask according to the present embodiment. Here, the description will be focused on the function of the information processing apparatus 300 concerning the generation stage of the input data in the generation of the model. In detail, as shown in fig. 8, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 322, a visible light image acquisition unit 324, a reflected intensity image processing unit (image processing unit) 326, a visible light image processing unit (image processing unit) 328, a mask generation unit (mask unit) 330, and an input data generation unit 332. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 322)
The reflected intensity image acquisition unit 322 acquires data of a reflected intensity panoramic image (first wide area image) 402 from the LiDAR100, and outputs the acquired data to a reflected intensity image processing unit 326 and a mask generation unit 330, which will be described later.
(visible light image acquisition section 324)
The visible light image acquisition unit 324 acquires data of a visible light panoramic image (second wide area image) 502 from the camera 200, and outputs the acquired data to a visible light image processing unit 328 and a mask generation unit 330, which will be described later.
(reflection intensity image processing section 326)
The reflected intensity image processing unit 326 cuts the reflected intensity panoramic image 402 from the reflected intensity image acquisition unit 322 into a predetermined position (viewpoint), a predetermined size, and a predetermined view angle (FOV), and generates a reflected intensity image 404 as input data 704. The reflected intensity image processing unit 326 outputs the generated reflected intensity image 404 to an input data generating unit 332 described later. The reflection intensity image processing unit 326 may perform optical distortion, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment) in the image.
(visible light image processing section 328)
The visible light image processing unit 328 cuts the visible light panoramic image 502 from the visible light image acquisition unit 324 into a predetermined position (viewpoint), a predetermined size, and a predetermined view angle (FOV), and generates a visible light image 504 as input data 704. The visible light image processing unit 328 outputs the generated visible light image 504 to an input data generating unit 332 described below. The visible light image processing unit 328 may perform optical distortion, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment) in the image.
(mask generating section 330)
In the present embodiment, the mask image 602 is automatically generated by a convolutional neural network (CNN). As described above, according to the present embodiment, the mask images 602 and 604, and in turn the input data 704, can be generated easily and in large quantity. Specifically, as shown in fig. 9, the mask generating unit 330 is configured by a CNN or the like and generates the mask image 602 using the aligned reflected intensity panoramic image 402 and visible light panoramic image 502 as input data. The mask generating unit 330 then cuts the generated mask image 602 to a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV), generates the mask image 604 to be used as part of the input data 704, and outputs it to the input data generating unit 332 described later. For example, the CNN of the mask generating unit 330 can generate the mask image 602 by using a subject detection algorithm such as "Objects as Points", which captures a subject as a box and represents it by the position coordinates of the center point of the box and its image feature quantity. In this way, in the present embodiment, a mask for excluding positions that lack matching consistency from the machine learning can be generated automatically.
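A hedged sketch of turning detected moving subjects into a mask image; the box format and the mask polarity (1 = usable, 0 = excluded) are assumptions, since the patent only states that a detector such as "Objects as Points" is used:

```python
# Hedged sketch: build a mask image from detected moving subjects (e.g. vehicles).
# The (x, y, w, h) box format and mask polarity are illustrative assumptions.
import numpy as np

def boxes_to_mask(image_shape, boxes):
    """boxes: iterable of (x, y, w, h) covering moving subjects detected in the panorama."""
    mask = np.ones(image_shape[:2], dtype=np.uint8)  # 1 = position usable for learning
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = 0                   # 0 = noise, excluded from learning
    return mask
```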
(input data generating section 332)
The input data generation unit 332 outputs the reflection intensity image 404, the visible light image 504, and the mask image 604, which are output from the reflection intensity image processing unit 326, the visible light image processing unit 328, and the mask generation unit 330 described above, at the same position (the same viewpoint), at the same size, and at the same viewing angle (FOV), as a set (pair) of input data 704 to a functional unit (specifically, the reflection intensity image acquisition units 342 and 362 and the visible light image acquisition units 344 and 364 shown in fig. 10 and 13) described below. In the present embodiment, when there is no noise in the reflection intensity image 404 and the visible light image 504, the mask image 604 may not be included in the group of input data.
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the generation stage of the input data 704 during the generation of the model are not limited to the configuration shown in fig. 8.
< 2.3 Generation of ground-truth labels >
Next, the generation of the ground-truth labels (teacher data) of the present embodiment will be described in detail. There are tens to hundreds of feature points in one image, so when generating ground-truth labels for machine learning, it is not realistic to manually detect the feature points that become the ground-truth labels one by one. Therefore, in the present embodiment, the ground-truth labels are automatically generated using a DNN or the like.
First, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 10. Fig. 10 is a block diagram showing an example of the configuration of the information processing apparatus 300 according to the present embodiment. Here, the description focuses on the functions of the information processing apparatus 300 related to the ground-truth label (teacher data) generation stage of model generation. Specifically, as shown in fig. 10, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 342, a visible light image acquisition unit 344, a reflected intensity image projection unit 346, a visible light image projection unit 348, and a ground-truth label generation unit (teacher data generation unit) 350. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 342)
The reflected intensity image acquisition unit 342 acquires the reflected intensity image 404 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a reflected intensity image projection unit 346 described later. In the present embodiment, the reflection intensity image acquisition unit 342 may not acquire and output the mask image 604 when the reflection intensity image 404 does not have noise.
(visible light image acquisition section 344)
The visible light image acquisition unit 344 acquires the visible light image 504 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to the visible light image projection unit 348 described later. In the present embodiment, the visible light image acquisition unit 344 need not acquire or output the mask image 604 when the visible light image 504 contains no noise.
(reflection intensity image projection section 346)
The reflected intensity image projection unit 346 projects the acquired reflected intensity image 404 (and, if necessary, the mask image 604) by randomly rotating it or shifting the viewpoint horizontally, vertically, or obliquely. For example, the reflected intensity image projection unit 346 can perform the projection with a randomly given homography matrix H. The reflected intensity image projection unit 346 then outputs the projected reflected intensity image (first projected image) obtained by the projection, together with the reflected intensity image 404, to the ground-truth label generation unit 350 described later.
(visible light image projecting section 348)
The visible light image projection unit 348 projects the acquired visible light image 504 (and, if necessary, the mask image 604) by randomly rotating it or shifting the viewpoint horizontally, vertically, or obliquely. For example, the visible light image projection unit 348 can perform the projection with a randomly given homography matrix H. The visible light image projection unit 348 then outputs the projected visible light image (second projected image) obtained by the projection, together with the visible light image 504, to the ground-truth label generation unit 350 described later.
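The random projection applied by both projection units can be sketched as a random homography warp; the amount of corner jitter below is an illustrative assumption:

```python
# Hedged sketch of the random projection: warp an image (and, if needed, its mask)
# by a randomly generated homography matrix H. The jitter amplitude is illustrative.
import cv2
import numpy as np

def random_homography(h: int, w: int, max_shift: float = 0.15) -> np.ndarray:
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])
    dst = src + jitter.astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)  # 3x3 homography H

def project(image: np.ndarray, H: np.ndarray) -> np.ndarray:
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```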
(Ground-truth label generation unit 350)
The ground-truth label generation unit 350 generates the ground-truth labels (teacher data) to be used in the learning unit 370 (see fig. 13) described later. For example, the ground-truth label generation unit 350 detects feature points of the reflected intensity image 404 and the visible light image 504 using the images and their projected versions, and acquires a likelihood map of the feature points (a map plotting each feature point and the confidence of that feature point). The ground-truth label generation unit 350 then combines the likelihood maps to generate a ground-truth label for the reflected intensity image and a ground-truth label for the visible light image. In the present embodiment, the ground-truth label generation unit 350 is configured by, for example, an encoder (not shown) that dimensionally compresses the input data and a detector (not shown) that detects the feature points.
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the ground-truth label generation stage of model generation are not limited to the configuration shown in fig. 10.
Next, the generation of the ground-truth labels according to the present embodiment will be described in detail with reference to figs. 11 and 12. Figs. 11 and 12 are explanatory diagrams illustrating an example of the generation of ground-truth labels according to the present embodiment.
In the present embodiment, as shown in fig. 11, the ground-truth label generation unit 350 performs machine learning in advance using CG (computer graphics) images 700 prepared in advance, and generates a ground-truth label 800. The information processing apparatus 300 then compares the generated ground-truth label 800 with a ground-truth label 900 containing feature points of the CG image 700 that were generated manually in advance, feeds back the difference (detector loss) to the ground-truth label generation unit 350, and reinforces the learning so that the difference becomes smaller.
However, in the algorithm (model) obtained in this way (the ground-truth label generation unit 350), there is a gap between the CG images 700 and the actual images (reflected intensity images and visible light images) actually used, so it is difficult to detect feature points robustly for images whose observation direction (viewpoint position) changes (for example, feature points that should be detected cannot be detected). Therefore, in the present embodiment, the ground-truth label generation unit 350 applies random projections to the reflected intensity images and the visible light images and performs machine learning using the projected images, so that feature points can be detected robustly. In detail, in the present embodiment, the ground-truth label generation unit 350 also detects feature points from the projected images obtained by applying random projections to the reflected intensity images and the visible light images, thereby obtaining the probability (likelihood) that each feature point is detected. Next, in the present embodiment, the likelihood maps plotting the likelihoods of the respective feature points of the reflected intensity image and the visible light image are combined to generate a ground-truth label common to the reflected intensity image and the visible light image. In the present embodiment, by using this ground-truth label common to the reflected intensity image and the visible light image in the learning stage described later, a model (algorithm) that can stably detect feature points from both the reflected intensity image and the visible light image can be obtained.
More specifically, as shown in fig. 12, the ground-truth label generation unit 350, which has performed machine learning based on the CG images 700, generates a likelihood map 802 containing feature points and their likelihoods from the reflected intensity image 406 and the projected reflected intensity image 410. Next, the ground-truth label generation unit 350 generates a likelihood map 802 containing feature points and their likelihoods from the visible light image 506 and the projected visible light image 510. The ground-truth label generation unit 350 then combines the two likelihood maps to generate a ground-truth label 904 for the reflected intensity image and a ground-truth label 904 for the visible light image. In addition, the ground-truth label generation unit 350 repeats the above-described machine learning using the generated ground-truth labels 904, thereby obtaining the final ground-truth labels 904. The generation of the ground-truth labels 904 according to the present embodiment is similar to the technique described in non-patent document 1 above, but differs in that the ground-truth labels 904 can be generated so that feature points common to the reflected intensity image and the visible light image obtained from different sensors (different domains) can be detected robustly even if the observation direction (viewpoint) changes.
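A hedged sketch of this label generation in the spirit of the homographic adaptation used by non-patent document 1: run a detector over random projections of both images, warp the likelihood maps back, and combine them. The detector interface, the number of projections, the averaging, and the threshold are all assumptions; `random_homography` and `project` are the helpers sketched earlier:

```python
# Hedged sketch: accumulate detector likelihood maps over random projections of the
# reflected intensity image and the visible light image, then combine them into one
# ground-truth label common to both. Details (averaging, threshold) are assumptions.
import cv2
import numpy as np

def common_ground_truth_label(reflect_img, visible_img, detector,
                              num_projections=50, thresh=0.3):
    h, w = reflect_img.shape[:2]
    acc = np.zeros((h, w), np.float32)
    for img in (reflect_img, visible_img):
        for _ in range(num_projections):
            H = random_homography(h, w)
            heat = detector(project(img, H)).astype(np.float32)  # (H, W) likelihood map
            acc += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))  # warp back, accumulate
    acc /= 2 * num_projections
    return (acc > thresh).astype(np.uint8)  # feature point label shared by both images
```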
< 2.4 Learning >
Next, the generation of the model by learning according to the present embodiment will be described in detail. Here, a model (algorithm) that robustly detects feature points and matches common feature points in a reflected intensity image and a visible light image obtained from different sensors (different domains), even if the observation direction (viewpoint) changes, is generated by machine learning.
First, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 13 and 14. Fig. 13 is a block diagram showing an example of the configuration of an information processing apparatus 300 according to the embodiment of the present disclosure, and fig. 14 is a block diagram showing an example of the configuration of a learning unit 370 according to the embodiment. Here, the description will be focused on the function of the information processing apparatus 300 for generating a model by learning. In detail, as shown in fig. 13, the information processing apparatus 300 mainly includes a reflected-light-intensity image acquisition unit 362, a visible-light-image acquisition unit 364, a reflected-light-intensity image projection unit 366, a visible-light-image projection unit 368, and a learning unit (learner) 370. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflected intensity image acquisition section 362)
The reflected intensity image acquisition unit 362 acquires the reflected intensity image 404 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a reflected intensity image projection unit 366 described later. In the present embodiment, the reflection intensity image acquisition unit 362 may not acquire and output the mask image 604 when the reflection intensity image 404 does not have noise.
(visible light image acquisition section 364)
The visible light image acquisition unit 364 acquires the visible light image 504 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a visible light image projection unit 368 described later. In the present embodiment, the visible light image acquisition unit 364 may not acquire and output the mask image 604 when the visible light image 504 is free of noise.
(reflection intensity image projection unit 366)
The reflected intensity image projection unit 366 projects the acquired reflected intensity image 404 (mask image 604 if necessary). For example, the reflection intensity image projection unit 366 can project by a homography matrix H given at random. Then, the reflected intensity image projection unit 366 outputs the projected reflected intensity image (first projected image) 410 obtained by the projection to a learning unit 370 described later together with the reflected intensity image 404.
(visible light image projecting section 368)
The visible light image projection unit 368 projects the acquired visible light image 504 (mask image 604 if necessary). For example, the visible light image projection unit 368 can project a visible light image by using a homography matrix H given at random. Then, the visible light image projection unit 368 outputs the projected visible light image (second projection image) 510 obtained by the projection to the learning unit 370 described later together with the visible light image 504.
(learning section 370)
The learning unit 370 acquires feature points and feature quantities from the reflected intensity image 404 and the visible light image 504, and generates a model (algorithm) for matching the common feature points. In detail, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 and/or a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 are input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the visible light image 504 and the projected visible light image 510 and a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 may be input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected reflected intensity image 410 and a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 may be input to the learning unit 370. Furthermore, a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410, a pair consisting of the reflected intensity image 404 and the projected visible light image 510, a pair consisting of the visible light image 504 and the projected visible light image 510, and a pair consisting of the reflected intensity image 404 and the projected reflected intensity image 410 may all be input to the learning unit 370. That is, in the present embodiment, at least one pair of input data containing two images that originate from different types of sensors is input. By performing machine learning using such input data, the learning unit 370 can generate a model that matches common feature points in the reflected intensity image and the visible light image obtained from different types of sensors even if the observation direction changes.
More specifically, as shown in fig. 14, the learning unit 370 includes an encoder unit 372 that dimensionally compresses (for example, to 1/8) a pair of input data 704, a detector unit 374 that detects feature points (positions and their coordinate information that characterize the shape of the subject, such as a center point, a branch point, an intersection, or an end point on a contour in an image) from the compressed pair of input data 704, and a descriptor unit (feature quantity acquisition unit) 376 that acquires (describes) feature quantities (information that numerically expresses the characteristics of a feature point, such as its shape, orientation, and spatial extent) from the compressed pair of input data 704. The learning unit 370 performs machine learning by matching common feature points of images from the different sensors based on the feature quantities, or by comparing the feature points and feature quantities acquired from each image with the ground-truth label (teacher data) 804 and feeding the comparison result back to the learning unit 370.
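A hedged PyTorch-style skeleton of this structure; the channel counts, the 1/8 compression by three stride-2 convolutions, and the 65-channel detector head (an 8×8 grid of positions plus a "no point" bin, following the SuperPoint-style design of non-patent document 1) are illustrative assumptions, not values taken from the patent:

```python
# Hedged skeleton of the learning unit 370: a shared encoder (encoder unit 372),
# a detector head (detector unit 374) and a descriptor head (descriptor unit 376).
# All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    def __init__(self, desc_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(                      # 1/8 spatial compression
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.detector = nn.Conv2d(128, 65, 1)          # feature point logits per 8x8 cell + "no point" bin
        self.descriptor = nn.Conv2d(128, desc_dim, 1)  # feature quantity (descriptor) per cell

    def forward(self, x: torch.Tensor):
        f = self.encoder(x)
        return self.detector(f), self.descriptor(f)
```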
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the generation stage of the model are not limited to the configuration shown in fig. 13 and 14.
Example 1
Next, an embodiment of specific machine learning by the learning unit 370 will be described with reference to fig. 15. Fig. 15 is an explanatory diagram illustrating an example of learning of the present embodiment.
In this example, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 and/or a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 are input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the visible light image 504 and the projected visible light image 510 and a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 may be input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected reflected intensity image 410 and a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 may be input to the learning unit 370. Furthermore, a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410, a pair consisting of the reflected intensity image 404 and the projected visible light image 510, a pair consisting of the visible light image 504 and the projected visible light image 510, and a pair consisting of the reflected intensity image 404 and the projected reflected intensity image 410 may all be input to the learning unit 370.
More specifically, in the example shown in fig. 15, a pair of input data 710a of the reflected intensity image 406 and the projected visible light image 510, and a pair of input data 710b of the visible light image 506 and the projected reflected intensity image 410 are input.
In the present embodiment, the learning unit 370 is prepared with two sets of the encoder unit 372, the detector unit 374, and the descriptor unit 376 that share identical weights, and feature points and feature amounts are acquired from the pair of input data 710a and the pair of input data 710b. Specifically, the learning unit 370 compares the result data 810a and 810b containing the feature points acquired by the respective detector units 374 with the above-described ground-truth label 804, and calculates the difference between them as a loss (detector loss) L_p. Furthermore, the learning unit 370 generates result data 812 from the feature amounts obtained by the respective descriptor units 376, matches and compares the feature points, and calculates the difference between them as a loss (descriptor loss) L_d.
Here, when the likelihood maps of the feature points obtained from the paired image and projection image are denoted χ and χ′, and the feature amounts obtained from the image and the projection image are denoted D and D′, the final loss value L can be expressed by the following formula (1) using a constant λ.
[Formula 1]
$$L(\chi, \chi', D, D', Y, Y', s) = L_p(\chi, Y) + L_p(\chi', Y') + \lambda L_d(D, D', s) \qquad \cdots (1)$$
In formula (1), Y represents the ground-truth label 804 of the feature points, and s represents the correspondence of pixels between the two images.
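Formula (1) maps directly onto a loss computation such as the following sketch, where detector_loss and descriptor_loss stand for L_p and L_d (hedged sketches of both are given below) and lambda_ is the balancing constant λ.

```python
def total_loss(chi, chi_p, D, D_p, Y, Y_p, s, lambda_=1.0):
    # Formula (1): L = Lp(chi, Y) + Lp(chi', Y') + lambda * Ld(D, D', s)
    return (detector_loss(chi, Y)
            + detector_loss(chi_p, Y_p)
            + lambda_ * descriptor_loss(D, D_p, s))
```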
The loss (detector loss) L_p of the detector unit 374 is calculated by the cross entropy with the ground-truth label 804, and can be expressed by the following formula (2). Note that the projection image is an image projected by a randomly given homography matrix H.
[Formula 2]
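The body of formula (2) is not reproduced in this text. Purely as an illustration of a cross-entropy detector loss of the kind described, assuming a SuperPoint-style layout in which the detector outputs 65 logits per 8×8 cell and Y gives the ground-truth cell index (this layout is an assumption of the sketch, not the definition in formula (2)):

```python
import torch.nn.functional as F

def detector_loss(chi_logits, Y):
    # chi_logits: (B, 65, Hc, Wc) raw output of the detector unit 374.
    # Y: (B, Hc, Wc) ground-truth cell indices (long) derived from the ground-truth label 804.
    return F.cross_entropy(chi_logits, Y)
```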
In addition, using a hinge loss, the loss (descriptor loss) L_d of the descriptor unit 376 between the feature amount d_hw (d_hw ∈ D) of each pixel of the input image and the feature amount d′_hw (d′_hw ∈ D′) of the projection image of that input image is expressed by the following formula (3). In formula (3), m_p is the positive margin, m_n is the negative margin, and λ_d is a constant that balances the correct correspondences against the false correspondences. The correspondence (match) s is defined by the following formula (4).
[Formula 3]
$$l_d(d, d'; s) = \lambda_d \, s \, \max(0,\, m_p - d^{T} d') + (1 - s)\, \max(0,\, d^{T} d' - m_n) \qquad \cdots (3)$$
[Formula 4]
Here, p_hw is the pixel position on the image corresponding to the feature amount of the descriptor unit 376, and H·p_hw is that pixel position warped by the homography matrix H. Since the feature amounts of the descriptor unit 376 are compressed to 1/8 of the input image, pixels are regarded as corresponding when the distance between the corresponding pixels is within 8 pixels.
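A sketch of the descriptor loss of formula (3) together with the 8-pixel correspondence rule just described; the margin values m_p and m_n, the weight λ_d, and the dense all-cells formulation are illustrative assumptions (the exact body of formula (4) is not reproduced in this text).

```python
import torch

def correspondence_mask(H, Hc, Wc, cell=8):
    # s of formula (4): 1 when the centre p_hw of an 8x8 cell, warped by the homography H
    # (a (3, 3) float tensor), lands within 8 px of a cell centre of the projection image.
    ys, xs = torch.meshgrid(torch.arange(Hc), torch.arange(Wc), indexing="ij")
    p = (torch.stack([xs, ys], dim=-1).float() * cell + cell / 2).reshape(-1, 2)   # (N, 2)
    p_h = torch.cat([p, torch.ones(p.shape[0], 1)], dim=-1)                        # homogeneous
    warped = (H @ p_h.T).T
    warped = warped[:, :2] / warped[:, 2:3]
    return (torch.cdist(warped, p) <= cell).float()                                # (N, N)

def descriptor_loss(D, D_p, s, m_p=1.0, m_n=0.2, lambda_d=250.0):
    # Formula (3) summed over all cell pairs of the image and its projection.
    # D, D_p: (B, C, Hc, Wc) feature amounts; s: (N, N) correspondence mask, N = Hc * Wc.
    B, C, Hc, Wc = D.shape
    d = D.reshape(B, C, -1)
    d_p = D_p.reshape(B, C, -1)
    dot = torch.einsum("bcn,bcm->bnm", d, d_p)                                     # d^T d'
    loss = lambda_d * s * torch.clamp(m_p - dot, min=0) \
           + (1 - s) * torch.clamp(dot - m_n, min=0)
    return loss.mean()
```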
As described above, the learning unit 370 calculates the final loss L and performs feedback so as to minimize L, and thereby generates a model (algorithm) that can stably match the feature points common to the reflected intensity image and the visible light image obtained from different sensors (different domains) even when the observation direction (viewpoint) changes.
Example 2
Next, another specific example of the machine learning performed by the learning unit 370 will be described with reference to fig. 16. Fig. 16 is an explanatory diagram illustrating an example of the learning of the present embodiment.
In the present example, a Shared Encoder (E_s) shown in fig. 16 provides the same function as the encoder unit 372 described above. In addition, a Private Encoder (E_p) for the image originating from the reflected intensity image and a Private Encoder (E_p) for the image originating from the visible light image are prepared (the first and second encoder units described above). Furthermore, a Shared Decoder is prepared that takes as input the sum of the output of the Shared Encoder and the output of the Private Encoder.
More specifically, in the example shown in fig. 16, a pair of input data 712a consisting of the reflected intensity image 406 and the projected visible light image 510 and a pair of input data 712b consisting of the visible light image 506 and the projected reflected intensity image 410 are input.
The final loss value L in this example is the sum of five loss functions (L_p, L_d, L_r, L_f, L_s). Of these, L_p and L_d are the same as in Example 1 described above. The loss L_r is a reconstruction loss that compares the output image, obtained by decoding the sum of the output E_s(I) of the Shared Encoder and the output E_p(I) of the Private Encoder, with the input image I so that the two coincide. The loss L_f is a difference loss that makes the output E_p(I) of the Private Encoder differ from the output E_s(I) of the Shared Encoder. The loss L_s is a similarity loss that makes it ambiguous whether the output of the Shared Encoder originates from the visible light image or from the reflected intensity image.
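As a rough structural sketch only (reusing the LearnerBackbone, DetectorHead, and DescriptorHead classes sketched in Example 1): the concrete decoder, the choice to feed the detector and descriptor heads from the shared features, and the addition of the two encoder outputs before decoding are assumptions of this sketch.

```python
import torch.nn as nn

class DomainSeparationLearner(nn.Module):
    def __init__(self, feat_ch=128):
        super().__init__()
        self.shared_encoder = LearnerBackbone(feat_ch=feat_ch)       # Shared Encoder E_s
        self.private_encoder_ri = LearnerBackbone(feat_ch=feat_ch)   # Private Encoder E_p (reflected intensity)
        self.private_encoder_vl = LearnerBackbone(feat_ch=feat_ch)   # Private Encoder E_p (visible light)
        self.shared_decoder = nn.Sequential(                         # Shared Decoder: reconstructs the input
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(feat_ch, 1, 3, padding=1))
        self.detector = DetectorHead(feat_ch)                        # detector unit 374
        self.descriptor = DescriptorHead(feat_ch)                    # descriptor unit 376

    def forward(self, image, domain):
        shared = self.shared_encoder(image)                          # E_s(I)
        private = (self.private_encoder_ri if domain == "reflected"
                   else self.private_encoder_vl)(image)              # E_p(I)
        recon = self.shared_decoder(shared + private)                # input to the reconstruction loss L_r
        return shared, private, recon, self.detector(shared), self.descriptor(shared)
```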
The final loss value L is then defined from the five loss functions (L_p, L_d, L_r, L_f, L_s) and the constants λ, α, β, and γ as in the following formula (5).
[Formula 5]
$$L = L_p + \lambda L_d + \alpha L_r + \beta L_f + \gamma L_s \qquad \cdots (5)$$
The reconstruction loss L_r is defined by the following formula (6) so that the output of the Shared Decoder coincides with the input image.
[Formula 6]
In formula (6), k is the number of pixels of the input image, and 1_k is a vector of length k whose elements are all 1. The norm appearing in formula (6) denotes the squared L2 norm.
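The body of formula (6) is not reproduced in this text. One reconstruction loss consistent with the symbols k and 1_k described here is the scale-invariant mean squared error used in domain separation networks; the sketch below assumes that this is the intended form.

```python
def reconstruction_loss(x, x_hat):
    # x: input image, x_hat: output of the Shared Decoder, both (B, C, H, W).
    diff = (x - x_hat).reshape(x.shape[0], -1)
    k = diff.shape[1]                               # number of pixels per sample
    mse = (diff ** 2).sum(dim=1) / k                # (1/k) * ||x - x_hat||_2^2
    mean = (diff.sum(dim=1) ** 2) / (k ** 2)        # (1/k^2) * ((x - x_hat) . 1_k)^2
    return (mse - mean).mean()
```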
The difference loss L_f is defined by the following formula (7) so that the output E_p(I) of the Private Encoder and the output E_s(I) of the Shared Encoder differ from each other.
[Formula 7]
The norm appearing in formula (7) denotes the squared Frobenius norm.
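The body of formula (7) is likewise not reproduced. A difference loss based on a squared Frobenius norm that pushes the shared and private representations apart, as in domain separation networks, can be sketched as follows; flattening the per-sample feature maps into row vectors is an assumption of the sketch.

```python
import torch.nn.functional as F

def difference_loss(shared, private):
    # shared = E_s(I), private = E_p(I): (B, C, Hc, Wc)
    s = F.normalize(shared.reshape(shared.shape[0], -1), p=2, dim=1)
    p = F.normalize(private.reshape(private.shape[0], -1), p=2, dim=1)
    # Squared Frobenius norm of S^T P: zero when the shared and private outputs are orthogonal.
    return (s.t() @ p).pow(2).sum()
```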
The similarity loss L_s is a loss for learning so that it becomes ambiguous whether the output of the Shared Encoder originates from the visible light image or from the reflected intensity image. In the present example, a Gradient Reversal Layer (GRL) is used to maximize this confusion. The GRL behaves as the identity function in the forward direction, but reverses the direction of the gradient in the backward direction. The GRL is defined by the following formula (8).
[Formula 8]
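A gradient reversal layer with exactly this behavior (identity output, reversed gradient) is commonly implemented as below; the optional scaling factor is an assumption of the sketch.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale=1.0):
        ctx.scale = scale
        return x.view_as(x)                       # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and optionally scale) the gradient flowing back to the Shared Encoder.
        return -ctx.scale * grad_output, None

def grl(x, scale=1.0):
    return GradientReversal.apply(x, scale)
```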
The output E_s(I) of the Shared Encoder is input, through the GRL Q, to the domain identifier Z(Q(E_s(I)); θ_z), which outputs a prediction d̂ that distinguishes whether the input originated from the visible light image or the reflected intensity image. Here, θ_z denotes the parameters of the domain identifier Z, and d̂ ∈ {0, 1}. During learning, θ_z is updated so as to enhance the discrimination capability of the domain identifier Z, while, conversely, the parameters of the Shared Encoder are learned with the gradient reversed by the GRL so that the discrimination capability of the domain identifier is reduced. The similarity loss L_s is thus defined by the following formula (9).
[Formula 9]
As described above, the learning unit 370 calculates the final loss L and performs feedback so as to minimize L, and thereby generates a model (algorithm) that can stably match the feature points common to the reflected intensity image and the visible light image obtained from different sensors (different domains) even when the observation direction (viewpoint) changes.
3. Summary
As described above, according to the embodiments of the present disclosure, feature point matching between images obtained from sensors (domains) of different types can be performed with good accuracy. As a result, according to the present embodiment, the information of a plurality of sensors can be accurately and easily aligned. Specifically, according to the present embodiment, the external parameters (positional parameters) and/or internal parameters (optical parameters) of the LiDAR100 and the camera 200 can be accurately corrected so as to eliminate the difference (deviation) in positional information between the images output from the LiDAR100 and the camera 200 that arises from the difference between the positions at which the LiDAR100 and the camera 200 are arranged (parallax, distance to the subject), the difference in angle of view between the LiDAR100 and the camera 200, or lens aberrations. The matching of feature points based on the model (algorithm) obtained in the present embodiment is not limited to calibration (alignment) of a plurality of sensors, and can also be applied to, for example, morphing (a technique in which a computer newly generates intermediate images between two temporally consecutive images) and the like.
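As a hedged illustration of the calibration use case (the calibration unit 314 of the embodiment is not limited to this procedure), once feature points have been matched between the reflected intensity image and the visible light image, each matched LiDAR feature point can be paired with its 3D position from the range measurement and the camera extrinsics re-estimated with a standard PnP solver; the function names below are OpenCV's, and the data layout is an assumption.

```python
import cv2
import numpy as np

def refine_extrinsics(lidar_points_3d, camera_points_2d, K, dist_coeffs=None):
    # lidar_points_3d: (N, 3) 3D positions of the matched LiDAR feature points.
    # camera_points_2d: (N, 2) pixel positions of the matched points in the visible light image.
    # K: (3, 3) camera intrinsic matrix (internal parameters).
    dist = np.zeros(5) if dist_coeffs is None else dist_coeffs
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        lidar_points_3d.astype(np.float32),
        camera_points_2d.astype(np.float32),
        K.astype(np.float32), dist,
        flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)   # rotation from the LiDAR frame to the camera frame
    return ok, R, tvec, inliers
```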
The present embodiment is not limited to the combination of the LiDAR100 and the camera 200, and can also be applied to, for example, a combination with another image sensor that observes an image produced by light of a specific wavelength. That is, the present embodiment can be applied to any combination of sensors of different types without particular limitation.
4. Hardware architecture
For example, the information processing apparatus 300 according to each of the above embodiments may be realized by a computer 1000 having a configuration as shown in fig. 17, which is connected to the LiDAR100 and the camera 200 via a network. Fig. 17 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus 300. The computer 1000 has a CPU1100, a RAM1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The various components of computer 1000 are connected by a bus 1050.
The CPU1100 operates based on a program stored in the ROM1300 or the HDD1400, and controls each unit. For example, the CPU1100 expands programs stored in the ROM1300 or the HDD1400 in the RAM1200, and executes processing corresponding to various programs.
ROM1300 stores a boot program such as BIOS (Basic Input Output System: basic input output System) executed by CPU1100 at the time of starting up computer 1000, a program depending on the hardware of computer 1000, and the like.
The HDD1400 is a computer-readable recording medium that temporarily stores a program executed by the CPU1100, data used by the program, and the like. Specifically, HDD1400 is a recording medium that records a ranging program of the present disclosure as one example of program data 1450.
The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the internet). For example, the CPU1100 receives data from other devices or transmits data generated by the CPU1100 to other devices via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000. For example, the CPU1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU1100 also transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. The input/output interface 1600 may also function as a media interface for reading a program or the like recorded on a predetermined recording medium (medium). Examples of the medium include an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a magnetic tape medium, a magnetic recording medium, and a semiconductor memory.
For example, in the case where the computer 1000 functions as the information processing apparatus 300 according to the embodiment of the present disclosure, the CPU1100 of the computer 1000 realizes the functions of the learning unit 370 and the like by executing programs and models loaded on the RAM 1200. In addition, a program or the like of the embodiment of the present disclosure is stored in the HDD 1400. Further, although the CPU1100 reads the program data 1450 from the HDD1400 and executes the program data, as another example, the program may be acquired from another device via the external network 1550.
The information processing apparatus 300 according to the present embodiment is applicable to a system including a plurality of apparatuses on the premise of connection to a network (or communication between the apparatuses), such as cloud computing.
5. Application example
An example of a mobile device control system to which the techniques set forth in this disclosure can be applied is described with reference to fig. 18. Fig. 18 is a block diagram showing a configuration example of a vehicle control system 11 as an example of a mobile device control system to which the present technology is applied.
The vehicle control system 11 is provided in the vehicle 1, and performs processing related to driving assistance and automatic driving of the vehicle 1.
The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit: electronic control unit) 21, a communication unit 22, a map information storage unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a travel assist/autopilot control unit 29, a DMS (Driver Monitoring System: driver monitoring system) 30, an HMI (Human Machine Interface: human-machine interface) 31, and a vehicle control unit 32.
The vehicle control ECU21, the communication unit 22, the map information storage unit 23, the positional information acquisition unit 24, the external recognition sensor 25, the in-vehicle sensor 26, the vehicle sensor 27, the storage unit 28, the travel assist/autopilot control unit 29, the Driver Monitoring System (DMS) 30, the human-machine interface (HMI) 31, and the vehicle control unit 32 are communicably connected to each other via the communication network 41. The communication network 41 is constituted by, for example, an in-vehicle communication network or a bus conforming to a digital bidirectional communication standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), FlexRay (registered trademark), or Ethernet (registered trademark). The communication network 41 may be used selectively according to the kind of data to be transmitted; for example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data. The respective units of the vehicle control system 11 may also be directly connected, not via the communication network 41, using wireless communication intended for relatively short-range communication, such as near field communication (NFC (Near Field Communication)) or Bluetooth (registered trademark).
Note that, in the following, when the respective units of the vehicle control system 11 communicate via the communication network 41, description of the communication network 41 is omitted. For example, in the case where the vehicle control ECU21 and the communication unit 22 communicate via the communication network 41, it is described that only the vehicle control ECU21 communicates with the communication unit 22.
The vehicle control ECU21 is configured by various processors such as a CPU (Central Processing Unit: central processing unit) and an MPU (Micro Processing Unit: microprocessor), for example. The vehicle control ECU21 can control the functions of the entire or a part of the vehicle control system 11.
The communication unit 22 can communicate with various devices inside and outside the vehicle, other vehicles, servers, base stations, and the like, and can transmit and receive various data. In this case, the communication unit 22 may perform communication using a plurality of communication methods.
Here, communication with the outside of the vehicle that can be performed by the communication unit 22 will be schematically described. The communication unit 22 can communicate with a server existing on an external network (hereinafter referred to as an external server) or the like via a base station or an access point by a wireless communication system such as 5G (fifth generation mobile communication system), LTE (Long Term Evolution: long term evolution), DSRC (Dedicated Short Range Communications: dedicated short range communication), or the like. The external network through which the communication unit 22 communicates is, for example, the internet, a cloud network, or a network unique to an enterprise. The communication method by the communication unit 22 for the external network is not particularly limited as long as it is a wireless communication method capable of performing digital bidirectional communication at a predetermined or higher communication speed and at a predetermined or higher distance.
The communication unit 22 can communicate with a terminal existing in the vicinity of the host vehicle using, for example, P2P (Peer To Peer) technology. Terminals existing in the vicinity of the host vehicle include, for example, terminals attached to mobile bodies such as pedestrians and bicycles, terminals installed in a fixed manner in stores and the like, and MTC (Machine Type Communication) terminals. The communication unit 22 can also perform V2X communication. The V2X communication refers to, for example, communication between the host vehicle and other vehicles (Vehicle to Vehicle), communication with roadside devices (Vehicle to Infrastructure), communication with home (Vehicle to Home), communication with terminals held by pedestrians (Vehicle to Pedestrian), and other communication between the host vehicle and entities other than the host vehicle.
For example, the communication unit 22 can receive, from the outside, a program for updating the software that controls the operation of the vehicle control system 11 (Over The Air). The communication unit 22 can also receive map information, traffic information, information on the surroundings of the vehicle 1, and the like from the outside. In addition, for example, the communication unit 22 can transmit information related to the vehicle 1, information on the surroundings of the vehicle 1, and the like to the outside. Examples of the information related to the vehicle 1 transmitted to the outside by the communication unit 22 include data indicating the state of the vehicle 1 and the recognition result by the recognition unit 73. The communication unit 22 can also perform communication corresponding to a vehicle emergency notification system such as an E-call (eCall).
For example, the communication unit 22 may receive electromagnetic waves transmitted by a road traffic information communication system (VICS (Vehicle Information and Communication System) (registered trademark)) such as a radio beacon, an optical beacon, and FM multiplex broadcasting.
Communication with the inside of the vehicle that can be executed by the communication unit 22 will be schematically described. The communication unit 22 can communicate with each device in the vehicle using wireless communication, for example. The communication unit 22 can perform Wireless communication with devices in the vehicle by a communication system capable of performing digital two-way communication at a predetermined communication speed or more by Wireless communication, such as Wireless LAN, bluetooth (registered trademark), NFC, WUSB (Wireless USB), and the like. The communication unit 22 is not limited to this, and may communicate with each device in the vehicle using wired communication. For example, the communication unit 22 can communicate with each device in the vehicle by wired communication via a cable connected to a connection terminal, not shown. The communication unit 22 can communicate with each device in the vehicle by a communication system such as USB (Universal Serial Bus: universal serial bus), HDMI (High-Definition Multimedia Interface: high-definition multimedia interface) (registered trademark), MHL (Mobile High-definition Link) or the like, which can perform digital bidirectional communication at a predetermined communication speed or more by wired communication.
Here, the in-vehicle device refers to, for example, a device in the vehicle that is not connected to the communication network 41. As the devices in the vehicle, for example, a mobile device held by a passenger such as a driver, a wearable device, an information device brought into the vehicle and temporarily set, and the like are assumed.
The map information storage unit 23 can store one or both of a map acquired from the outside and a map created in the vehicle 1. For example, the map information storage unit 23 stores a three-dimensional high-precision map, a global map having a lower precision than the high-precision map and covering a wider area, and the like.
Examples of the high-precision map include a dynamic map, a point cloud map, and a vector map. The dynamic map is, for example, a map composed of four layers of dynamic information, quasi-dynamic information, quasi-static information, and static information, and is provided to the vehicle 1 from an external server or the like. The point cloud map is a map composed of point clouds (point group data). The vector map is, for example, a map in which traffic information such as the positions of lanes and traffic signals is associated with a point cloud map, and is adapted to an ADAS (Advanced Driver Assistance System: advanced driving support system) and AD (Autonomous Driving: automatic driving).
The point cloud map and the vector map may be provided from an external server or the like, or may be created in the vehicle 1 as a map for matching with a local map described later based on the sensing results of the camera 51, the radar 52, the LiDAR53, or the like, and stored in the map information storage unit 23. In addition, in the case of providing a high-precision map from an external server or the like, map data of, for example, several hundred meters square, relating to a planned route along which the vehicle 1 is traveling after that, is acquired from the external server or the like in order to reduce the communication capacity.
The position information acquiring unit 24 can receive GNSS signals from GNSS (Global Navigation Satellite System: global navigation satellite system) satellites and acquire position information of the vehicle 1. The acquired position information is supplied to the travel support/automatic driving control unit 29. The position information acquiring unit 24 is not limited to the type using GNSS signals, and may acquire position information using beacons, for example.
The external recognition sensor 25 includes various sensors used for recognizing the external condition of the vehicle 1, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The kind and number of sensors included in the external recognition sensor 25 are not particularly limited.
For example, the external recognition sensor 25 has a camera 51, a radar 52, liDAR (Light Detection and Ranging, laser Imaging Detection and Ranging: light detection and ranging, laser imaging detection and ranging) 53, and an ultrasonic sensor 54. The external recognition sensor 25 is not limited to this, and may be configured to have one or more of the camera 51, the radar 52, the LiDAR53, and the ultrasonic sensor 54. The number of cameras 51, radar 52, liDAR53, and ultrasonic sensor 54 is not particularly limited as long as they can be installed in the vehicle 1. The type of the sensor provided in the external recognition sensor 25 is not limited to this example, and the external recognition sensor 25 may have another type of sensor. An example of the sensing area of each sensor included in the external recognition sensor 25 will be described later.
The imaging mode of the camera 51 is not particularly limited. For example, as necessary, cameras of various photographing modes such as a ToF (Time of flight) camera, a stereoscopic camera, a monocular camera, and an infrared camera, which are photographing modes capable of ranging, can be applied to the camera 51. The camera 51 is not limited to this, and may be used to acquire only a photographed image regardless of distance measurement.
In addition, the external recognition sensor 25 can have an environment sensor for detecting the environment for the vehicle 1, for example. The environmental sensor is a sensor for detecting the environment such as weather, and brightness, and may include, for example, a raindrop sensor, a fog sensor, a sun sensor, a snow sensor, and an illuminance sensor.
The external recognition sensor 25 includes, for example, a microphone for detecting the sound around the vehicle 1, the position of the sound source, and the like.
The in-vehicle sensor 26 has various sensors for detecting information in the vehicle, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The type and number of the various sensors included in the in-vehicle sensor 26 are not particularly limited as long as they can be installed in the vehicle 1.
For example, the in-vehicle sensor 26 may include one or more of a camera, a radar, a seating sensor, a steering wheel sensor, a microphone, and a biometric sensor. As the camera provided in the in-vehicle sensor 26, for example, cameras of various photographing modes capable of ranging, such as a ToF camera, a stereo camera, a monocular camera, and an infrared camera, are used. The present invention is not limited thereto, and the camera provided in the in-vehicle sensor 26 may be used only to acquire a captured image regardless of distance measurement. The in-vehicle sensor 26 includes a biometric sensor provided in, for example, a seat, a steering wheel, or the like, and detects various biometric information of a passenger such as a driver.
The vehicle sensor 27 has various sensors for detecting the state of the vehicle 1, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The types and the number of the various sensors provided in the vehicle sensor 27 are not particularly limited as long as they can be installed in the vehicle 1.
For example, the vehicle sensor 27 can have a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement device (IMU (Inertial Measurement Unit)) that integrates them. For example, the vehicle sensor 27 has a steering angle sensor that detects a steering angle of a steering wheel, a yaw rate sensor, an accelerator sensor that detects an operation amount of an accelerator pedal, and a brake sensor that detects an operation amount of a brake pedal. For example, the vehicle sensor 27 includes a rotation sensor that detects the rotational speeds of the engine and the motor, an air pressure sensor that detects the air pressure of the tire, a slip ratio sensor that detects the slip ratio of the tire, and a wheel speed sensor that detects the rotational speed of the wheel. For example, the vehicle sensor 27 has a battery sensor that detects the remaining amount and temperature of the battery and an impact sensor that detects an impact from the outside.
The storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and is capable of storing data and programs. As the storage unit 28, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory: electrically erasable programmable read only memory) and a RAM (Random Access Memory: random access memory) are used, and as the storage medium, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied. The storage unit 28 stores various programs and data used by each unit of the vehicle control system 11. For example, the storage unit 28 includes an EDR (Event Data Recorder: event data recorder) and a DSSAD (Data Storage System for Automated Driving: automated driving data recording system), and stores information of the vehicle 1 before and after an event such as an accident and information acquired by the in-vehicle sensor 26.
The travel support/automatic driving control unit 29 can perform travel support and automatic driving control of the vehicle 1. For example, the travel support/automatic driving control unit 29 includes an analysis unit 61, an action planning unit 62, and an action control unit 63.
The analysis unit 61 can perform analysis processing of the conditions of the vehicle 1 and the surrounding area. The analysis unit 61 includes a self-position estimation unit 71, a sensor fusion unit 72, and a recognition unit 73.
The self-position estimating unit 71 can estimate the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map stored in the map information storage unit 23. For example, the self-position estimating unit 71 generates a local map based on sensor data from the external recognition sensor 25, and estimates the self-position of the vehicle 1 by matching the local map with a high-precision map. The position of the vehicle 1 can be, for example, based on the center of the rear wheel set axle.
Examples of the local map include a three-dimensional high-precision map produced by using a technique such as SLAM (Simultaneous Localization and Mapping: simultaneous localization and mapping), and an occupied grid map (Occupancy Grid Map). Examples of the three-dimensional map include the above-described point cloud map. The occupied-grid map is a map in which a three-dimensional or two-dimensional space around the vehicle 1 is divided into grids (grids) of a predetermined size, and the occupied state of an object is shown in grid units. The occupancy state of an object is represented, for example, by the presence or absence of the object and the presence probability. The partial map is also used for detection processing and identification processing of the external situation of the vehicle 1 by the identification unit 73, for example.
The self-position estimating unit 71 may estimate the self-position of the vehicle 1 based on the position information acquired by the position information acquiring unit 24 and the sensor data from the vehicle sensor 27.
The sensor fusion unit 72 can perform a sensor fusion process of combining a plurality of different kinds of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the radar 52) to obtain new information. As a method for combining different kinds of sensor data, aggregation, fusion, union, and the like can be cited.
The identification unit 73 can perform detection processing for detecting the external situation of the vehicle 1 and identification processing for identifying the external situation of the vehicle 1.
For example, the identification unit 73 performs detection processing and identification processing of the external situation of the vehicle 1 based on information from the external identification sensor 25, information from the own position estimation unit 71, information from the sensor fusion unit 72, and the like.
Specifically, the recognition unit 73 performs, for example, detection processing, recognition processing, and the like of objects around the vehicle 1. The object detection process is, for example, a process of detecting the presence or absence of an object, the size, shape, position, operation, or the like. The object recognition processing is, for example, processing for recognizing an attribute such as the kind of an object or recognizing a specific object. However, the detection process and the identification process are not necessarily clearly distinguished, and may be repeated.
For example, the recognition unit 73 performs clustering in which point clouds based on sensor data such as radar 52 and LiDAR53 are classified by blocks of the point group, and detects objects around the vehicle 1. Thereby, the presence, size, shape, and position of the object around the vehicle 1 are detected.
For example, the recognition unit 73 detects movement of objects around the vehicle 1 by performing tracking of movement of blocks following the point groups classified by clustering. Thereby, the speed and the traveling direction (movement vector) of the surrounding objects of the vehicle 1 are detected.
For example, the recognition unit 73 detects or recognizes a vehicle, a person, a bicycle, an obstacle, a structure, a road, a signal lamp, a traffic sign, a road sign, or the like based on the image data supplied from the camera 51. The identification unit 73 may identify the type of the object around the vehicle 1 by performing identification processing such as semantic division.
For example, the recognition unit 73 can perform the recognition processing of the traffic rules around the vehicle 1 based on the map stored in the map information storage unit 23, the estimation result based on the own position of the own position estimation unit 71, and the recognition result based on the objects around the vehicle 1 by the recognition unit 73. By this processing, the identification unit 73 can identify the position and state of the traffic light, the content of the traffic sign and the road sign, the content of the traffic restriction, the lanes on which the vehicle can travel, and the like.
For example, the recognition unit 73 can perform a process of recognizing the surrounding environment of the vehicle 1. As the surrounding environment of the recognition object, the recognition unit 73 assumes weather, air temperature, humidity, brightness, the state of the road surface, and the like.
The action planning unit 62 creates an action plan of the vehicle 1. For example, the action planning unit 62 can create an action plan by performing a process of path planning and path following.
Route planning (Global path planning) is a process of planning a rough route from a start point to an end point. This route planning also includes a process called trajectory planning, in which a trajectory (Local path planning) is generated that allows the vehicle 1 to travel safely and smoothly in the vicinity of the vehicle 1 on the planned route, taking the motion characteristics of the vehicle 1 into consideration.
Path following refers to a process of planning an action for safely and correctly traveling on a path planned by a path plan within a planned time. The action planning unit 62 can calculate the target speed and the target angular speed of the vehicle 1 based on the result of the processing of the path following, for example.
The operation control unit 63 can control the operation of the vehicle 1 in order to realize the action plan created by the action planning unit 62.
For example, the operation control unit 63 controls a steering control unit 81, a braking control unit 82, and a driving control unit 83 included in the vehicle control unit 32 described later, and performs acceleration/deceleration control and directional control so that the vehicle 1 travels on a track calculated by a track plan. For example, the operation control unit 63 performs coordinated control for achieving the function of ADAS such as collision avoidance, impact alleviation, following travel, vehicle speed maintenance travel, collision warning of the vehicle, and lane departure warning of the vehicle. For example, the operation control unit 63 performs coordinated control for the purpose of autonomous driving or the like in which traveling is performed autonomously regardless of the operation of the driver.
The DMS30 can perform a driver authentication process, a driver state recognition process, and the like based on sensor data from the in-vehicle sensor 26, input data to the HMI31 described later, and the like. As the state of the driver to be identified, for example, a physical condition, a degree of wakefulness, a degree of concentration, a degree of fatigue, a direction of line of sight, a degree of intoxication, a driving operation, a posture, and the like are assumed.
The DMS30 may perform authentication processing of passengers other than the driver and recognition processing of the states of the passengers. For example, the DMS30 may perform the process of recognizing the situation in the vehicle based on the sensor data from the in-vehicle sensor 26. As the condition in the vehicle to be identified, for example, air temperature, humidity, brightness, smell, and the like are assumed.
The HMI31 can input various data, instructions, and the like, and present various data to the driver, and the like.
The input of data based on the HMI31 will be schematically described. The HMI31 has an input device for human input data. The HMI31 generates an input signal based on data, instructions, and the like input through an input device, and supplies the input signal to each part of the vehicle control system 11. The HMI31 has, for example, an operation element such as a touch panel, a button, a switch, and a lever as an input device. The HMI31 is not limited to this, and may be provided with an input device capable of inputting information by a method other than manual operation, such as a voice or a gesture. The HMI31 may use, for example, a remote control device using infrared rays or radio waves, a mobile device corresponding to the operation of the vehicle control system 11, or an external connection device such as a wearable device as an input device.
The presentation of data based on the HMI31 will be schematically described. The HMI31 generates visual information, acoustic information, and tactile information for the passenger or the outside of the vehicle. The HMI31 performs output control such as output of each piece of information generated by control, output content, output timing, and an output method. The HMI31 generates and outputs, as visual information, information shown by an image or light, such as an operation screen, a status display of the vehicle 1, a warning display, a monitor image showing a situation around the vehicle 1, and the like. The HMI31 generates and outputs, for example, information shown by sound such as a sound guide, a warning sound, and a warning message as audible information. The HMI31 generates and outputs, for example, information of a sense of touch given to the occupant by force, vibration, movement, or the like, as the sense of touch information.
As an output device for outputting visual information by the HMI31, for example, a display device for presenting visual information by displaying an image by itself, or a projector device for presenting visual information by projecting an image can be applied. The display device may be, for example, a head-up display, a transmissive display, a wearable device having an AR (Augmented Reality: augmented reality) function, or the like, in addition to a display device having a normal display, and may be a device for displaying visual information in the field of view of the passenger. The HMI31 may use a display device provided in the navigation device, the instrument panel, CMS (Camera Monitoring System), the electronic mirror, the lamp, or the like of the vehicle 1 as an output device for outputting visual information.
As an output device for outputting the acoustic information, for example, an audio speaker, a headphone, and an earphone can be applied.
As an output device for outputting haptic information, for example, a haptic element using haptic technology can be applied. The haptic element is provided at a portion where a passenger of the vehicle 1 contacts, for example, a steering wheel, a seat, or the like.
The vehicle control unit 32 can control each unit of the vehicle 1. The vehicle control unit 32 includes a steering control unit 81, a brake control unit 82, a drive control unit 83, a vehicle body system control unit 84, a lamp control unit 85, and a horn control unit 86.
The steering control unit 81 can detect and control the state of the steering system of the vehicle 1. The steering system includes, for example, a steering mechanism including a steering wheel and the like, and electric power steering and the like. The steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
The brake control unit 82 can detect and control the state of the brake system of the vehicle 1. The brake system includes, for example, a brake mechanism including a brake pedal or the like, an ABS (Antilock Brake System: antilock brake system), a regenerative brake mechanism, and the like. The brake control unit 82 includes, for example, a brake ECU that controls a brake system, an actuator that drives the brake system, and the like.
The drive control unit 83 can detect and control the state of the drive system of the vehicle 1. The drive system includes, for example, an accelerator pedal, a drive force generating device for generating a drive force of an internal combustion engine, a drive motor, or the like, a drive force transmitting mechanism for transmitting the drive force to wheels, and the like. The drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
The vehicle body system control unit 84 can detect and control the state of the vehicle body system of the vehicle 1. The vehicle body system includes, for example, a keyless entry system, a smart key system, a power window device, a power seat, an air conditioner, an airbag, a seat belt, a shift lever, and the like. The vehicle body system control unit 84 includes, for example, a vehicle body system ECU that controls a vehicle body system, an actuator that drives the vehicle body system, and the like.
The lamp control unit 85 can detect and control the states of various lamps of the vehicle 1. As the lamp to be controlled, for example, a headlight, a rear lamp, a fog lamp, a turn signal lamp, a brake lamp, a projection lamp, a display of a bumper, and the like are assumed. The lamp control unit 85 includes a lamp ECU that controls the lamp, an actuator that drives the lamp, and the like.
The horn control section 86 can detect and control the state of the automobile horn of the vehicle 1. The horn control unit 86 includes, for example, a horn ECU that controls the horn of the automobile, an actuator that drives the horn of the automobile, and the like.
Fig. 19 is a diagram showing an example of the sensing areas of the camera 51, radar 52, liDAR53, ultrasonic sensor 54, and the like of the external recognition sensor 25 of fig. 18. In fig. 19, the vehicle 1 is schematically shown as seen from above, with the left end side being the front end (front) side of the vehicle 1 and the right end side being the rear end (rear) side of the vehicle 1.
The sensing region 101F and the sensing region 101B show examples of the sensing region of the ultrasonic sensor 54. The sensing region 101F covers the front end periphery of the vehicle 1 by the plurality of ultrasonic sensors 54. The sensing region 101B covers the rear end periphery of the vehicle 1 by the plurality of ultrasonic sensors 54.
The sensing results in the sensing region 101F and the sensing region 101B are used for, for example, parking assistance of the vehicle 1, or the like.
Sensing region 102F to sensing region 102B illustrate examples of sensing regions of radar 52 for short or medium range. The sensing region 102F is in front of the vehicle 1, covering a position farther than the sensing region 101F. The sensing region 102B is at the rear of the vehicle 1, covering a position farther than the sensing region 101B. The sensing region 102L covers the periphery rearward of the left side face of the vehicle 1. The sensing region 102R covers the periphery rearward of the right side face of the vehicle 1.
The sensing result in the sensing region 102F is used for detection of a vehicle, a pedestrian, or the like existing in front of the vehicle 1, for example. The sensing result in the sensing region 102B is used for, for example, a collision prevention function or the like in the rear of the vehicle 1. The sensing results in the sensing region 102L and the sensing region 102R are used, for example, for detection of an object in a dead angle on the side of the vehicle 1.
The sensing region 103F to the sensing region 103B show an example of the sensing region of the camera 51. The sensing region 103F is in front of the vehicle 1, covering to a position farther than the sensing region 102F. The sensing region 103B is at the rear of the vehicle 1, covering to a position farther than the sensing region 102B. The sensing region 103L covers the periphery of the left side face of the vehicle 1. The sensing region 103R covers the periphery of the right side face of the vehicle 1.
The sensing result in the sensing region 103F can be used for, for example, a signal lamp, recognition of traffic signs, a lane departure prevention support system, and an automatic headlamp control system. The sensing result in the sensing region 103B can be used for parking assistance and panoramic view systems, for example. The sensing results in the sensing region 103L and the sensing region 103R can be used for a panoramic view system, for example.
The sensing region 104 shows an example of a sensing region of LiDAR 53. The sensing region 104 is in front of the vehicle 1, covering to a position farther than the sensing region 103F. On the other hand, the sensing region 104 is narrowed in the left-right direction as compared with the sensing region 103F.
The sensing result in the sensing region 104 is used for object detection of a surrounding vehicle or the like, for example.
The sensing region 105 shows an example of a sensing region of the radar 52 for long distances. The sensing area 105 is in front of the vehicle 1, covering a position farther than the sensing area 104. On the other hand, the sensing region 105 is narrowed in the left-right direction as compared with the sensing region 104.
The sensing result in the sensing region 105 is used for, for example, ACC (Adaptive Cruise Control: adaptive cruise control), emergency braking, collision avoidance, and the like.
The sensing areas of the respective sensors included in the external recognition sensor 25, that is, the camera 51, the radar 52, the LiDAR53, and the ultrasonic sensor 54, may have various configurations other than those shown in fig. 19. Specifically, the ultrasonic sensor 54 may sense the side of the vehicle 1, or the LiDAR53 may sense the rear of the vehicle 1. The installation position of each sensor is not limited to the above-described examples. The number of each sensor may be one or a plurality of sensors.
The technology of the present disclosure can be applied to, for example, the camera 51, the LiDAR53, and the like. For example, by applying the technology of the present disclosure to the sensor fusion portion 72 that processes data from the cameras 51 and LiDAR53 of the vehicle control system 11, the internal parameters or the external parameters of the cameras 51 and LiDAR53 can be calibrated.
6. Supplement
While the preferred embodiments of the present disclosure have been described in detail with reference to the drawings, the technical scope of the present disclosure is not limited to such examples. It is obvious that various modifications and modifications can be made by those having ordinary knowledge in the art of the present disclosure within the scope of the technical ideas described in the claims, and it is to be understood that these modifications and modifications are of course also within the technical scope of the present disclosure.
The embodiments of the present disclosure described above can include, for example, a program (model) for causing a computer to function as the information processing apparatus of the present embodiment, and a non-transitory tangible medium on which the program (model) is recorded. In addition, in the embodiment of the present disclosure, the program (model) may be distributed via a communication line (including wireless communication) such as the internet.
The steps in the processing according to the embodiment of the present disclosure described above may not necessarily be performed in the order described. For example, the steps may be processed by changing the order appropriately. Instead of performing the processing in time series, the steps may be partially processed in parallel or independently. In the embodiment of the present disclosure, the processing method of each step may not necessarily be processed along the described method, and may be processed by another method by another functional unit, for example.
The effects described in the present specification are merely effects described or exemplified, and are not limited thereto. In other words, the technology of the present disclosure can achieve other effects that are clear to those skilled in the art from the description of the present specification, together with or instead of the above-described effects.
In the embodiment of the present disclosure, for example, the configuration described as one device may be divided into a plurality of devices. Conversely, the configuration described above as a plurality of devices may be collectively configured as one device. It is needless to say that other structures than those described above may be added to the structure of each device. In addition, as long as the configuration and operation of the entire system are substantially the same, a part of the configuration of one device may be included in the configuration of another device. The above system refers to a collection of a plurality of components (devices, modules (components), etc.), whether or not all the components are located in the same housing. Therefore, a plurality of devices stored in separate housings and connected via a network and one device storing a plurality of modules in one housing can be grasped as a system.
The present technology can also adopt the following configuration.
(1) An information processing apparatus, wherein,
comprises a learner for acquiring common feature points and feature quantities in a plurality of images and generating a model for matching the common feature points,
one of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit and the projection image acquired and projected from a different imaging unit than the one image are input to the learner as a pair of input data.
(2) The information processing apparatus according to the above (1), wherein,
the projection image is a first projection image obtained by projecting the first image or a second projection image obtained by projecting the second image.
(3) The information processing apparatus according to the above (2), wherein,
a plurality of the pair of input data are input to the learner.
(4) The information processing apparatus according to the above (3), wherein,
at least one of another pair of input data including the first image and the first projection image and another pair of input data including the second image and the second projection image is input to the learner.
(5) The information processing apparatus according to any one of the above (2) to (4), wherein,
the learner includes:
an encoder unit which performs dimension compression on the pair of input data;
a detector unit configured to detect the feature point from the pair of compressed input data; and
and a feature amount acquisition unit configured to acquire the feature amount from the compressed pair of input data.
(6) The information processing apparatus according to the above (5), wherein,
the learner compares the feature points output from the detector unit with the feature points of the teacher data,
the learner compares a plurality of the feature amounts from the pair of input data output from the feature amount acquisition unit.
(7) The information processing apparatus according to the above (6), wherein,
the encoder section includes:
a first encoder unit for inputting the first image and the first projection image; and
and a second encoder unit for inputting the second image and the second projection image.
(8) The information processing apparatus according to the above (6) or (7), wherein,
further comprises a teacher data generation unit for generating the teacher data,
the teacher data generating unit acquires likelihood maps of the feature points from the first and second images and the first and second projection images, and combines the likelihood maps.
(9) The information processing apparatus according to the above (8), wherein,
the teacher data generation unit performs machine learning using CG images in advance.
(10) The information processing apparatus according to any one of the above (1) to (9), further including
an image processing unit that generates the images to be input to the learner by cutting out, from a first wide-area image acquired from the first imaging unit and a second wide-area image acquired from the second imaging unit, images of the same viewpoint.
(11) The information processing apparatus according to the above (10), further including
a mask unit that generates a mask of noise in the wide-area images based on the first wide-area image and the second wide-area image whose alignment has been adjusted.
(12) The information processing apparatus according to any one of the above (1) to (11), wherein,
the information processing apparatus further includes a feature point extraction unit that acquires feature points and feature amounts in the plurality of images using the model, and performs matching of the common feature points.
(13) The information processing apparatus according to the above (12), wherein,
the feature point extracting unit acquires feature points and feature amounts in the first image and the second image newly acquired from the different imaging units, and performs matching of the feature points in common.
(14) The information processing apparatus according to the above (12), wherein,
the feature point extraction unit acquires feature points and feature amounts in the newly acquired plurality of first images or the plurality of second images, and performs matching of the feature points in common.
(15) The information processing apparatus according to any one of the above (12) to (14), wherein,
the information processing apparatus further includes a calibration unit that performs calibration of parameters related to the first imaging unit and the second imaging unit based on a positional relationship between the first imaging unit that acquires the first image and the second imaging unit that acquires the second image,
the calibration unit performs calibration using the position information of the feature points that have been matched.
(16) The information processing apparatus according to any one of the above (1) to (15), wherein,
the first imaging unit is constituted by a LiDAR or a ToF sensor,
the second imaging unit is constituted by an image sensor.
(17) An information processing system, wherein,
comprises a first shooting part, a second shooting part and an information processing device,
the information processing apparatus includes a learner that acquires a feature point and a feature quantity common to a plurality of images and generates a model for matching the common feature point,
One of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit and the projection image acquired and projected from a different imaging unit than the one image are input to the learner as a pair of input data.
(18) A model for causing a computer to function to acquire a feature point and a feature quantity common to a plurality of images and to match the feature point in common, wherein,
the information processing apparatus obtains the model by performing machine learning using, as a pair of input data, one of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit, and a projection image acquired and projected from a different imaging unit than the one image.
(19) A model generation method for causing a computer to function to acquire feature points and feature quantities common to a plurality of images and to generate a model for matching the common feature points, wherein,
an information processing apparatus generates the model by performing machine learning using, as a pair of input data, one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image acquired and projected from an imaging unit different from that of the one image.
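The pair of input data referred to throughout the above aspects can be pictured concretely as a visible-light image together with a LiDAR reflectance image re-rendered (projected) into the camera viewpoint, as in aspects (10), (16) and (17). The following is a minimal sketch of that projection step only, assuming a simple pinhole camera model; the intrinsics K, the extrinsics R and t, and the synthetic point cloud are placeholder values, not data from this publication.

```python
import numpy as np

def project_reflectance(points_xyz, reflectance, K, R, t, image_size):
    """Render LiDAR reflectance values into the camera pixel grid (pinhole model)."""
    h, w = image_size
    cam = points_xyz @ R.T + t                  # LiDAR frame -> camera frame
    keep = cam[:, 2] > 0.1                      # drop points behind or too close to the camera
    cam, refl = cam[keep], reflectance[keep]
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                 # perspective division
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    proj = np.zeros((h, w), dtype=np.float32)
    proj[v[inside], u[inside]] = refl[inside]   # nearest-pixel splat of reflectance
    return proj

# Synthetic stand-ins; a real pipeline would use a LiDAR scan and the camera
# calibration of the vehicle described in the embodiments.
rng = np.random.default_rng(0)
points = rng.uniform([-10.0, -2.0, 2.0], [10.0, 2.0, 40.0], size=(5000, 3))
refl = rng.uniform(0.0, 1.0, size=5000).astype(np.float32)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

# One half of a "pair of input data": the reflectance image projected into the
# viewpoint of the visible-light image; the visible-light image itself would
# form the other half of the pair.
projected_reflectance = project_reflectance(points, refl, K, R, t, (480, 640))
print(projected_reflectance.shape, int((projected_reflectance > 0).sum()))
```

In a full pipeline the wide area (panoramic) images of aspect (10) would first be cropped so that both inputs cover the same field of view, and the resulting projected reflectance image would be paired with the visible-light crop.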
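Aspect (11), like claim 11 below, attributes the noise mask to a convolutional neural network that takes the aligned pair of wide area images as input. A minimal sketch of such a mask unit follows; the layer sizes, the two-channel input layout and the sigmoid output are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

class NoiseMaskNet(nn.Module):
    """Tiny CNN that predicts a per-pixel noise mask from the aligned pair of wide area images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid(),   # 1 = keep pixel, 0 = masked-out noise
        )

    def forward(self, wide_a, wide_b):
        return self.net(torch.cat([wide_a, wide_b], dim=1))

mask_net = NoiseMaskNet()
wide_reflectance = torch.rand(1, 1, 128, 512)    # aligned crops of the two wide area images
wide_visible = torch.rand(1, 1, 128, 512)
mask = mask_net(wide_reflectance, wide_visible)  # same spatial size as the inputs
print(mask.shape)
```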
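Aspects (18) and (19), read together with claims 5 to 7 below, describe a learner with an encoder part, a detector part and a descriptor (feature quantity) part, in the spirit of the SuperPoint work cited in the background. The sketch below shows one training step on a single pair of input data in PyTorch; a single shared encoder is used for brevity even though the claims also contemplate separate encoders per modality, and the network sizes, losses and stand-in tensors are assumptions rather than the configuration of this publication.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairLearner(nn.Module):
    """Encoder + detector head + descriptor head, in the spirit of SuperPoint."""
    def __init__(self, desc_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                 # dimension compression to 1/8 resolution
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.detector = nn.Conv2d(128, 1, 1)          # keypoint likelihood map (logits)
        self.descriptor = nn.Conv2d(128, desc_dim, 1)

    def forward(self, x):
        features = self.encoder(x)
        heat = self.detector(features)
        desc = F.normalize(self.descriptor(features), dim=1)
        return heat, desc

model = PairLearner()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on one pair of input data: a visible-light image and the
# reflectance image projected into the same viewpoint, sharing one teacher label map.
visible = torch.rand(1, 1, 256, 256)
projected_reflectance = torch.rand(1, 1, 256, 256)
teacher = (torch.rand(1, 1, 32, 32) > 0.98).float()  # stand-in for merged keypoint labels

heat_a, desc_a = model(visible)
heat_b, desc_b = model(projected_reflectance)

# Detector output is compared against the teacher data for both images, and
# descriptors at the same (aligned) locations are pulled together.
detector_loss = (F.binary_cross_entropy_with_logits(heat_a, teacher)
                 + F.binary_cross_entropy_with_logits(heat_b, teacher))
descriptor_loss = (1.0 - F.cosine_similarity(desc_a, desc_b, dim=1)).mean()

optimizer.zero_grad()
(detector_loss + descriptor_loss).backward()
optimizer.step()
```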
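For aspects (12) to (15): once the trained model yields keypoints and descriptors for a newly acquired visible-light image and projected reflectance image, common feature points can be matched, and the matched points can drive calibration of the camera-LiDAR extrinsics. The sketch below uses mutual nearest-neighbour descriptor matching and OpenCV's solvePnPRansac on synthetic correspondences; the RANSAC-PnP formulation is one plausible realisation of the calibration unit, not necessarily the one used here.

```python
import numpy as np
import cv2

def mutual_nearest_neighbours(desc_a, desc_b):
    """Match L2-normalised descriptors by mutual nearest neighbour."""
    similarity = desc_a @ desc_b.T
    a_to_b = similarity.argmax(axis=1)
    b_to_a = similarity.argmax(axis=0)
    idx_a = np.arange(len(a_to_b))
    mutual = b_to_a[a_to_b] == idx_a
    return np.stack([idx_a[mutual], a_to_b[mutual]], axis=1)  # pairs of matched indices

rng = np.random.default_rng(1)
# In practice these descriptors come from the learner's descriptor head.
matches = mutual_nearest_neighbours(rng.normal(size=(80, 64)), rng.normal(size=(70, 64)))

# Synthetic correspondences standing in for the output of the matching step:
# 3D LiDAR points behind the matched feature points of the projected reflectance
# image, and the pixel locations of the same feature points in the camera image.
lidar_points = rng.uniform([-5.0, -2.0, 5.0], [5.0, 2.0, 30.0], size=(60, 3)).astype(np.float32)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_true, _ = cv2.Rodrigues(np.array([[0.02], [-0.01], [0.005]]))
t_true = np.array([0.10, -0.05, 0.20])
camera_frame = lidar_points @ R_true.T + t_true
pixels = camera_frame @ K.T
pixels = (pixels[:, :2] / pixels[:, 2:3]).astype(np.float32)

# Calibration: recover the LiDAR-to-camera extrinsics from the matched points.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(lidar_points, pixels, K, None)
print(ok, rvec.ravel(), tvec.ravel())
```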
Reference numerals: 1 vehicle; 10 information processing system; 11 vehicle control system; 21 vehicle control; 22 communication unit; 23 map information storage unit; 24 position information acquisition unit; 25 external recognition sensor; 26 in-vehicle sensor; 27 vehicle sensor; 28 storage unit; 29 travel assist/autopilot control unit; DMS (driver monitor system); HMI (human-machine interface); 32 vehicle control unit; 41 communication network; 51, 200 camera; 52 radar; 53, 100 LiDAR; 54 ultrasonic sensor; 61 analysis unit; 62 action planning unit; 63 action control unit; 71 self-position estimation unit; 72 sensor fusion unit; 73 recognition unit; 81 steering control unit; 82 brake control unit; 83 drive control unit; 84 car body system control unit; 85 lamp control unit; 86 horn control unit; 300 information processing apparatus; 302, 322, 342, 362 reflected intensity image acquisition section; 304, 324, 344, 364 visible light image acquisition section; 306, 326 reflected intensity image processing section; 308, 328 visible light image processing section; 310 feature point acquisition section; 312 position information acquisition section; 314 calibration section; 330 mask generation section; 332 input data generation section; 346, 366 reflected intensity image projection section; 348, 368 visible light image projection section; 350 positive solution label generation section; 370 learning section; 372 encoder section; 374 detector section; 376 descriptor section; 400, 404, 406 reflected intensity image; 402 reflected intensity panoramic image; 410 projected reflected intensity image; 500, 504, 506 visible light image; 502 visible light panoramic image; 510 projected visible light image; 602, 604 mask image; 700 CG image; 704, 710a, 710b, 712a, 712b input data; 800, 900, 904 positive labels; 802 likelihood map; 810a, 810b, 812 result data.

Claims (19)

Translated from Chinese

1. An information processing device comprising a learner that acquires feature points and feature quantities common to a plurality of images and generates a model for matching the common feature points, wherein one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image acquired and projected from an imaging unit different from that of the one image, are input to the learner as a pair of input data.

2. The information processing device according to claim 1, wherein the projection image is a first projection image obtained by projecting the first image or a second projection image obtained by projecting the second image.

3. The information processing device according to claim 2, wherein a plurality of the pairs of input data are input to the learner.

4. The information processing device according to claim 3, wherein at least one of another pair of input data consisting of the first image and the first projection image and another pair of input data consisting of the second image and the second projection image is further input to the learner.

5. The information processing device according to claim 2, wherein the learner includes: an encoder unit that performs dimensional compression on the pair of input data; a detector unit that detects the feature points from the compressed pair of input data; and a feature quantity acquisition unit that acquires the feature quantities from the compressed pair of input data.

6. The information processing device according to claim 5, wherein the learner compares the feature points output from the detector unit with the feature points of the teacher data, and compares the plurality of feature quantities output from the feature quantity acquisition unit for the pair of input data.

7. The information processing device according to claim 6, wherein the encoder unit includes: a first encoder unit that receives the first image and the first projection image as input; and a second encoder unit that receives the second image and the second projection image as input.

8. The information processing device according to claim 6, further comprising a teacher data generation unit that generates the teacher data, wherein the teacher data generation unit acquires likelihood maps of the feature points from the first image, the second image, the first projection image, and the second projection image, and merges the likelihood maps.

9. The information processing device according to claim 8, wherein the teacher data generation unit performs machine learning using CG images in advance.

10. The information processing device according to claim 1, further comprising an image processing unit that generates the images to be input to the learner by cutting out a first wide area image acquired from the first imaging unit and a second wide area image acquired from the second imaging unit so that they become images from the same viewpoint.

11. The information processing device according to claim 10, further comprising a mask unit constituted by a convolutional neural network, wherein the mask unit generates a mask of noise in the wide area images based on the first wide area image and the second wide area image whose alignment has been adjusted.

12. The information processing device according to claim 1, further comprising a feature point extraction unit that acquires feature points and feature quantities in the plurality of images using the model and performs matching of the common feature points.

13. The information processing device according to claim 12, wherein the feature point extraction unit acquires feature points and feature quantities in the first image and the second image newly acquired from the different imaging units, and performs matching of the common feature points.

14. The information processing device according to claim 12, wherein the feature point extraction unit acquires feature points and feature quantities in a plurality of newly acquired first images or a plurality of newly acquired second images, and performs matching of the common feature points.

15. The information processing device according to claim 12, further comprising a calibration unit that calibrates parameters related to the first imaging unit and the second imaging unit based on a positional relationship between the first imaging unit that acquires the first image and the second imaging unit that acquires the second image, wherein the calibration unit performs the calibration using position information of the matched feature points.

16. The information processing device according to claim 1, wherein the first imaging unit is constituted by a LiDAR or ToF sensor, and the second imaging unit is constituted by an image sensor.

17. An information processing system comprising a first imaging unit, a second imaging unit, and an information processing device, wherein the information processing device includes a learner that acquires feature points and feature quantities common to a plurality of images and generates a model for matching the common feature points, and one of a first image acquired from the first imaging unit and a second image acquired from the second imaging unit, and a projection image acquired and projected from an imaging unit different from that of the one image, are input to the learner as a pair of input data.

18. A model for causing a computer to function to acquire feature points and feature quantities common to a plurality of images and to match the common feature points, wherein an information processing device obtains the model by performing machine learning using, as a pair of input data, one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image acquired and projected from an imaging unit different from that of the one image.

19. A model generation method for causing a computer to function to acquire feature points and feature quantities common to a plurality of images and to generate a model for matching the common feature points, wherein an information processing device generates the model by performing machine learning using, as a pair of input data, one of a first image acquired from a first imaging unit and a second image acquired from a second imaging unit, and a projection image acquired and projected from an imaging unit different from that of the one image.
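Claims 8 and 9 above describe a teacher data generation unit that obtains keypoint likelihood maps from the first and second images and their projections, merges them, and relies on a base detector pre-trained on CG images. The sketch below illustrates one possible merging rule (averaging followed by thresholding); both the rule and the stand-in maps are assumptions made for illustration.

```python
import numpy as np

def merge_likelihood_maps(maps, threshold=0.5):
    """Average aligned keypoint likelihood maps and threshold them into one label map."""
    merged = np.mean(np.stack(maps, axis=0), axis=0)
    return (merged >= threshold).astype(np.float32)

# Stand-in likelihood maps for the first image, the second image and their
# projections, e.g. produced by a base detector pre-trained on CG images.
rng = np.random.default_rng(2)
maps = [rng.uniform(0.0, 1.0, size=(32, 32)) for _ in range(4)]
teacher_labels = merge_likelihood_maps(maps, threshold=0.7)
print(int(teacher_labels.sum()), "candidate keypoints in the merged label map")
```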
CN202280055900.9A | Priority date: 2021-08-20 | Filing date: 2022-03-09 | Information processing device, information processing system, model, and model generation method | Status: Pending | Publication: CN117836818A (en)

Applications Claiming Priority (3)

Application Number | Priority Date | Filing Date | Title
JP2021-134974 | 2021-08-20
JP2021134974 | 2021-08-20
PCT/JP2022/010155 (WO2023021755A1 (en)) | 2021-08-20 | 2022-03-09 | Information processing device, information processing system, model, and model generation method

Publications (1)

Publication Number | Publication Date
CN117836818A | 2024-04-05

Family

ID=85240373

Family Applications (1)

Application Number | Status | Publication | Title
CN202280055900.9A | Pending | CN117836818A (en) | Information processing device, information processing system, model, and model generation method

Country Status (3)

Country | Link
JP (1) | JPWO2023021755A1 (en)
CN (1) | CN117836818A (en)
WO (1) | WO2023021755A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118365685B (en)* | 2024-06-20 | 2024-08-16 | 阿米华晟数据科技(江苏)有限公司 | Training method and device for registration fusion of visible light and infrared image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2004317507A (en)* | 2003-04-04 | 2004-11-11 | Omron Corp | Axis-adjusting method of supervisory device
IL304881B2 (en)* | 2017-11-14 | 2024-07-01 | Magic Leap Inc | Discovering and describing a fully complex point of interest using homographic processing

Also Published As

Publication number | Publication date
JPWO2023021755A1 (en) | 2023-02-23
WO2023021755A1 (en) | 2023-02-23

Similar Documents

Publication | Title
US10671860B2 (en) | Providing information-rich map semantics to navigation metric map
CN111886626B (en) | Signal processing device, signal processing method, program, and moving object
CN113168691B (en) | Information processing device, information processing method, program, mobile body control device, and mobile body
US20210122364A1 (en) | Vehicle collision avoidance apparatus and method
CN111918053A (en) | Vehicle image verification
CN111833650A (en) | Vehicle Path Prediction
CN113168692B (en) | Information processing device, information processing method, program, mobile object control device, and mobile object
CN108572663A (en) | Target Tracking
JP2019045892A (en) | Information processing apparatus, information processing method, program and movable body
WO2019073920A1 (en) | Information processing device, moving device and method, and program
CN111986128A (en) | Off-center image fusion
JP7626632B2 (en) | Information processing device, information processing method, and program
WO2019188391A1 (en) | Control device, control method, and program
WO2021241189A1 (en) | Information processing device, information processing method, and program
CN110691986A (en) | Apparatus, method and computer program for computer vision
US20240290108A1 (en) | Information processing apparatus, information processing method, learning apparatus, learning method, and computer program
WO2019150918A1 (en) | Information processing device, information processing method, program, and moving body
JP2025061744A (en) | Information processing device, information processing system, and program
CN116311216A (en) | 3D object detection
US20230206596A1 (en) | Information processing device, information processing method, and program
CN117836818A (en) | Information processing device, information processing system, model, and model generation method
US20250002007A1 (en) | Information processing device, information processing method, and program
US20250172950A1 (en) | Information processing apparatus, information processing method, information processing program, and mobile apparatus
US20240290204A1 (en) | Information processing device, information processing method, and program
CN116710971A (en) | Object recognition method and time-of-flight object recognition circuit

Legal Events

Code | Event
PB01 | Publication
SE01 | Entry into force of request for substantive examination
