Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the present specification and drawings, the same reference numerals are given to components having the same functional structures, and overlapping descriptions are omitted. In the present specification and drawings, different letters may be appended to the same reference numerals to distinguish structural elements that actually have the same or similar functional structures. However, in the case where it is not necessary to particularly distinguish each of a plurality of structural elements actually having the same or similar functional structure, only the same reference numeral is attached.
In the following description, a feature point refers to a position (coordinate information) that represents the shape of a subject in an image, such as the center point, a branch point, an intersection, or an end point on the outline of the subject. A feature quantity is information obtained by digitizing the features of a feature point, such as the shape, direction, and spread around the feature point.
The description will be given in the following order.
1. Background of the embodiments of the present disclosure
1.1 Overview of the information processing system
1.2 Detailed configuration of the information processing apparatus
1.3 Information processing method
1.4 Background
2. Embodiments
2.1 Information processing method
2.2 Generation of input data
2.3 Generation of forward-solution labels
2.4 Learning
3. Summary
4. Hardware configuration
5. Application example
6. Supplement
1. Background of the embodiments of the present disclosure
First, before the embodiments of the present disclosure are explained, the background that led the present inventors to create the embodiments of the present disclosure will be described.
As described above, in the near future, mobile bodies (for example, automated driving vehicles) using automated driving technology and highly intelligent robots are expected to be used in daily life, and a plurality of sensors for capturing the surrounding environment are assumed to be mounted on such mobile bodies and robots. Against such a background, as one of the sensor fusion technologies that use sensing data obtained from a plurality of sensors, a technology for accurately and easily aligning a plurality of sensors of different types is increasingly demanded. First, an outline of an information processing system using such a technology will be described.
< 1.1 Overview of the information processing system >
First, an outline of an information processing system 10 according to an embodiment of the present disclosure will be described with reference to fig. 1. Fig. 1 is an explanatory diagram illustrating a configuration example of an information processing system 10 according to the present embodiment.
As shown in fig. 1, the information processing system 10 of the present embodiment includes a LiDAR (Light Detection and Ranging) (first imaging section) 100, a camera (second imaging section) 200, and an information processing apparatus 300, which are communicably connected to each other via a network. Any communication method, wired or wireless (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)), can be applied, but a communication method capable of maintaining stable operation is preferably used. The LiDAR100, the camera 200, and the information processing apparatus 300 may be connected to the network via a base station or the like (not shown) (for example, a base station of a mobile phone, an access point of a wireless LAN (Local Area Network), or the like). An outline of each device included in the information processing system 10 according to the present embodiment will be described in order.
(LiDAR100)
The LiDAR100 can measure the distance (relative coordinates) to a subject and determine the shape of the subject by irradiating the subject with laser light while scanning the laser light and observing the scattered and reflected light. In this specification, an image based on the reflected light acquired by the LiDAR100 is referred to as a reflected intensity image (first image). In the embodiments of the present disclosure, a ToF (Time of Flight) sensor (not shown) may be used instead of the LiDAR100. The ToF sensor can measure the distance to a subject and determine the shape of the subject by irradiating the subject with pulsed light and observing the return time of the light reflected by the subject.
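As general background (not a formula recited in the present disclosure), the relationship underlying such time-of-flight distance measurement can be written as follows, where c is the speed of light and Δt is the measured round-trip time of the light:

```latex
d = \frac{c \, \Delta t}{2}
```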
(Camera 200)
The camera 200 is an image sensor that acquires an image of a subject by detecting radiated light from the subject and outputting an image signal. Specifically, the camera 200 is an RGB image sensor capable of color imaging, in which a plurality of pixels capable of detecting blue light, green light, and red light (visible light) are arranged in a Bayer array. In this specification, an image based on visible light acquired by the camera 200 is referred to as a visible light image (second image). In the present embodiment, a monochrome (black-and-white) image sensor may be used instead of the RGB image sensor.
(information processing apparatus 300)
The information processing apparatus 300 is constituted by a computer or the like, for example. The information processing apparatus 300 processes, for example, images acquired by the LiDAR100 and the camera 200, and outputs images and the like obtained by the processing to other devices. The information processing apparatus 300 can perform alignment (calibration) of the LiDAR100 and the camera 200. Further, details of the information processing apparatus 300 will be described later.
In the present embodiment, the information processing apparatus 300 may be constituted by a plurality of apparatuses, and is not particularly limited.
In fig. 1, the information processing system 10 of the present embodiment is shown to include one LiDAR100 and a camera 200, but the present embodiment is not limited thereto. For example, the information processing system 10 of the present embodiment may include a plurality of LiDAR100 and cameras 200. The information processing system 10 of the present embodiment may include, for example, another image sensor that observes light of a specific wavelength and generates an image, and is not particularly limited.
< 1.2 Detailed configuration of the information processing apparatus >
Next, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 2. Fig. 2 is a block diagram showing an example of the configuration of an information processing apparatus 300 according to the present embodiment. Here, the description will be focused on the function of the information processing apparatus 300 for performing alignment of the LiDAR100 and the camera 200. As described above, the information processing apparatus 300 is constituted by a computer or the like, for example. In detail, as shown in fig. 2, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 302, a visible light image acquisition unit 304, a reflected intensity image processing unit 306, a visible light image processing unit 308, a feature point acquisition unit 310, a position information acquisition unit 312, and a calibration unit 314. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 302)
The reflected intensity image acquisition unit 302 acquires data of a reflected intensity image from the LiDAR100, and outputs the acquired data to a reflected intensity image processing unit 306, which will be described later.
(visible light image acquisition section 304)
The visible light image acquisition unit 304 acquires data of a visible light image from the camera 200, and outputs the data to a visible light image processing unit 308 described later.
(reflection intensity image processing section 306)
The reflection intensity image processing unit 306 generates, from the reflected intensity image data supplied by the reflected intensity image acquisition unit 302, a reflected intensity image serving as input data by cutting out the image at a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV). The reflection intensity image processing unit 306 outputs the generated reflected intensity image to a feature point acquisition unit 310 described later. At this time, the reflection intensity image processing unit 306 may also perform correction of optical distortion in the image, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment).
(visible light image processing section 308)
The visible light image processing unit 308 generates, from the visible light image data supplied by the visible light image acquisition unit 304, a visible light image serving as input data by cutting out the image at a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV). The visible light image processing unit 308 outputs the generated visible light image to a feature point acquisition unit 310 described later. At this time, the visible light image processing unit 308 may also perform correction of optical distortion in the image, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment).
(feature Point acquiring section 310)
The feature point acquisition unit 310 can acquire feature points and feature quantities in a plurality of images using a model of the present embodiment described later, and can perform matching of feature points common to the plurality of images based on the acquired feature points and feature quantities. For example, in the present embodiment described below, the feature point acquisition unit 310 may perform feature point matching between a reflected intensity image and a visible light image, and may also perform feature point matching between a plurality of reflected intensity images or between a plurality of visible light images. However, since the alignment of the LiDAR100 and the camera 200 is described here, only the matching of feature points common to the visible light image and the reflected intensity image will be described. The feature point acquisition unit 310 outputs information on the matched feature points (coordinate information in the images, etc.) to the calibration unit 314 described later. For example, in the matching of the present embodiment, a norm is calculated as the distance between the feature quantities of the feature points, and the feature points whose distance between the plurality of images is smallest are matched. The generation of the model of the present embodiment will be described in detail later.
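The following is a minimal sketch of such norm-based matching, assuming the feature quantities are stored as NumPy arrays; the distance threshold and the mutual-nearest-neighbour check are illustrative assumptions and not part of the present disclosure.

```python
import numpy as np

def match_feature_points(desc_a: np.ndarray, desc_b: np.ndarray, max_dist: float = 0.7):
    """Match feature points by the smallest norm (distance) between feature quantities.

    desc_a: (Na, D) feature quantities acquired from the reflected intensity image.
    desc_b: (Nb, D) feature quantities acquired from the visible light image.
    Returns index pairs (ia, ib) of feature points regarded as common to both images.
    """
    # L2 norm between every pair of feature quantities.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    matches = []
    for ia in range(dists.shape[0]):
        ib = int(np.argmin(dists[ia]))
        # Keep only mutual nearest neighbours below the (assumed) threshold.
        if dists[ia, ib] < max_dist and int(np.argmin(dists[:, ib])) == ia:
            matches.append((ia, ib))
    return matches
```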
(position information acquiring section 312)
The position information acquisition unit 312 can acquire the distance to the subject and the relative position coordinates of the subject based on the time required for the light emitted from the LiDAR100 to be reflected by the subject and return, and can output the acquired distance and the like to the calibration unit 314 described later. In the present embodiment, the calculation of the distance and the like may instead be performed by the LiDAR100.
(calibration part 314)
The calibration unit 314 can calibrate (correct) the spatial difference (positional relationship) and the optical difference between the LiDAR100 and the camera 200. For example, the calibration unit 314 corrects external parameters (positional parameters) and/or internal parameters (optical parameters) of the LiDAR100 and the camera 200 based on differences in the positions where the LiDAR100 and the camera 200 are disposed (parallax, distance to the subject), differences in the angle of view of the LiDAR100 and the camera 200, and lens aberrations, so as to cancel differences (deviations) in positional information in images output from the LiDAR100 and the camera 200. At this time, the calibration unit 314 can correct the feature points that have been matched by the feature point acquisition unit 310 by using the position information (coordinate information on the world coordinate system or the relative coordinate system) based on the position information acquisition unit 312.
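As an illustrative sketch only, when matched feature points and their position information are available, external parameters of the kind handled by the calibration unit 314 could be estimated with a standard PnP solver; the function below uses OpenCV and assumes data layouts that are not specified in the present disclosure.

```python
import cv2
import numpy as np

def estimate_external_parameters(lidar_points_3d: np.ndarray,
                                 camera_points_2d: np.ndarray,
                                 camera_matrix: np.ndarray,
                                 dist_coeffs: np.ndarray):
    """Estimate rotation and translation between the LiDAR and the camera.

    lidar_points_3d : (N, 3) relative coordinates of matched feature points from the LiDAR.
    camera_points_2d: (N, 2) pixel coordinates of the same feature points in the visible light image.
    camera_matrix, dist_coeffs: internal (optical) parameters of the camera.
    """
    ok, rvec, tvec = cv2.solvePnP(lidar_points_3d.astype(np.float64),
                                  camera_points_2d.astype(np.float64),
                                  camera_matrix.astype(np.float64),
                                  dist_coeffs)
    if not ok:
        raise RuntimeError("external parameter estimation failed")
    rotation, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix
    return rotation, tvec               # external (positional) parameters
```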
In the present embodiment, the configuration of the information processing apparatus 300 is not limited to that shown in fig. 2, and may further include, for example, a functional block not shown.
< 1.3 Information processing method >
Next, an information processing method according to an embodiment of the present disclosure will be described with reference to fig. 3 and 4. Here, the process of performing alignment of the LiDAR100 and the camera 200 performed by the information processing apparatus 300 will be described. Fig. 3 and 4 are flowcharts illustrating an example of the information processing method according to the present embodiment.
In detail, as shown in fig. 3, the information processing method of the present embodiment can mainly include a plurality of steps from step S100 to step S400. The following describes each of these steps in this embodiment in detail.
First, the information processing apparatus 300 collects one or more visible light images from the camera 200 (step S100). Next, the information processing apparatus 300 collects one or more reflected intensity images from the LiDAR (step S200).
Then, the information processing apparatus 300 acquires feature points and feature quantities in the visible light images and the reflected intensity images collected in steps S100 and S200 described above, and performs matching of the feature points common to the visible light image and the reflected intensity image based on the acquired feature points and feature quantities (step S300). Next, the information processing apparatus 300 calibrates (corrects) the spatial difference (positional relationship) and the optical difference between the LiDAR100 and the camera 200 (step S400). At this time, the information processing apparatus 300 can perform the correction using the position information (coordinate information on the world coordinate system or a relative coordinate system) of the matched feature points.
In detail, step S300 of fig. 3 may mainly include a plurality of steps from step S301 to step S303 shown in fig. 4. The following describes each of these steps in detail.
First, the information processing apparatus 300 acquires feature points and feature amounts from the visible light image collected in step S100 using a model of the present embodiment described later (step S301). Next, the information processing apparatus 300 acquires feature points and feature amounts from the reflection intensity image collected in step S200 using the above model (step S302).
The information processing apparatus 300 performs matching of the feature points between the reflected intensity image and the visible light image based on the feature points and feature quantities acquired in steps S301 and S302 described above (step S303). For example, the information processing apparatus 300 calculates a norm as the distance between the feature quantities of the respective feature points, and matches the feature points having the smallest distance between the images as common feature points.
The flow shown in fig. 3 and 4 is an example of the information processing according to the present embodiment, and the information processing according to the present embodiment is not limited to this.
< 1.4 Background >
Next, the background that led the present inventors to create the embodiments of the present disclosure will be described with reference to fig. 5. Fig. 5 is an explanatory diagram illustrating the background of the creation of the present embodiment.
As described above, as one of sensor fusion techniques using sensing data obtained from a plurality of sensors, a technique for accurately and easily aligning the plurality of sensors is further required. As such a technique, feature point matching can be performed between images acquired by the LiDAR100 and the camera 200.
For example, the Scale-Invariant Feature Transform (SIFT) is one algorithm for feature point detection and feature quantity description. In SIFT, feature points are detected from the Difference of Gaussians (DoG), which approximates the Laplacian of Gaussian (LoG) using smoothed images obtained by convolution, and a 128-dimensional gradient vector obtained from the pixel information around each detected feature point is described as its feature quantity. Since SIFT can describe feature quantities that are stable against rotation, scale change, illumination change, and the like of an image for the detected feature points, it can be used for image matching such as image mosaicking, object recognition, and detection. However, SIFT is a hand-crafted, rule-based algorithm designed by humans and is relatively cumbersome.
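For reference only, and not as part of the present disclosure, a typical SIFT detection and description call with OpenCV (assuming a build that includes SIFT) looks roughly as follows:

```python
import cv2

def sift_features(image_path: str):
    """Detect SIFT feature points and compute their 128-dimensional feature quantities."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                      # DoG-based detector and descriptor
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors                 # descriptors: (N, 128) array
```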
"SuperPoint: self-supervised interest point detection and description (Self-supervision feature point detection and description) "is one of algorithms that make use of machine learning to perform feature point detection and feature quantity description. In Superpoint, for a certain image, a pair of an original image and an image on which random projection is applied is input as input data to a Deep Neural Network (DNN). Further, in the Superpoint, an algorithm (model) for matching feature points common to a plurality of images can be generated by learning feature points by a forward solution tag (teacher data) and learning feature quantities by calculating similar vectors between pixels corresponding to pairs of inter-image positions.
Such conventional techniques are premised on matching feature points between images obtained from the same type of sensor, and are robust against projections such as enlargement, reduction, and rotation. However, in feature point matching between images obtained from different types of sensors (different domains), such as the LiDAR100 and the camera 200, as shown in fig. 5, the feature points (shown by circles in the figure) cannot be detected with high accuracy between the reflected intensity image 400 and the visible light image 500, or the feature points common to the reflected intensity image 400 and the visible light image 500 cannot be matched, so that the accuracy decreases.
Accordingly, the present inventors have created the embodiments of the present disclosure described below in view of such a situation. In the embodiments of the present disclosure created by the present inventors, feature points and feature quantities common to a plurality of images obtained from different types of sensors (specifically, a reflected intensity image and a visible light image) are acquired by a Deep Neural Network (DNN), and a model (algorithm) for matching the common feature points is generated. In this case, the DNN performs machine learning using not only a large number of reflected intensity images and visible light images but also images obtained by projecting these images as input data. According to the embodiments of the present disclosure, a model (algorithm) that can accurately and easily match feature points even between images obtained from different types of sensors can be obtained. Hereinafter, the embodiments of the present disclosure created by the present inventors will be described in detail.
2. Embodiments
< 2.1 Information processing method >
First, a description will be given of a general flow of processing for acquiring feature points and feature amounts from the reflection intensity image 400 and the visible light image 500 obtained from different sensors and generating a model (algorithm) for matching the common feature points. Although the description is made here of the case where the information processing apparatus 300 generates a model, the present embodiment may be performed by an information processing apparatus (not shown) different from the information processing apparatus 300, and is not particularly limited.
An information processing method according to an embodiment of the present disclosure and a processing method for generating a model will be described with reference to fig. 6. Fig. 6 is a flowchart illustrating an example of the information processing method according to the present embodiment. In detail, as shown in fig. 6, the information processing method of the present embodiment may mainly include a plurality of steps from step S500 to step S900. The following describes each of these steps in this embodiment in detail.
First, the information processing apparatus 300 collects one or more visible light images 500 from the camera 200 (step S500). Next, the information processing apparatus 300 collects one or more reflected intensity images 400 from the LiDAR (step S600).
Then, the information processing apparatus 300 generates a pair serving as input data using the visible light image 500 and the reflection intensity image 400 at the same viewpoint collected in step S500 and step S600 described above (step S700).
Next, the information processing apparatus 300 generates a forward-resolution label (teacher data) common to the visible light image 500 and the reflection intensity image 400 (step S800).
Then, the information processing apparatus 300 performs machine learning while randomly projecting the visible light image 500 and the reflection intensity image 400 (step S900).
The generation of the input data (step S700), the generation of the forward-solution label (step S800), and the learning (step S900) will be described in order below.
< 2.2 Generation of input data >
The generation of input data according to the present embodiment will be described in detail with reference to fig. 7. Fig. 7 is an explanatory diagram illustrating an example of input data of the present embodiment. In the present embodiment, as described above, in step S700, a pair of the reflected intensity image 404 and the visible light image 504, which are input data, is generated. At this time, in the present embodiment, as shown in fig. 7, from the LiDAR100 and the camera 200, a reflected intensity panoramic image (first wide area image) 402 and a visible light panoramic image (second wide area image) 502, which are images of a wide area, are used.
In detail, the information processing apparatus 300 cuts out images from each of the reflection intensity panoramic image 402 and the visible light panoramic image 502 to be at the same position (the same viewpoint), the same size, and the same view angle (FOV). In this case, the information processing apparatus 300 may correct optical distortion or the like in the image. In this way, the information processing apparatus 300 can generate the input data 704 composed of the pair of the reflection intensity image 404 and the visible light image 504. According to the present embodiment, by generating input data by clipping from a panoramic image, a large number of pairs of the reflection intensity image 404 and the visible light image 504 with less offset can be easily generated.
In some cases, the reflected intensity panoramic image 402 and the visible light panoramic image 502 contain noise caused by a moving subject (in the figure, an image of a vehicle), or noise at positions lacking correspondence due to a difference in acquisition time between the reflected intensity panoramic image 402 and the visible light panoramic image 502. Therefore, in the present embodiment, in order to exclude such noise from the objects of machine learning, a mask image 602 including a mask covering the noise portions of the reflected intensity panoramic image 402 and the visible light panoramic image 502 is generated. Then, in the present embodiment, a mask image 604 paired with the reflected intensity image 404 and the visible light image 504 included in the input data 704 is generated by cutting out an image from the generated mask image 602 at the same position (the same viewpoint), the same size, and the same angle of view (FOV). According to the present embodiment, by using such a mask, positions lacking correspondence are excluded from the objects of machine learning, so that the accuracy and efficiency of the machine learning can be further improved.
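A minimal sketch of this pair generation, assuming the panoramic images are already aligned NumPy arrays and that the cut-out window is given explicitly (both assumptions for illustration; distortion correction and the like are omitted):

```python
import numpy as np

def cut_input_set(refl_pano: np.ndarray, vis_pano: np.ndarray, mask_pano: np.ndarray,
                  top: int, left: int, height: int, width: int):
    """Cut the reflected intensity, visible light and mask panoramas at the same
    position (viewpoint), size and angle of view (FOV) to form one set of input data."""
    window = (slice(top, top + height), slice(left, left + width))
    refl_image = refl_pano[window]   # reflected intensity image 404
    vis_image = vis_pano[window]     # visible light image 504
    mask_image = mask_pano[window]   # mask image 604 (excludes noise from learning)
    return refl_image, vis_image, mask_image
```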
Next, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 8 and 9. Fig. 8 is a block diagram showing an example of the configuration of the information processing apparatus 300 according to the present embodiment, and fig. 9 is an explanatory diagram showing an example of the generation of a mask according to the present embodiment. Here, the description will be focused on the function of the information processing apparatus 300 concerning the generation stage of the input data in the generation of the model. In detail, as shown in fig. 8, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 322, a visible light image acquisition unit 324, a reflected intensity image processing unit (image processing unit) 326, a visible light image processing unit (image processing unit) 328, a mask generation unit (mask unit) 330, and an input data generation unit 332. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 322)
The reflected intensity image acquisition unit 322 acquires data of a reflected intensity panoramic image (first wide area image) 402 from the LiDAR100, and outputs the acquired data to a reflected intensity image processing unit 326 and a mask generation unit 330, which will be described later.
(visible light image acquisition section 324)
The visible light image acquisition unit 324 acquires data of a visible light panoramic image (second wide area image) 502 from the camera 200, and outputs the acquired data to a visible light image processing unit 328 and a mask generation unit 330, which will be described later.
(reflection intensity image processing section 326)
The reflected intensity image processing unit 326 cuts out the reflected intensity panoramic image 402 from the reflected intensity image acquisition unit 322 at a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV), and generates a reflected intensity image 404 serving as part of the input data 704. The reflected intensity image processing unit 326 outputs the generated reflected intensity image 404 to an input data generating unit 332 described later. The reflected intensity image processing unit 326 may also perform correction of optical distortion in the image, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment).
(visible light image processing section 328)
The visible light image processing unit 328 cuts out the visible light panoramic image 502 from the visible light image acquisition unit 324 at a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV), and generates a visible light image 504 serving as part of the input data 704. The visible light image processing unit 328 outputs the generated visible light image 504 to an input data generating unit 332 described below. The visible light image processing unit 328 may also perform correction of optical distortion in the image, brightness adjustment (gain adjustment), and contrast adjustment (gamma adjustment).
(mask generating section 330)
In the present embodiment, the mask image 602 is automatically generated by a Convolutional Neural Network (CNN). As described above, according to the present embodiment, the mask images 602 and 604, and further the input data 704, can be generated easily and in large quantities. Specifically, as shown in fig. 9, the mask generating unit 330 is configured by a CNN or the like, and generates the mask image 602 using the aligned reflected intensity panoramic image 402 and visible light panoramic image 502 as input data. The mask generating unit 330 then cuts out the generated mask image 602 at a predetermined position (viewpoint), a predetermined size, and a predetermined angle of view (FOV), generates the mask image 604 serving as part of the input data 704, and outputs the mask image 604 to the input data generating unit 332 described later. For example, the CNN 330 can generate the mask image 602 by using a subject detection algorithm such as "Objects as Points", which captures a subject as a box (BOX) and represents the subject by the position coordinates of the center point of the box and its image feature quantity. In this way, in the present embodiment, a mask for excluding positions lacking correspondence from the objects of machine learning can be generated automatically.
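As an illustrative sketch under the assumption that the detector outputs bounding boxes of moving subjects (the 0/1 convention and the box format are assumptions, not part of the present disclosure), the mask image could be produced as follows:

```python
import numpy as np

def boxes_to_mask(image_shape, boxes):
    """Create a mask image covering detected moving subjects.

    image_shape: (H, W) of the aligned panoramic images.
    boxes      : iterable of (x_min, y_min, x_max, y_max) boxes from a subject
                 detection network (for example, an "Objects as Points" style detector).
    Pixels inside a box are set to 0 (excluded from machine learning), the rest to 1.
    """
    mask = np.ones(image_shape, dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        mask[int(y_min):int(y_max), int(x_min):int(x_max)] = 0
    return mask
```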
(input data generating section 332)
The input data generation unit 332 takes the reflected intensity image 404, the visible light image 504, and the mask image 604 output from the reflected intensity image processing unit 326, the visible light image processing unit 328, and the mask generation unit 330 described above, which correspond to the same position (the same viewpoint), the same size, and the same angle of view (FOV), and outputs them as one set (pair) of input data 704 to the functional units described below (specifically, the reflected intensity image acquisition units 342 and 362 and the visible light image acquisition units 344 and 364 shown in fig. 10 and 13). In the present embodiment, when the reflected intensity image 404 and the visible light image 504 contain no noise, the mask image 604 need not be included in the set of input data.
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the generation stage of the input data 704 during the generation of the model are not limited to the configuration shown in fig. 8.
< 2.3 Generation of forward-solution labels >
Next, the generation of the forward-solution label (teacher data) of the present embodiment will be described in detail. There are tens to hundreds of feature points in a single image. Therefore, when generating forward-solution labels for machine learning, it is not realistic to manually detect the feature points that become the forward-solution labels one by one. Therefore, in the present embodiment, the forward-solution labels are automatically generated using a DNN or the like.
First, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 10. Fig. 10 is a block diagram showing an example of the configuration of an information processing apparatus 300 according to the present embodiment. Here, the description will be focused on the function of the information processing apparatus 300 regarding the generation stage of the forward solution tag (teacher data) in the generation of the model. Specifically, as shown in fig. 10, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 342, a visible light image acquisition unit 344, a reflected intensity image projection unit 346, a visible light image projection unit 348, and a positive solution tag generation unit (teacher data generation unit) 350. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflection intensity image acquiring section 342)
The reflected intensity image acquisition unit 342 acquires the reflected intensity image 404 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a reflected intensity image projection unit 346 described later. In the present embodiment, the reflection intensity image acquisition unit 342 may not acquire and output the mask image 604 when the reflection intensity image 404 does not have noise.
(visible light image acquisition section 344)
The visible light image acquisition unit 344 acquires the visible light image 504 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs them to a visible light image projection unit 348 described later. In the present embodiment, when the visible light image 504 contains no noise, the visible light image acquisition unit 344 need not acquire and output the mask image 604.
(reflection intensity image projection section 346)
The reflected intensity image projection unit 346 projects the acquired reflected intensity image 404 (mask image 604, if necessary) by randomly rotating or shifting the viewpoint horizontally, vertically, or obliquely. For example, the reflection intensity image projection unit 346 can perform projection by a homography matrix H given randomly. Then, the reflected intensity image projection unit 346 outputs the projected reflected intensity image (first projected image) obtained by the projection to the forward-solution tag generation unit 350 described later together with the reflected intensity image 404.
(visible light image projecting section 348)
The visible light image projection unit 348 projects the acquired visible light image 504 (mask image 604, if necessary) by randomly rotating the image or shifting the viewpoint horizontally, vertically, or obliquely. For example, the visible light image projection unit 348 can project the visible light image by using a homography matrix H which is randomly given. Then, the visible light image projection unit 348 outputs the projected visible light image (second projected image) obtained by the projection to the positive solution tag generation unit 350 described later together with the visible light image 504.
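A minimal sketch of such a random projection using a randomly generated homography matrix H (the corner-perturbation range and the use of OpenCV are assumptions for illustration):

```python
import cv2
import numpy as np

def random_homography_warp(image: np.ndarray, max_shift: float = 0.15, seed=None):
    """Apply a randomly given homography matrix H to an image (random projection)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Randomly perturb the four corners to obtain rotation / viewpoint-like shifts.
    noise = rng.uniform(-max_shift, max_shift, size=(4, 2)) * np.array([w, h])
    dst = (src + noise).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, H, (w, h))
    return warped, H   # H relates pixel positions between the image and its projection
```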
(Forward-solution label generation section 350)
The forward-solution label generation unit 350 generates a forward-solution label (teacher data) to be used by a learning unit 370 (see fig. 13) described later. For example, the forward-solution label generation unit 350 detects feature points of the reflected intensity image 404 and the visible light image 504 using the projected reflected intensity image and the projected visible light image, and acquires a likelihood map of the feature points (a map obtained by plotting each feature point and its detection accuracy). Then, the forward-solution label generation unit 350 generates a forward-solution label for the reflected intensity image and a forward-solution label for the visible light image by combining the likelihood maps. In the present embodiment, the forward-solution label generation unit 350 is configured by, for example, an encoder (not shown) that dimensionally compresses the input data and a detector (not shown) that detects the feature points.
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the generation stage of the forward-solution tag in the generation of the model are not limited to the configuration shown in fig. 10.
Next, the generation of the positive solution label according to the present embodiment will be described in detail with reference to fig. 11 and 12. Fig. 11 and 12 are explanatory diagrams illustrating an example of generation of a positive solution label according to the present embodiment.
In the present embodiment, as shown in fig. 11, the forward-solution label generation unit 350 first performs machine learning using CG (Computer Graphics) images 700 prepared in advance and generates a forward-solution label 800. Then, the information processing apparatus 300 compares the generated forward-solution label 800 with a forward-solution label 900 containing the feature points of the CG image 700 generated manually in advance, feeds back the difference (detector loss) to the forward-solution label generation unit 350, and reinforces the learning so that the difference becomes smaller.
However, the algorithm (model) obtained in this way (the forward-solution label generation unit 350) has a gap between the CG images 700 and the actual images (reflected intensity images and visible light images) actually used, so it is difficult to detect feature points robustly for images whose observation direction (viewpoint position) changes (for example, feature points that should be detected cannot be detected). Therefore, in the present embodiment, the forward-solution label generation unit 350 applies random projections to the reflected intensity images and the visible light images and performs machine learning using the projected images, so that the feature points can be detected robustly. In detail, in the present embodiment, the forward-solution label generation unit 350 applies random projections to the reflected intensity images and the visible light images and also detects feature points from the projected images, thereby obtaining the probability (likelihood) with which each feature point is detected. Next, in the present embodiment, likelihood maps obtained by mapping the likelihoods of the feature points of the reflected intensity image and the visible light image are combined to generate a forward-solution label common to the reflected intensity image and the visible light image. In the present embodiment, by using the forward-solution label common to the reflected intensity image and the visible light image in the learning stage described later, a model (algorithm) capable of stably detecting feature points from both the reflected intensity image and the visible light image can be obtained.
More specifically, as shown in fig. 12, the forward-solution label generation unit 350, which has performed machine learning based on the CG images 700, generates a likelihood map 802 containing feature points and their likelihoods from the reflected intensity image 406 and the projected reflected intensity image 410. Next, the forward-solution label generation unit 350 generates a likelihood map 802 containing feature points and their likelihoods from the visible light image 506 and the projected visible light image 510. The forward-solution label generation unit 350 then generates a forward-solution label 904 for the reflected intensity image and a forward-solution label 904 for the visible light image by combining the two likelihood maps. In addition, the forward-solution label generation unit 350 repeats the above-described machine learning using the generated forward-solution labels 904, thereby obtaining the final forward-solution labels 904. The generation of the forward-solution label 904 of the present embodiment is similar to the technique described in the above-mentioned non-patent document 1, but differs in that the forward-solution label 904 can be generated so that feature points common to a reflected intensity image and a visible light image obtained from different types of sensors (different domains) can be detected robustly even if the observation direction (viewpoint) changes.
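The following is a rough sketch, under several assumptions, of how likelihood maps might be aggregated over random projections and combined across the two modalities to form a common forward-solution label; detect_fn stands for the label generator pre-trained on CG images, random_homography_warp for a projection as in the earlier sketch, and the number of warps and the threshold are illustrative values, not values from the present disclosure.

```python
import cv2
import numpy as np

def common_forward_label(detect_fn, refl_image: np.ndarray, vis_image: np.ndarray,
                         n_warps: int = 10, threshold: float = 0.5) -> np.ndarray:
    """Build a forward-solution label common to the reflected intensity image and the
    visible light image by averaging feature-point likelihood maps over random projections."""

    def averaged_likelihood(img: np.ndarray) -> np.ndarray:
        h, w = img.shape[:2]
        maps = [detect_fn(img)]                     # likelihood map of the original view
        for _ in range(n_warps):
            warped, H = random_homography_warp(img)
            # Detect on the projected image and warp the likelihood back to the original view.
            back = cv2.warpPerspective(detect_fn(warped), np.linalg.inv(H), (w, h))
            maps.append(back)
        return np.mean(maps, axis=0)

    # Combine the likelihood maps of the two (aligned) modalities into one common label.
    combined = 0.5 * (averaged_likelihood(refl_image) + averaged_likelihood(vis_image))
    return (combined > threshold).astype(np.uint8)
```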
< 2.4 Learning >
Next, the generation of the model by learning in the present embodiment will be described in detail. Here, a model (algorithm) is generated by machine learning so that feature points common to a reflected intensity image and a visible light image obtained from different types of sensors (different domains) can be matched robustly even if the observation direction (viewpoint) changes.
First, a detailed configuration of the information processing apparatus 300 according to the present embodiment will be described with reference to fig. 13 and 14. Fig. 13 is a block diagram showing an example of the configuration of the information processing apparatus 300 according to the embodiment of the present disclosure, and fig. 14 is a block diagram showing an example of the configuration of the learning unit 370 according to the embodiment. Here, the description will focus on the functions of the information processing apparatus 300 for generating the model by learning. In detail, as shown in fig. 13, the information processing apparatus 300 mainly includes a reflected intensity image acquisition unit 362, a visible light image acquisition unit 364, a reflected intensity image projection unit 366, a visible light image projection unit 368, and a learning unit (learner) 370. The following describes each functional unit of the information processing apparatus 300 in detail.
(reflected intensity image acquisition section 362)
The reflected intensity image acquisition unit 362 acquires the reflected intensity image 404 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a reflected intensity image projection unit 366 described later. In the present embodiment, the reflection intensity image acquisition unit 362 may not acquire and output the mask image 604 when the reflection intensity image 404 does not have noise.
(visible light image acquisition section 364)
The visible light image acquisition unit 364 acquires the visible light image 504 and the mask image 604 from the input data generation unit 332 of fig. 8, and outputs the images to a visible light image projection unit 368 described later. In the present embodiment, the visible light image acquisition unit 364 may not acquire and output the mask image 604 when the visible light image 504 is free of noise.
(reflection intensity image projection unit 366)
The reflected intensity image projection unit 366 projects the acquired reflected intensity image 404 (mask image 604 if necessary). For example, the reflection intensity image projection unit 366 can project by a homography matrix H given at random. Then, the reflected intensity image projection unit 366 outputs the projected reflected intensity image (first projected image) 410 obtained by the projection to a learning unit 370 described later together with the reflected intensity image 404.
(visible light image projecting section 368)
The visible light image projection unit 368 projects the acquired visible light image 504 (mask image 604 if necessary). For example, the visible light image projection unit 368 can project a visible light image by using a homography matrix H given at random. Then, the visible light image projection unit 368 outputs the projected visible light image (second projection image) 510 obtained by the projection to the learning unit 370 described later together with the visible light image 504.
(learning section 370)
The learning unit 370 acquires feature points and feature quantities from the reflected intensity image 404 and the visible light image 504, and generates a model (algorithm) for matching the common feature points. In detail, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 and/or a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 are input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the visible light image 504 and the projected visible light image 510 and a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410 may be input to the learning unit 370. Alternatively, for example, a pair of input data 704 consisting of the reflected intensity image 404 and the projected reflected intensity image 410 and a pair of input data 704 consisting of the reflected intensity image 404 and the projected visible light image 510 may be input to the learning unit 370. Further, a pair of input data 704 consisting of the visible light image 504 and the projected reflected intensity image 410, a pair consisting of the reflected intensity image 404 and the projected visible light image 510, a pair consisting of the visible light image 504 and the projected visible light image 510, and a pair consisting of the reflected intensity image 404 and the projected reflected intensity image 410 may all be input to the learning unit 370. That is, in the present embodiment, pairs of input data including two images from different types of sensors are input. The learning unit 370 performs machine learning using the input data, and can thereby generate a model that matches common feature points in a reflected intensity image and a visible light image obtained from different types of sensors even if the observation direction changes.
More specifically, as shown in fig. 14, the learning unit 370 includes an encoder unit 372 that dimensionally compresses (for example, to 1/8) a pair of input data 704, a detector unit 374 that detects feature points (positions and coordinate information representing the shape of the subject, such as the center point, branch points, intersections, and end points on a contour in the image) from the compressed pair of input data 704, and a descriptor unit (feature quantity acquisition unit) 376 that acquires (describes) feature quantities (information obtained by digitizing the features of the feature points, such as their shape, direction, and spread) from the compressed pair of input data 704. The learning unit 370 performs machine learning by matching common feature points of the images from the different sensors based on the feature quantities, or by comparing the feature points and feature quantities acquired from the respective images with the forward-solution label (teacher data) 804 and feeding back the comparison result to the learning unit 370.
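A rough PyTorch sketch of such a network (an encoder that compresses to 1/8 resolution, a detector head, and a descriptor head) is shown below; the channel counts, the 65-channel detector output, and the descriptor dimension are assumptions borrowed from SuperPoint-style designs, not values recited in the present disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePointNet(nn.Module):
    """Sketch of the learning unit: shared encoder + detector head + descriptor head."""

    def __init__(self, desc_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(              # dimensional compression to 1/8
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.detector = nn.Conv2d(128, 65, 1)           # feature point likelihood per 8x8 cell
        self.descriptor = nn.Conv2d(128, desc_dim, 1)   # feature quantity per cell

    def forward(self, x: torch.Tensor):
        feat = self.encoder(x)
        heatmap = self.detector(feat)                   # compared with the forward-solution label
        desc = F.normalize(self.descriptor(feat), dim=1)  # used for feature point matching
        return heatmap, desc
```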
In the present embodiment, the functional blocks of the information processing apparatus 300 related to the generation stage of the model are not limited to the configuration shown in fig. 13 and 14.
Example 1
Next, an embodiment of specific machine learning by the learning unit 370 will be described with reference to fig. 15. Fig. 15 is an explanatory diagram illustrating an example of learning of the present embodiment.
In this example, for example, the learning unit 370 is input with a pair of the reflected intensity image 404 and the input data 704 of the projected visible light image 510 and/or a pair of the visible light image 504 and the input data 704 of the projected reflected intensity image 410. Alternatively, for example, the pair of the input data 704 of the visible light image 504 and the projected visible light image 510 and the pair of the input data 704 of the visible light image 504 and the projected reflection intensity image 410 may be input to the learning unit 370. Alternatively, for example, the pair of the input data 704 of the reflected intensity image 404 and the projected reflected intensity image 410 and the pair of the input data 704 of the reflected intensity image 404 and the projected visible light image 510 may be input to the learning unit 370. The learning unit 370 may be inputted with a pair of the input data 704 of the visible light image 504 and the projected reflection intensity image 410, a pair of the input data of the reflection intensity image 404 and the projected visible light image 510, a pair of the input data 704 of the visible light image 504 and the projected visible light image 510, and a pair of the input data 704 of the reflection intensity image 404 and the projected reflection intensity image 410.
More specifically, in the example shown in fig. 15, a pair of input data 710a of the reflected intensity image 406 and the projected visible light image 510, and a pair of input data 710b of the visible light image 506 and the projected reflected intensity image 410 are input.
In the present embodiment, the learning unit 370 is prepared with two sets of the encoder unit 372, the detector unit 374, and the descriptor unit 376 that share the same weights, and feature points and feature quantities are acquired from the pairs of input data 710a and 710b. Specifically, each learning unit 370 compares the result data 810a and 810b containing the feature points acquired by its detector unit 374 with the above-described forward-solution label 804, and calculates the difference between them as a loss (detector loss) L_p. Further, in the present embodiment, each learning unit 370 generates result data 812 based on the feature quantities obtained from its descriptor unit 376, matches and compares the feature points, and calculates the difference between them as a loss (descriptor loss) L_d.
Here, when the likelihood maps of the feature points obtained from the paired image and projected image are denoted by χ and χ′, and the feature quantities obtained from the image and the projected image are denoted by D and D′, the final loss value L can be expressed by the following formula (1) using a constant λ.
[Formula 1]
L(χ, χ′, D, D′, Y, Y′, s) = L_p(χ, Y) + L_p(χ′, Y′) + λ·L_d(D, D′, s) … (1)
In formula (1), Y represents the forward-solution label 804 of the feature points, and s represents the correspondence of pixels between the two images.
In addition, the loss (detector loss) L_p of the detector unit 374 is calculated by cross entropy with the forward-solution label 804 and can be expressed by the following formula (2). The projected image is an image projected by a randomly given homography matrix H.
[Formula 2]
In addition, using a hinge loss for the feature quantity d_hw (d_hw ∈ D) of each pixel of the input image and the feature quantity d′_hw (d′_hw ∈ D′) of each pixel of the projected image of the input image, the loss (descriptor loss) L_d of the descriptor unit 376 is expressed by the following formula (3). In formula (3), m_p is the positive margin, m_n is the negative margin, and λ_d is a constant that balances the positive correspondences against the false correspondences. At this time, the correspondence relationship (match) s is defined by the following formula (4).
[Formula 3]
l_d(d, d′; s) = λ_d · s · max(0, m_p − d^T d′) + (1 − s) · max(0, d^T d′ − m_n) … (3)
[Formula 4]
Here, p_hw is the pixel position on the image corresponding to the feature quantity of the descriptor unit 376, and H·p_hw is the pixel position warped by the homography matrix H. Further, since the feature quantity of the descriptor unit 376 is compressed to 1/8 of the input image, pixels are regarded as corresponding when the distance between the corresponding pixels is within 8 pixels.
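As a non-authoritative sketch of formulas (1) and (3), the losses could be computed as follows in PyTorch; the margin and weighting values and the averaging over cells are illustrative assumptions.

```python
import torch

def descriptor_loss(d: torch.Tensor, d_prime: torch.Tensor, s: torch.Tensor,
                    m_p: float = 1.0, m_n: float = 0.2, lambda_d: float = 250.0) -> torch.Tensor:
    """Hinge-type descriptor loss of formula (3), averaged over all pixel (cell) pairs.

    d, d_prime: (N, D) feature quantities of the image and its projected image.
    s         : (N, N) 0/1 correspondence defined via the homography matrix H (formula (4)).
    """
    dot = d @ d_prime.t()                                        # d^T d' for every pair
    loss = lambda_d * s * torch.clamp(m_p - dot, min=0.0) \
           + (1.0 - s) * torch.clamp(dot - m_n, min=0.0)
    return loss.mean()

def final_loss(l_p: torch.Tensor, l_p_prime: torch.Tensor, l_d: torch.Tensor,
               lam: float = 1e-4) -> torch.Tensor:
    """Formula (1): L = L_p(x, Y) + L_p(x', Y') + lambda * L_d(D, D', s)."""
    return l_p + l_p_prime + lam * l_d
```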
As described above, the learning unit 370 calculates the final loss L and performs feedback so as to minimize L, thereby generating a model (algorithm) that can robustly match common feature points in a reflected intensity image and a visible light image obtained from different types of sensors (different domains) even when the observation direction (viewpoint) changes.
Example 2
Further, another embodiment of specific machine learning by the learning unit 370 will be described with reference to fig. 16. Fig. 16 is an explanatory diagram illustrating an example of learning of the present embodiment.
In the present embodiment, the Shared Encoder (E_s) shown in fig. 16 has the same function as the encoder unit 372 described above. In this embodiment, a Private Encoder (E_p) for the reflected intensity image and a Private Encoder (E_p) for the visible light image are prepared (first encoder section). In this embodiment, a Shared Decoder is further prepared, which takes as input data the sum of the output of the Shared Encoder and the output of the Private Encoder.
More specifically, in the example shown in fig. 16, a pair of the input data 712a of the reflected intensity image 406 and the projected visible light image 510, and a pair of the input data 712b of the visible light image 506 and the projected reflected intensity image 410 are input.
The final loss value L in this embodiment is the sum of five loss functions (L_p, L_d, L_r, L_f, L_s). Among the five loss functions, L_p and L_d are the same as in Example 1 described above. The loss L_r is a reconstruction loss that compares the input image I with the output image reconstructed by the Shared Decoder from the sum of the output of the Shared Encoder (E_s(I)) and the output of the Private Encoder (E_p(I)), so that they coincide. The loss L_f is a difference loss that makes the output of the Private Encoder (E_p(I)) and the output of the Shared Encoder (E_s(I)) different from each other. The loss L_s is a similarity loss that makes it ambiguous whether the output of the Shared Encoder originates from the visible light image or from the reflected intensity image.
Then, using the five loss functions (L_p, L_d, L_r, L_f, L_s) and the constants α, β, and γ, the final loss value L is defined as in the following formula (5).
[Formula 5]
L = L_p + λ·L_d + α·L_r + β·L_f + γ·L_s … (5)
The reconstruction loss L_r is defined by the following formula (6) so that the output of the Shared Decoder coincides with the input image.
[Formula 6]
In formula (6), k is the number of pixels of the input image, and 1_k is a vector of length k whose elements are all 1. The norm in formula (6) denotes the squared L2 norm.
The difference loss L_f is defined by the following formula (7) so that the output of the Private Encoder (E_p(I)) and the output of the Shared Encoder (E_s(I)) become different from each other.
[Formula 7]
In formula (7), the norm denotes the squared Frobenius norm.
The similarity loss L_s is a loss for learning so that it becomes ambiguous whether the output of the Shared Encoder originates from the visible light image or from the reflected intensity image. In this embodiment, a Gradient Reversal Layer (GRL) is used in order to maximize this confusion. The GRL acts as the identity function on its output in the forward pass, but reverses the direction of the gradient in the backward pass. The GRL is therefore defined by the following formula (8).
[Formula 8]
The output E_s(I) of the Shared Encoder is input to the domain discriminator Z (d̂ = Z(Q(E_s(I)); θ_z)), which distinguishes whether the input originates from the visible light image or from the reflected intensity image. Here, θ_z is a parameter of the domain discriminator Z, and d̂ ∈ {0, 1}. During learning, the parameter θ_z is trained to strengthen the discrimination capability of the domain discriminator Z, while the parameters of the Shared Encoder are learned with the gradient reversed through the GRL so that the discrimination capability of the domain discriminator is reduced. Thus, the similarity loss L_s is defined by the following formula (9).
[Formula 9]
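A minimal PyTorch sketch of such a Gradient Reversal Layer (identity output in the forward pass, sign-reversed gradient in the backward pass) is shown below; how it is wired to the domain discriminator Z is only indicated in the comment and is an assumption.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Gradient Reversal Layer (GRL): outputs its input unchanged, but reverses
    the direction of the gradient during back-propagation (formula (8))."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        return -grad_output      # reversed gradient

def grl(x: torch.Tensor) -> torch.Tensor:
    return GradientReversal.apply(x)

# The Shared Encoder output would pass through grl() before the domain discriminator Z,
# so that Z is trained to separate the two domains while the Shared Encoder is pushed,
# via the reversed gradient, to make them indistinguishable (similarity loss L_s).
```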
As described above, the learning unit 370 calculates the final loss L and performs feedback so as to minimize L, thereby generating a model (algorithm) that can robustly match common feature points in a reflected intensity image and a visible light image obtained from different types of sensors (different domains) even when the observation direction (viewpoint) changes.
3. Summary
As described above, according to the embodiments of the present disclosure, feature point matching between images obtained from different types of sensors (different domains) can be performed with good accuracy. As a result, according to the present embodiment, the information from a plurality of sensors can be aligned accurately and easily. Specifically, according to the present embodiment, the external parameters (positional parameters) and/or internal parameters (optical parameters) of the LiDAR100 and the camera 200 can be accurately corrected based on the difference between the positions at which the LiDAR100 and the camera 200 are arranged (parallax, distance to the subject), the difference in the angle of view between the LiDAR100 and the camera 200, or lens aberrations, so that the difference (deviation) in positional information between the images output from the LiDAR100 and the camera 200 can be eliminated. The matching of feature points by the model (algorithm) obtained in the present embodiment is not limited to the calibration (alignment) of a plurality of sensors, and can also be applied to, for example, morphing (a technique of newly generating, by computer, an intermediate image between two images that are consecutive in time series) and the like.
The present embodiment is not limited to the combination of the LiDAR100 and the camera 200, and may be applied to, for example, a combination with another image sensor that generates an image by observing light of a specific wavelength. That is, the present embodiment can be applied to any combination of different types of sensors without particular limitation.
< 4. Hardware structure >
For example, the information processing apparatus 300 according to each of the above embodiments may be realized by a computer 1000 having a configuration as shown in fig. 17, which is connected to the LiDAR100 and the camera 200 via a network. Fig. 17 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus 300. The computer 1000 has a CPU1100, a RAM1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The various components of computer 1000 are connected by a bus 1050.
The CPU1100 operates based on a program stored in the ROM1300 or the HDD1400, and controls each unit. For example, the CPU1100 expands programs stored in the ROM1300 or the HDD1400 in the RAM1200, and executes processing corresponding to various programs.
ROM1300 stores a boot program such as BIOS (Basic Input Output System: basic input output System) executed by CPU1100 at the time of starting up computer 1000, a program depending on the hardware of computer 1000, and the like.
The HDD1400 is a computer-readable recording medium that temporarily stores a program executed by the CPU1100, data used by the program, and the like. Specifically, HDD1400 is a recording medium that records a ranging program of the present disclosure as one example of program data 1450.
The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the internet). For example, the CPU1100 receives data from other devices or transmits data generated by the CPU1100 to other devices via the communication interface 1500.
The input/output interface 1600 is an interface for connecting the input/output device 1650 to the computer 1000. For example, the CPU1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU1100 also transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. The input/output interface 1600 may also function as a media interface for reading a program or the like recorded on a predetermined recording medium (medium). Examples of the medium include an optical recording medium such as a DVD (Digital Versatile Disc: digital versatile disc) or a PD (Phase change rewritable Disk: phase change rewritable disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a magnetic tape medium, a magnetic recording medium, and a semiconductor memory.
For example, in the case where the computer 1000 functions as the information processing apparatus 300 according to the embodiment of the present disclosure, the CPU1100 of the computer 1000 realizes the functions of the learning unit 370 and the like by executing programs and models loaded on the RAM 1200. In addition, a program or the like of the embodiment of the present disclosure is stored in the HDD 1400. Further, although the CPU1100 reads the program data 1450 from the HDD1400 and executes the program data, as another example, the program may be acquired from another device via the external network 1550.
The information processing apparatus 300 according to the present embodiment is applicable to a system including a plurality of apparatuses on the premise of connection to a network (or communication between the apparatuses), such as cloud computing.
< 5. Application example >
An example of a mobile device control system to which the techniques set forth in this disclosure can be applied is described with reference to fig. 18. Fig. 18 is a block diagram showing a configuration example of a vehicle control system 11 as an example of a mobile device control system to which the present technology is applied.
The vehicle control system 11 is provided in the vehicle 1, and performs processing related to driving assistance and automatic driving of the vehicle 1.
The vehicle control system 11 includes a vehicle control ECU (Electronic Control Unit: electronic control unit) 21, a communication unit 22, a map information storage unit 23, a position information acquisition unit 24, an external recognition sensor 25, an in-vehicle sensor 26, a vehicle sensor 27, a storage unit 28, a travel assist/autopilot control unit 29, a DMS (Driver Monitoring System: driver monitoring system) 30, an HMI (Human Machine Interface: human-machine interface) 31, and a vehicle control unit 32.
The vehicle control ECU21, the communication unit 22, the map information storage unit 23, the positional information acquisition unit 24, the external recognition sensor 25, the in-vehicle sensor 26, the vehicle sensor 27, the storage unit 28, the travel assist/autopilot control unit 29, the Driver Monitoring System (DMS) 30, the human-machine interface (HMI) 31, and the vehicle control unit 32 are communicably connected to each other via the communication network 41. The communication network 41 is constituted by, for example, an in-vehicle communication network or a bus in accordance with a digital bidirectional communication standard such as CAN (Controller Area Network: controller area network), LIN (Local Interconnect Network: local interconnect network), LAN (Local Area Network: local area network), FlexRay (registered trademark), or Ethernet (registered trademark). The communication network 41 may be used selectively according to the kind of data to be transmitted. For example, CAN may be applied to data related to vehicle control, and Ethernet may be applied to large-capacity data. The respective units of the vehicle control system 11 may also be directly connected, not via the communication network 41, using wireless communication that assumes relatively close-range communication, such as near field communication (NFC (Near Field Communication: near field communication)) or Bluetooth (registered trademark).
Note that, in the following, when the respective units of the vehicle control system 11 communicate via the communication network 41, description of the communication network 41 is omitted. For example, in the case where the vehicle control ECU21 and the communication unit 22 communicate via the communication network 41, it is described that only the vehicle control ECU21 communicates with the communication unit 22.
The vehicle control ECU21 is configured by various processors such as a CPU (Central Processing Unit: central processing unit) and an MPU (Micro Processing Unit: microprocessor), for example. The vehicle control ECU21 can control the functions of the entire or a part of the vehicle control system 11.
The communication unit 22 can communicate with various devices inside and outside the vehicle, other vehicles, servers, base stations, and the like, and can transmit and receive various data. In this case, the communication unit 22 may perform communication using a plurality of communication methods.
Here, communication with the outside of the vehicle that can be performed by the communication unit 22 will be schematically described. The communication unit 22 can communicate with a server existing on an external network (hereinafter referred to as an external server) or the like via a base station or an access point by a wireless communication system such as 5G (fifth generation mobile communication system), LTE (Long Term Evolution: long term evolution), DSRC (Dedicated Short Range Communications: dedicated short range communication), or the like. The external network through which the communication unit 22 communicates is, for example, the internet, a cloud network, or a network unique to an enterprise. The communication method by the communication unit 22 for the external network is not particularly limited as long as it is a wireless communication method capable of performing digital bidirectional communication at a predetermined or higher communication speed and at a predetermined or higher distance.
The communication unit 22 can communicate with a terminal existing in the vicinity of the host vehicle using, for example, P2P (Peer To Peer) technology. Terminals existing in the vicinity of the host vehicle include, for example, terminals attached to a mobile body such as a pedestrian or a bicycle, terminals installed in a fixed manner in a store or the like, and MTC (Machine Type Communication: machine type communication) terminals. The communication unit 22 can also perform V2X communication. The V2X communication refers to, for example, vehicle-to-vehicle (Vehicle to Vehicle) communication with other vehicles, vehicle-to-infrastructure (Vehicle to Infrastructure) communication with roadside devices, vehicle-to-home (Vehicle to Home) communication, and vehicle-to-pedestrian (Vehicle to Pedestrian) communication with terminals carried by pedestrians.
The communication unit 22 can receive from the outside, for example by OTA (Over The Air), a program for updating the software that controls the operation of the vehicle control system 11. The communication unit 22 can also receive map information, traffic information, information on the surroundings of the vehicle 1, and the like from the outside. In addition, for example, the communication unit 22 can transmit information related to the vehicle 1, information on the surroundings of the vehicle 1, and the like to the outside. Examples of the information related to the vehicle 1 that the communication unit 22 transmits to the outside include data indicating the state of the vehicle 1 and the recognition result of the recognition unit 73. The communication unit 22 can also perform communication corresponding to a vehicle emergency notification system such as eCall.
For example, the communication unit 22 may receive electromagnetic waves transmitted by a road traffic information communication system (VICS (Vehicle Information and Communication System) (registered trademark)) such as a radio beacon, an optical beacon, and FM multiplex broadcasting.
Communication with the inside of the vehicle that can be executed by the communication unit 22 will be schematically described. The communication unit 22 can communicate with each device in the vehicle using, for example, wireless communication. The communication unit 22 can perform wireless communication with devices in the vehicle using a communication system capable of digital bidirectional communication at a predetermined communication speed or higher, such as wireless LAN, Bluetooth (registered trademark), NFC, or WUSB (Wireless USB). The communication unit 22 is not limited to this, and may communicate with each device in the vehicle using wired communication. For example, the communication unit 22 can communicate with each device in the vehicle by wired communication via a cable connected to a connection terminal (not shown). The communication unit 22 can communicate with each device in the vehicle using a communication system capable of digital bidirectional communication at a predetermined communication speed or higher by wired communication, such as USB (Universal Serial Bus: universal serial bus), HDMI (High-Definition Multimedia Interface: high-definition multimedia interface) (registered trademark), or MHL (Mobile High-definition Link).
Here, the in-vehicle device refers to, for example, a device in the vehicle that is not connected to the communication network 41. As the devices in the vehicle, for example, a mobile device held by a passenger such as a driver, a wearable device, an information device brought into the vehicle and temporarily set, and the like are assumed.
The map information storage unit 23 can store one or both of a map acquired from the outside and a map created in the vehicle 1. For example, the map information storage unit 23 stores a three-dimensional high-precision map, a global map having a lower precision than the high-precision map and covering a wider area, and the like.
Examples of the high-precision map include a dynamic map, a point cloud map, and a vector map. The dynamic map is, for example, a map composed of four layers of dynamic information, quasi-dynamic information, quasi-static information, and static information, and is provided to the vehicle 1 from an external server or the like. The point cloud map is a map composed of point clouds (point group data). The vector map is, for example, a map in which traffic information such as the positions of lanes and traffic lights is associated with a point cloud map, and is a map adapted to ADAS (Advanced Driver Assistance System: advanced driving support system) and AD (Autonomous Driving: automatic driving).
The point cloud map and the vector map may be provided from an external server or the like, or may be created in the vehicle 1 as a map for matching with a local map described later based on the sensing results of the camera 51, the radar 52, the LiDAR53, or the like, and stored in the map information storage unit 23. In addition, in the case of providing a high-precision map from an external server or the like, map data of, for example, several hundred meters square, relating to a planned route along which the vehicle 1 is traveling after that, is acquired from the external server or the like in order to reduce the communication capacity.
The position information acquiring unit 24 can receive GNSS signals from GNSS (Global Navigation Satellite System: global navigation satellite system) satellites and acquire position information of the vehicle 1. The acquired position information is supplied to the travel support/automatic driving control unit 29. The position information acquiring unit 24 is not limited to the type using GNSS signals, and may acquire position information using beacons, for example.
The external recognition sensor 25 includes various sensors used for recognizing the external condition of the vehicle 1, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The kind and number of sensors included in the external recognition sensor 25 are not particularly limited.
For example, the external recognition sensor 25 has a camera 51, a radar 52, liDAR (Light Detection and Ranging, laser Imaging Detection and Ranging: light detection and ranging, laser imaging detection and ranging) 53, and an ultrasonic sensor 54. The external recognition sensor 25 is not limited to this, and may be configured to have one or more of the camera 51, the radar 52, the LiDAR53, and the ultrasonic sensor 54. The number of cameras 51, radar 52, liDAR53, and ultrasonic sensor 54 is not particularly limited as long as they can be installed in the vehicle 1. The type of the sensor provided in the external recognition sensor 25 is not limited to this example, and the external recognition sensor 25 may have another type of sensor. An example of the sensing area of each sensor included in the external recognition sensor 25 will be described later.
The imaging mode of the camera 51 is not particularly limited. For example, as necessary, cameras of various photographing modes such as a ToF (Time of flight) camera, a stereoscopic camera, a monocular camera, and an infrared camera, which are photographing modes capable of ranging, can be applied to the camera 51. The camera 51 is not limited to this, and may be used to acquire only a photographed image regardless of distance measurement.
In addition, the external recognition sensor 25 can have an environment sensor for detecting the environment for the vehicle 1, for example. The environmental sensor is a sensor for detecting the environment such as weather, and brightness, and may include, for example, a raindrop sensor, a fog sensor, a sun sensor, a snow sensor, and an illuminance sensor.
The external recognition sensor 25 includes, for example, a microphone for detecting the sound around the vehicle 1, the position of the sound source, and the like.
The in-vehicle sensor 26 has various sensors for detecting information in the vehicle, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The type and number of the various sensors included in the in-vehicle sensor 26 are not particularly limited as long as they can be installed in the vehicle 1.
For example, the in-vehicle sensor 26 may include one or more of a camera, a radar, a seating sensor, a steering wheel sensor, a microphone, and a biometric sensor. As the camera provided in the in-vehicle sensor 26, for example, cameras of various photographing modes capable of ranging, such as a ToF camera, a stereo camera, a monocular camera, and an infrared camera, are used. The present invention is not limited thereto, and the camera provided in the in-vehicle sensor 26 may be used only to acquire a captured image regardless of distance measurement. The in-vehicle sensor 26 includes a biometric sensor provided in, for example, a seat, a steering wheel, or the like, and detects various biometric information of a passenger such as a driver.
The vehicle sensor 27 has various sensors for detecting the state of the vehicle 1, and can supply sensor data from the respective sensors to the respective sections of the vehicle control system 11. The types and the number of the various sensors provided in the vehicle sensor 27 are not particularly limited as long as they can be installed in the vehicle 1.
For example, the vehicle sensor 27 can have a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement device (IMU (Inertial Measurement Unit)) that integrates them. For example, the vehicle sensor 27 has a steering angle sensor that detects a steering angle of a steering wheel, a yaw rate sensor, an accelerator sensor that detects an operation amount of an accelerator pedal, and a brake sensor that detects an operation amount of a brake pedal. For example, the vehicle sensor 27 includes a rotation sensor that detects the rotational speeds of the engine and the motor, an air pressure sensor that detects the air pressure of the tire, a slip ratio sensor that detects the slip ratio of the tire, and a wheel speed sensor that detects the rotational speed of the wheel. For example, the vehicle sensor 27 has a battery sensor that detects the remaining amount and temperature of the battery and an impact sensor that detects an impact from the outside.
The storage unit 28 includes at least one of a nonvolatile storage medium and a volatile storage medium, and is capable of storing data and programs. The storage unit 28 is used as, for example, an EEPROM (Electrically Erasable Programmable Read Only Memory: electrically erasable programmable read only memory) and a RAM (Random Access Memory: random access memory), and as its storage medium, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device can be applied. The storage unit 28 stores various programs and data used by each unit of the vehicle control system 11. For example, the storage unit 28 includes an EDR (Event Data Recorder: event data recorder) and a DSSAD (Data Storage System for Automated Driving: automated driving data recording system), and stores information of the vehicle 1 before and after an event such as an accident, and information acquired by the in-vehicle sensor 26.
The travel support/automatic driving control unit 29 can perform travel support and automatic driving control of the vehicle 1. For example, the travel support/automatic driving control unit 29 includes an analysis unit 61, an action planning unit 62, and an action control unit 63.
The analysis unit 61 can perform analysis processing of the conditions of the vehicle 1 and the surrounding area. The analysis unit 61 includes a self-position estimation unit 71, a sensor fusion unit 72, and a recognition unit 73.
The self-position estimating unit 71 can estimate the self-position of the vehicle 1 based on the sensor data from the external recognition sensor 25 and the high-precision map stored in the map information storage unit 23. For example, the self-position estimating unit 71 generates a local map based on sensor data from the external recognition sensor 25, and estimates the self-position of the vehicle 1 by matching the local map with a high-precision map. The position of the vehicle 1 can be, for example, based on the center of the rear wheel set axle.
Examples of the local map include a three-dimensional high-precision map produced by using a technique such as SLAM (Simultaneous Localization and Mapping: simultaneous localization and mapping), and an occupancy grid map (Occupancy Grid Map). Examples of the three-dimensional map include the above-described point cloud map. The occupancy grid map is a map in which a three-dimensional or two-dimensional space around the vehicle 1 is divided into grids (cells) of a predetermined size, and the occupancy state of objects is shown in units of cells. The occupancy state of an object is represented, for example, by the presence or absence of the object and its presence probability. The local map is also used, for example, for the detection processing and recognition processing of the external situation of the vehicle 1 by the recognition unit 73.
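To make the occupancy grid data structure concrete, a minimal sketch is given below; the grid extent, cell size, and log-odds update are generic choices and are not taken from the text.

```python
import numpy as np

class OccupancyGrid2D:
    """Minimal 2D occupancy grid: the space around the vehicle is divided into
    cells of a fixed size, each holding an occupancy probability (via log-odds)."""

    def __init__(self, size_m: float = 100.0, cell_m: float = 0.5):
        self.cell_m = cell_m
        n = int(size_m / cell_m)
        self.log_odds = np.zeros((n, n))      # 0.0 corresponds to probability 0.5 (unknown)
        self.origin = n // 2                  # vehicle at the grid center

    def _to_cell(self, x: float, y: float):
        return (int(round(x / self.cell_m)) + self.origin,
                int(round(y / self.cell_m)) + self.origin)

    def update(self, x: float, y: float, occupied: bool, step: float = 0.4):
        # Accumulate evidence for (or against) occupancy of the cell containing (x, y).
        i, j = self._to_cell(x, y)
        if 0 <= i < self.log_odds.shape[0] and 0 <= j < self.log_odds.shape[1]:
            self.log_odds[i, j] += step if occupied else -step

    def probability(self, x: float, y: float) -> float:
        i, j = self._to_cell(x, y)
        return 1.0 / (1.0 + np.exp(-self.log_odds[i, j]))
```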
The self-position estimating unit 71 may estimate the self-position of the vehicle 1 based on the position information acquired by the position information acquiring unit 24 and the sensor data from the vehicle sensor 27.
The sensor fusion unit 72 can perform a sensor fusion process of combining a plurality of different kinds of sensor data (for example, image data supplied from the camera 51 and sensor data supplied from the radar 52) to obtain new information. As a method for combining different kinds of sensor data, aggregation, fusion, union, and the like can be cited.
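As a simple illustration of combining two kinds of sensor data, the sketch below projects LiDAR points into the camera image using known intrinsic and extrinsic parameters so that image pixels can be associated with range information; the concrete fusion method used by the sensor fusion unit 72 is not specified in the text, and all names here are hypothetical.

```python
import numpy as np

def fuse_lidar_into_image(points_lidar: np.ndarray,
                          rotation: np.ndarray, translation: np.ndarray,
                          camera_matrix: np.ndarray):
    """Project LiDAR points into the camera image (one simple fusion scheme).

    points_lidar: (N, 3) points in the LiDAR frame.
    rotation, translation: extrinsics from the LiDAR frame to the camera frame.
    camera_matrix: 3x3 camera intrinsic matrix.
    Returns pixel coordinates (M, 2) and depths (M,) for points in front of the camera.
    """
    pts_cam = points_lidar @ rotation.T + translation.reshape(1, 3)
    in_front = pts_cam[:, 2] > 0.1            # keep points in front of the camera
    pts_cam = pts_cam[in_front]
    proj = pts_cam @ camera_matrix.T          # homogeneous pixel coordinates
    pixels = proj[:, :2] / proj[:, 2:3]
    return pixels, pts_cam[:, 2]
```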
The identification unit 73 can perform detection processing for detecting the external situation of the vehicle 1 and identification processing for identifying the external situation of the vehicle 1.
For example, the identification unit 73 performs detection processing and identification processing of the external situation of the vehicle 1 based on information from the external identification sensor 25, information from the own position estimation unit 71, information from the sensor fusion unit 72, and the like.
Specifically, the recognition unit 73 performs, for example, detection processing, recognition processing, and the like of objects around the vehicle 1. The object detection process is, for example, a process of detecting the presence or absence of an object, the size, shape, position, operation, or the like. The object recognition processing is, for example, processing for recognizing an attribute such as the kind of an object or recognizing a specific object. However, the detection process and the identification process are not necessarily clearly distinguished, and may be repeated.
For example, the recognition unit 73 detects objects around the vehicle 1 by performing clustering that classifies point clouds based on sensor data from the radar 52, the LiDAR53, and the like into blocks (clusters) of points. Thereby, the presence or absence, size, shape, and position of objects around the vehicle 1 are detected.
For example, the recognition unit 73 detects the movement of objects around the vehicle 1 by tracking the movement of the blocks of points classified by the clustering. Thereby, the speed and the traveling direction (movement vector) of the objects around the vehicle 1 are detected.
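A minimal sketch of such clustering and block tracking is shown below, using DBSCAN from scikit-learn and a nearest-centroid association; the actual clustering and tracking methods of the recognition unit 73 are not specified in the text, and the parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_point_cloud(points: np.ndarray, eps: float = 0.7, min_samples: int = 5):
    """Group a point cloud (N, 3) into blocks (clusters) and return their centroids."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    centroids = [points[labels == k].mean(axis=0) for k in set(labels) if k != -1]
    return np.array(centroids)

def track_blocks(prev_centroids: np.ndarray, curr_centroids: np.ndarray, dt: float):
    """Naive nearest-centroid association: estimate a movement vector for each
    current block from the closest block in the previous frame."""
    motions = []
    for c in curr_centroids:
        if len(prev_centroids) == 0:
            continue
        nearest = prev_centroids[np.argmin(np.linalg.norm(prev_centroids - c, axis=1))]
        motions.append((c - nearest) / dt)      # movement vector per second
    return np.array(motions)
```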
For example, the recognition unit 73 detects or recognizes a vehicle, a person, a bicycle, an obstacle, a structure, a road, a signal lamp, a traffic sign, a road sign, or the like based on the image data supplied from the camera 51. The identification unit 73 may identify the type of the object around the vehicle 1 by performing identification processing such as semantic division.
For example, the recognition unit 73 can perform recognition processing of the traffic rules around the vehicle 1 based on the map stored in the map information storage unit 23, the result of self-position estimation by the self-position estimating unit 71, and the result of recognition of objects around the vehicle 1 by the recognition unit 73. By this processing, the recognition unit 73 can recognize the position and state of traffic lights, the contents of traffic signs and road signs, the contents of traffic restrictions, the lanes in which the vehicle can travel, and the like.
For example, the recognition unit 73 can perform a process of recognizing the surrounding environment of the vehicle 1. As the surrounding environment of the recognition object, the recognition unit 73 assumes weather, air temperature, humidity, brightness, the state of the road surface, and the like.
The action planning unit 62 creates an action plan of the vehicle 1. For example, the action planning unit 62 can create an action plan by performing a process of path planning and path following.
The route planning (Global path planning) is a process of planning a rough route from a start point to an end point. The route planning also includes processing called trajectory planning, which performs trajectory generation (Local path planning) so that the vehicle 1 can travel safely and smoothly in its vicinity on the planned route, taking the motion characteristics of the vehicle 1 into consideration.
Path following refers to a process of planning an action for safely and correctly traveling on a path planned by a path plan within a planned time. The action planning unit 62 can calculate the target speed and the target angular speed of the vehicle 1 based on the result of the processing of the path following, for example.
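As one widely used, purely illustrative way of computing a target speed and target angular velocity from a planned path, the sketch below uses a pure pursuit control law; the embodiment does not specify the control method, and the lookahead distance and target speed are assumed values.

```python
import math

def pure_pursuit_command(pose_x, pose_y, heading, path_points,
                         lookahead=5.0, target_speed=10.0):
    """Compute (target speed, target angular velocity) to follow a path of (x, y) points.

    Pure pursuit is assumed here only as an illustration of path following.
    """
    # Pick the first path point at least `lookahead` metres away.
    goal = None
    for px, py in path_points:
        if math.hypot(px - pose_x, py - pose_y) >= lookahead:
            goal = (px, py)
            break
    if goal is None:
        return 0.0, 0.0                       # end of path: stop
    dx, dy = goal[0] - pose_x, goal[1] - pose_y
    dist = math.hypot(dx, dy)
    # Lateral offset of the goal point in the vehicle frame.
    local_y = -math.sin(heading) * dx + math.cos(heading) * dy
    # Pure-pursuit curvature and the resulting yaw rate at the target speed.
    curvature = 2.0 * local_y / (dist ** 2)
    return target_speed, target_speed * curvature
```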
The operation control unit 63 can control the operation of the vehicle 1 in order to realize the action plan created by the action planning unit 62.
For example, the operation control unit 63 controls a steering control unit 81, a braking control unit 82, and a driving control unit 83 included in the vehicle control unit 32 described later, and performs acceleration/deceleration control and directional control so that the vehicle 1 travels on a track calculated by a track plan. For example, the operation control unit 63 performs coordinated control for achieving the function of ADAS such as collision avoidance, impact alleviation, following travel, vehicle speed maintenance travel, collision warning of the vehicle, and lane departure warning of the vehicle. For example, the operation control unit 63 performs coordinated control for the purpose of autonomous driving or the like in which traveling is performed autonomously regardless of the operation of the driver.
The DMS30 can perform a driver authentication process, a driver state recognition process, and the like based on sensor data from the in-vehicle sensor 26, input data to the HMI31 described later, and the like. As the state of the driver to be identified, for example, a physical condition, a degree of wakefulness, a degree of concentration, a degree of fatigue, a direction of line of sight, a degree of intoxication, a driving operation, a posture, and the like are assumed.
The DMS30 may perform authentication processing of passengers other than the driver and recognition processing of the states of the passengers. For example, the DMS30 may perform the process of recognizing the situation in the vehicle based on the sensor data from the in-vehicle sensor 26. As the condition in the vehicle to be identified, for example, air temperature, humidity, brightness, smell, and the like are assumed.
The HMI31 can input various data, instructions, and the like, and present various data to the driver, and the like.
The input of data via the HMI31 will be schematically described. The HMI31 has an input device for a person to input data. The HMI31 generates an input signal based on data, instructions, and the like input through the input device, and supplies the input signal to each part of the vehicle control system 11. The HMI31 has, for example, operation elements such as a touch panel, buttons, switches, and levers as the input device. The HMI31 is not limited to this, and may further have an input device capable of inputting information by a method other than manual operation, such as voice or gesture. Furthermore, the HMI31 may use, as an input device, for example, a remote control device using infrared rays or radio waves, or an externally connected device such as a mobile device or a wearable device compatible with the operation of the vehicle control system 11.
The presentation of data by the HMI31 will be schematically described. The HMI31 generates visual information, acoustic information, and haptic information for the passengers or for the outside of the vehicle. The HMI31 performs output control that controls the output, the output content, the output timing, the output method, and the like of each piece of generated information. The HMI31 generates and outputs, as visual information, information shown by images or light, such as an operation screen, a status display of the vehicle 1, a warning display, and a monitor image showing the situation around the vehicle 1. The HMI31 generates and outputs, as acoustic information, information indicated by sounds, such as voice guidance, warning sounds, and warning messages. Further, the HMI31 generates and outputs, as haptic information, information given to the tactile sense of the passenger by, for example, force, vibration, or movement.
As an output device for outputting visual information by the HMI31, for example, a display device for presenting visual information by displaying an image by itself, or a projector device for presenting visual information by projecting an image can be applied. The display device may be, for example, a head-up display, a transmissive display, a wearable device having an AR (Augmented Reality: augmented reality) function, or the like, in addition to a display device having a normal display, and may be a device for displaying visual information in the field of view of the passenger. The HMI31 may use a display device provided in the navigation device, the instrument panel, CMS (Camera Monitoring System), the electronic mirror, the lamp, or the like of the vehicle 1 as an output device for outputting visual information.
As an output device for outputting the acoustic information, for example, an audio speaker, a headphone, and an earphone can be applied.
As an output device for outputting haptic information, for example, a haptic element using haptic technology can be applied. The haptic element is provided at a portion where a passenger of the vehicle 1 contacts, for example, a steering wheel, a seat, or the like.
The vehicle control unit 32 can control each unit of the vehicle 1. The vehicle control unit 32 includes a steering control unit 81, a brake control unit 82, a drive control unit 83, a vehicle body system control unit 84, a lamp control unit 85, and a horn control unit 86.
The steering control unit 81 can detect and control the state of the steering system of the vehicle 1. The steering system includes, for example, a steering mechanism including a steering wheel and the like, and electric power steering and the like. The steering control unit 81 includes, for example, a steering ECU that controls the steering system, an actuator that drives the steering system, and the like.
The brake control unit 82 can detect and control the state of the brake system of the vehicle 1. The brake system includes, for example, a brake mechanism including a brake pedal or the like, an ABS (Antilock Brake System: antilock brake system), a regenerative brake mechanism, and the like. The brake control unit 82 includes, for example, a brake ECU that controls a brake system, an actuator that drives the brake system, and the like.
The drive control unit 83 can detect and control the state of the drive system of the vehicle 1. The drive system includes, for example, an accelerator pedal, a drive force generating device for generating a drive force of an internal combustion engine, a drive motor, or the like, a drive force transmitting mechanism for transmitting the drive force to wheels, and the like. The drive control unit 83 includes, for example, a drive ECU that controls the drive system, an actuator that drives the drive system, and the like.
The vehicle body system control unit 84 can detect and control the state of the vehicle body system of the vehicle 1. The vehicle body system includes, for example, a keyless entry system, a smart key system, a power window device, a power seat, an air conditioner, an airbag, a seat belt, a shift lever, and the like. The vehicle body system control unit 84 includes, for example, a vehicle body system ECU that controls a vehicle body system, an actuator that drives the vehicle body system, and the like.
The lamp control unit 85 can detect and control the states of various lamps of the vehicle 1. As the lamp to be controlled, for example, a headlight, a rear lamp, a fog lamp, a turn signal lamp, a brake lamp, a projection lamp, a display of a bumper, and the like are assumed. The lamp control unit 85 includes a lamp ECU that controls the lamp, an actuator that drives the lamp, and the like.
The horn control section 86 can detect and control the state of the automobile horn of the vehicle 1. The horn control unit 86 includes, for example, a horn ECU that controls the horn of the automobile, an actuator that drives the horn of the automobile, and the like.
Fig. 19 is a diagram showing an example of the sensing areas of the camera 51, radar 52, liDAR53, ultrasonic sensor 54, and the like of the external recognition sensor 25 of fig. 18. In fig. 19, the vehicle 1 is schematically shown as seen from above, with the left end side being the front end (front) side of the vehicle 1 and the right end side being the rear end (rear) side of the vehicle 1.
The sensing region 101F and the sensing region 101B show examples of the sensing region of the ultrasonic sensor 54. The sensing region 101F covers the front end periphery of the vehicle 1 by the plurality of ultrasonic sensors 54. The sensing region 101B covers the rear end periphery of the vehicle 1 by the plurality of ultrasonic sensors 54.
The sensing results in the sensing region 101F and the sensing region 101B are used for, for example, parking assistance of the vehicle 1, or the like.
Sensing region 102F to sensing region 102B illustrate examples of sensing regions of radar 52 for short or medium range. The sensing region 102F is in front of the vehicle 1, covering a position farther than the sensing region 101F. The sensing region 102B is at the rear of the vehicle 1, covering a position farther than the sensing region 101B. The sensing region 102L covers the periphery rearward of the left side face of the vehicle 1. The sensing region 102R covers the periphery rearward of the right side face of the vehicle 1.
The sensing result in the sensing region 102F is used for detection of a vehicle, a pedestrian, or the like existing in front of the vehicle 1, for example. The sensing result in the sensing region 102B is used for, for example, a collision prevention function or the like in the rear of the vehicle 1. The sensing results in the sensing region 102L and the sensing region 102R are used, for example, for detection of an object in a dead angle on the side of the vehicle 1.
The sensing region 103F to the sensing region 103B show an example of the sensing region of the camera 51. The sensing region 103F is in front of the vehicle 1, covering to a position farther than the sensing region 102F. The sensing region 103B is at the rear of the vehicle 1, covering to a position farther than the sensing region 102B. The sensing region 103L covers the periphery of the left side face of the vehicle 1. The sensing region 103R covers the periphery of the right side face of the vehicle 1.
The sensing result in the sensing region 103F can be used for, for example, a signal lamp, recognition of traffic signs, a lane departure prevention support system, and an automatic headlamp control system. The sensing result in the sensing region 103B can be used for parking assistance and panoramic view systems, for example. The sensing results in the sensing region 103L and the sensing region 103R can be used for a panoramic view system, for example.
The sensing region 104 shows an example of a sensing region of LiDAR 53. The sensing region 104 is in front of the vehicle 1, covering to a position farther than the sensing region 103F. On the other hand, the sensing region 104 is narrowed in the left-right direction as compared with the sensing region 103F.
The sensing result in the sensing region 104 is used for object detection of a surrounding vehicle or the like, for example.
The sensing region 105 shows an example of a sensing region of the radar 52 for long distances. The sensing area 105 is in front of the vehicle 1, covering a position farther than the sensing area 104. On the other hand, the sensing region 105 is narrowed in the left-right direction as compared with the sensing region 104.
The sensing result in the sensing region 105 is used for, for example, ACC (Adaptive Cruise Control: adaptive cruise control), emergency braking, collision avoidance, and the like.
The sensing areas of the respective sensors included in the external recognition sensor 25, that is, the camera 51, the radar 52, the LiDAR53, and the ultrasonic sensor 54, may have various configurations other than those shown in fig. 19. Specifically, the ultrasonic sensor 54 may sense the side of the vehicle 1, or the LiDAR53 may sense the rear of the vehicle 1. The installation position of each sensor is not limited to the above-described examples. The number of each sensor may be one or a plurality of sensors.
The technology of the present disclosure can be applied to, for example, the camera 51, the LiDAR53, and the like. For example, by applying the technology of the present disclosure to the sensor fusion portion 72 that processes data from the cameras 51 and LiDAR53 of the vehicle control system 11, the internal parameters or the external parameters of the cameras 51 and LiDAR53 can be calibrated.
< 6. Supplement >
While the preferred embodiments of the present disclosure have been described in detail with reference to the drawings, the technical scope of the present disclosure is not limited to such examples. It is obvious that various modifications and modifications can be made by those having ordinary knowledge in the art of the present disclosure within the scope of the technical ideas described in the claims, and it is to be understood that these modifications and modifications are of course also within the technical scope of the present disclosure.
The embodiments of the present disclosure described above can include, for example, a program (model) for causing a computer to function as the information processing apparatus of the present embodiment, and a non-transitory tangible medium on which the program (model) is recorded. In addition, in the embodiment of the present disclosure, the program (model) may be distributed via a communication line (including wireless communication) such as the internet.
The steps in the processing according to the embodiment of the present disclosure described above may not necessarily be performed in the order described. For example, the steps may be processed by changing the order appropriately. Instead of performing the processing in time series, the steps may be partially processed in parallel or independently. In the embodiment of the present disclosure, the processing method of each step may not necessarily be processed along the described method, and may be processed by another method by another functional unit, for example.
The effects described in the present specification are merely effects described or exemplified, and are not limited thereto. In other words, the technology of the present disclosure can achieve other effects that are clear to those skilled in the art from the description of the present specification, together with or instead of the above-described effects.
In the embodiment of the present disclosure, for example, the configuration described as one device may be divided into a plurality of devices. Conversely, the configuration described above as a plurality of devices may be collectively configured as one device. It is needless to say that other structures than those described above may be added to the structure of each device. In addition, as long as the configuration and operation of the entire system are substantially the same, a part of the configuration of one device may be included in the configuration of another device. The above system refers to a collection of a plurality of components (devices, modules (components), etc.), whether or not all the components are located in the same housing. Therefore, a plurality of devices stored in separate housings and connected via a network and one device storing a plurality of modules in one housing can be grasped as a system.
The present technology can also adopt the following configuration.
(1) An information processing apparatus, wherein,
comprises a learner for acquiring common feature points and feature quantities in a plurality of images and generating a model for matching the common feature points,
one of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit and the projection image acquired and projected from a different imaging unit than the one image are input to the learner as a pair of input data.
(2) The information processing apparatus according to the above (1), wherein,
the projection image is a first projection image obtained by projecting the first image or a second projection image obtained by projecting the second image.
(3) The information processing apparatus according to the above (2), wherein,
a plurality of the pair of input data are input to the learner.
(4) The information processing apparatus according to the above (3), wherein,
at least one of another pair of input data including the first image and the first projection image and another pair of input data including the second image and the second projection image is input to the learner.
(5) The information processing apparatus according to any one of the above (2) to (4), wherein,
the learner includes:
an encoder unit which performs dimension compression on the pair of input data;
a detector unit configured to detect the feature point from the pair of compressed input data; and
and a feature amount acquisition unit configured to acquire the feature amount from the compressed pair of input data.
(6) The information processing apparatus according to the above (5), wherein,
The learner compares the characteristic points outputted from the detector with the characteristic points of the teacher data,
the learner compares a plurality of the feature values from the pair of input data outputted from the feature value obtaining unit.
(7) The information processing apparatus according to the above (6), wherein,
the encoder section includes:
a first encoder unit for inputting the first image and the first projection image; and
and a second encoder unit for inputting the second image and the second projection image.
(8) The information processing apparatus according to the above (6) or (7), wherein,
further comprises a teacher data generation unit for generating the teacher data,
the teacher data generating unit acquires likelihood maps of the feature points from the first and second images and the first and second projection images, and combines the likelihood maps.
(9) The information processing apparatus according to the above (8), wherein,
the teacher data generation unit performs machine learning using CG images in advance.
(10) The information processing apparatus according to any one of the above (1) to (9), wherein,
The image processing unit generates an image to be input to the learner by cutting the first wide area image acquired from the first imaging unit and the second wide area image acquired from the second imaging unit into images from the same viewpoint.
(11) The information processing apparatus according to the above (10), wherein,
the mask unit generates a mask of noise in the wide area images based on the first wide area image and the second wide area image whose alignment is adjusted.
(12) The information processing apparatus according to any one of the above (1) to (11), wherein,
the image processing apparatus further includes a feature point extraction unit that obtains feature points and feature amounts in the plurality of images using the model, and performs matching of the feature points in common.
(13) The information processing apparatus according to the above (12), wherein,
the feature point extracting unit acquires feature points and feature amounts in the first image and the second image newly acquired from the different imaging units, and performs matching of the feature points in common.
(14) The information processing apparatus according to the above (12), wherein,
the feature point extraction unit acquires feature points and feature amounts in the newly acquired plurality of first images or the plurality of second images, and performs matching of the feature points in common.
(15) The information processing apparatus according to any one of the above (12) to (14), wherein,
the image processing apparatus further includes a calibration unit that performs calibration of parameters related to the first imaging unit and the second imaging unit based on a positional relationship between the first imaging unit that acquires the first image and the second imaging unit that acquires the second image,
the calibration unit performs calibration using the position information of the feature points that have been matched.
(16) The information processing apparatus according to any one of the above (1) to (15), wherein,
the first photographing part is formed by a LiDAR or ToF sensor,
the second imaging unit is constituted by an image sensor.
(17) An information processing system, wherein,
comprises a first shooting part, a second shooting part and an information processing device,
the information processing apparatus includes a learner that acquires a feature point and a feature quantity common to a plurality of images and generates a model for matching the common feature point,
One of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit and the projection image acquired and projected from a different imaging unit than the one image are input to the learner as a pair of input data.
(18) A model for causing a computer to function to acquire a feature point and a feature quantity common to a plurality of images and to match the feature point in common, wherein,
the information processing apparatus obtains the model by performing machine learning using, as a pair of input data, one of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit, and a projection image acquired and projected from a different imaging unit than the one image.
(19) A model generation method for causing a computer to function to acquire a feature point and a feature quantity common to a plurality of images and to generate a model for matching the common feature point,
the information processing apparatus generates the model by performing machine learning using, as a pair of input data, one of the first image acquired from the first imaging unit and the second image acquired from the second imaging unit, and a projection image acquired and projected from a different imaging unit than the one image.
Reference numerals illustrate: 1 vehicle, 10 information processing system, 11 vehicle control system, 21 vehicle control ECU, 22 communication unit, 23 map information storage unit, 24 position information acquisition unit, 25 external recognition sensor, 26 in-vehicle sensor, 27 vehicle sensor, 28 storage unit, 29 travel assist/autopilot control unit, 30 driver monitoring system (DMS), 31 human-machine interface (HMI), 32 vehicle control unit, 41 communication network, 51, 200 camera, 52 radar, 53, 100 LiDAR, 54 ultrasonic sensor, 61 analysis unit, 62 action planning unit, 63 action control unit, 71 self-position estimation unit, 72 sensor fusion unit, 73 recognition unit, 81 steering control unit, 82 brake control unit, 83 drive control unit, 84 vehicle body system control unit, 85 lamp control unit, 86 horn control unit, 300 information processing apparatus, 302, 322, 342, 362 reflected intensity image acquisition section, 304, 324, 344, 364 visible light image acquisition section, 306, 326 reflected intensity image processing section, 308, 328 visible light image processing section, 310 feature point acquisition section, 312 position information acquisition section, 314 calibration section, 330 mask generation section, 332 input data generation section, 346, 366 reflected intensity image projection section, 348, 368 visible light image projection section, 350 positive solution label generation section, 370 learning section, 372 encoder section, 374 detector section, 376 descriptor section, 400, 404, 406 reflected intensity image, 402 reflected intensity panoramic image, 410 projected reflected intensity image, 500, 504, 506 visible light image, 502 visible light panoramic image, 510 projected visible light image, 602, 604 mask image, 700 CG image, 704, 710a, 710b, 712a, 712b input data, 800, 900, 904 positive solution label, 802 likelihood map, 810a, 810b, 812 result data.