
Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving

Info

Publication number
US10726279B1
US10726279B1, US16/739,767, US202016739767A
Authority
US
United States
Prior art keywords
cnn
confidence
information
estimated
rpn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/739,767
Other versions
US20200250442A1 (en)
Inventor
Kye-hyeon KIM
Yongjoong Kim
Hak-Kyoung Kim
Woonhyun Nam
Sukhoon Boo
Myungchul Sung
Dongsoo Shin
Donghun Yeo
Wooju RYU
Myeong-Chun Lee
Hyungsoo Lee
Taewoong Jang
Kyungjoong Jeong
Hongmo Je
Hojin Cho
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Stradvision Inc
Original Assignee
Stradvision Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Stradvision Inc
Priority to US16/739,767 (US10726279B1)
Priority to KR1020200007629A (KR102396272B1)
Priority to EP20152682.9A (EP3690719A1)
Priority to CN202010073994.5A (CN111507167B)
Priority to JP2020008883A (JP6895694B2)
Assigned to StradVision, Inc. Assignors: BOO, SUKHOON; CHO, HOJIN; JANG, TAEWOONG; JE, HONGMO; JEONG, KYUNGJOONG; KIM, HAK-KYOUNG; KIM, Kye-hyeon; KIM, YONGJOONG; LEE, HYUNGSOO; LEE, MYEONG-CHUN; NAM, WOONHYUN; RYU, Wooju; SHIN, DONGSOO; SUNG, MYUNGCHUL; YEO, DONGHUN
Application granted
Publication of US10726279B1
Publication of US20200250442A1
Legal status: Active
Anticipated expiration

Abstract

A method for achieving better performance in autonomous driving while saving computing power, by using confidence scores representing a credibility of an object detection which are generated in parallel with the object detection process, is provided. The method includes steps of: (a) a computing device acquiring at least one circumstance image of surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle; (b) the computing device instructing a Convolutional Neural Network (CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and (c) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with support of an RL agent.

Description

CROSS REFERENCE OF RELATED APPLICATION
The present application claims the benefit of the earlier filing date of provisional patent application No. 62/799,321, filed Jan. 31, 2019, the entire contents of which are incorporated herein by reference.
FIELD OF THE DISCLOSURE
The present disclosure relates to a method and a device for use with an autonomous vehicle; and more particularly, to a method and a device for attention-driven resource allocation by using reinforcement learning, to thereby achieve safety of autonomous driving.
BACKGROUND OF THE DISCLOSURE
Recently, autonomous driving technology has been studied so that an autonomous vehicle can be driven with fairly high accuracy without the intervention of a driver. However, such autonomous driving technology has not yet been commercialized. There may be many reasons why the autonomous driving technology is not used by the public, but one of them is that operations for autonomous driving require too much computing power.
There have been many efforts to reduce the computing power consumed by autonomous driving, but the problem with this kind of approach is that if the computing power is reduced, the performance of the autonomous driving drops. If the performance of the autonomous driving drops, it causes many errors, threatening the driver and the people around the vehicle. Thus, the performance of the autonomous driving should be maintained while the computing power is reduced, but this has not been studied much.
SUMMARY OF THE DISCLOSURE
It is an object of the present disclosure to solve all the aforementioned problems.
It is an object of the present disclosure to provide a method for attention-driven resource allocation by using an AVM and a reinforcement learning algorithm, to thereby achieve safety of autonomous driving.
It is another object of the present disclosure to provide a method for generating confidence scores, which are parameters representing a credibility of an object detection, in parallel with a process of the object detection.
It is still another object of the present disclosure to provide a method for performing a more accurate object detection by using the reinforcement learning algorithm with the confidence scores, to thereby perform the object detection again on unconfident regions.
In accordance with one aspect of the present disclosure, there is provided a method for achieving better performance in autonomous driving while saving computing power, by using confidence scores representing a credibility of an object detection which are generated in parallel with the object detection process, including steps of: (a) a computing device acquiring at least one circumstance image of surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle; (b) the computing device instructing a Convolutional Neural Network (CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and (c) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with support of a Reinforcement Learning (RL) agent.
As one example, the step of (c) includes steps of: (c1) the computing device (i) first instructing the RL agent to select each of one or more first specific regions, to which a re-detection process is to be applied, among one or more regions in the circumstance image by referring to the initial confidence information and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate first adjusted object information and first adjusted confidence information on the circumstance image; (c2) the computing device, iteratively, (i) instructing the RL agent to select each of one or more K-th specific regions, to which the re-detection process is to be applied, among the regions in the circumstance image by referring to (K−1)-th adjusted confidence information generated beforehand and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate K-th adjusted object information and K-th adjusted confidence information on the circumstance image; and (c3) the computing device, if the re-detection process is performed N times so that N-th adjusted object information and N-th adjusted confidence information are generated, generating the final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
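The iterative re-detection loop of steps (c1) through (c3) can be sketched as follows. `run_cnn` and `rl_select_regions` are hypothetical stand-ins for the CNN and the RL agent of the disclosure, operating on toy region/confidence dictionaries rather than real networks; only the control flow is taken from the text.

```python
def run_cnn(regions):
    """Stand-in for the CNN operation: returns (object_info, confidence_info)."""
    # A re-detection pass raises the confidence of the regions it revisits (toy model).
    return ({r: "obj" for r in regions}, {r: 0.9 for r in regions})

def rl_select_regions(confidence_info, threshold=0.5):
    """Stand-in for the RL agent: pick regions whose confidence is still low."""
    return [r for r, c in confidence_info.items() if c < threshold]

def detect_with_redetection(initial_obj, initial_conf, max_rounds=3):
    obj_info, conf_info = dict(initial_obj), dict(initial_conf)
    for _ in range(max_rounds):                   # at most N re-detection rounds
        regions = rl_select_regions(conf_info)    # K-th specific regions
        if not regions:                           # agent decides to stop
            break
        new_obj, new_conf = run_cnn(regions)      # re-apply the CNN operation
        obj_info.update(new_obj)                  # K-th adjusted object information
        conf_info.update(new_conf)                # K-th adjusted confidence information
    return obj_info, conf_info                    # final object information
```

The loop terminates either when the agent selects no further regions or after the assumed `max_rounds` cap, mirroring the N determined by the RL agent.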
As one example, the RL agent generates M-th re-detection probability and one or more M-th specific regions by referring to (M−1)-th adjusted confidence information, and the computing device instructs the CNN to apply the CNN operation to the M-th specific regions when the M-th re-detection probability is larger than a threshold, wherein M is an integer from 1 to N, and wherein, if M is 1, the RL agent generates a first re-detection probability and the first specific regions by referring to the initial confidence information.
As one example, the RL agent, if the (M−1)-th adjusted confidence information is inputted thereto, (i) generates the M-th re-detection probability and the M-th specific regions by using its own parameters and the (M−1)-th adjusted confidence information, (ii) if the M-th re-detection probability is larger than the threshold, acquires M-th adjusted confidence information from the CNN, (iii) calculates an M-th original reward by referring to the (M−1)-th adjusted confidence information, the M-th adjusted confidence information and an M-th computational cost used for generating the M-th adjusted confidence information and its corresponding M-th adjusted object information, and (iv) trains said its own parameters by referring to the M-th original reward.
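The disclosure says only that the M-th original reward refers to the confidence before and after re-detection and to the computational cost; a confidence-gain-minus-weighted-cost form is one natural reading. The exact combination, the `cost_weight` parameter, and the function name below are assumptions.

```python
def original_reward(prev_conf, new_conf, comp_cost, cost_weight=0.1):
    """Confidence improvement over the revisited regions, penalized by the
    computational cost of the re-detection pass (assumed reward shape)."""
    gain = sum(new_conf[r] - prev_conf.get(r, 0.0) for r in new_conf)
    return gain - cost_weight * comp_cost
```

A positive reward then means the extra CNN pass bought more confidence than it cost, which is the trade-off the RL agent is trained to optimize.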
As one example, the RL agent trains said its own parameters by using a gradient generated by using a following formula:
$$\frac{dR}{dO}=\frac{1}{\epsilon+\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)}\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)\nabla_{O}\log p(O'\mid O)$$
wherein $O$ is an M-th original vector including information on the M-th re-detection probability and the M-th specific regions, $O'$ is an M-th adjusted vector which has been generated by applying some noises to the M-th original vector, $R(O)$ and $R(O')$ are the M-th original reward and an M-th adjusted reward corresponding to the M-th adjusted vector, respectively, and $\epsilon$ is a constant preventing a divergence of the gradient.
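The gradient above can be sketched numerically if the noise is assumed Gaussian, so that $\nabla_O \log p(O'\mid O) = (O'-O)/\sigma^2$; the sampling scheme, sample count, and all parameter names below are illustrative assumptions, not the disclosure's implementation.

```python
import random

def policy_gradient(O, reward_fn, sigma=0.1, n_samples=100, eps=1e-6, seed=0):
    """Estimate dR/dO: advantage-weighted score-function gradient over
    noisy perturbations of the original vector O (assumed Gaussian noise)."""
    rng = random.Random(seed)
    base = reward_fn(O)                                   # R(O)
    grad, weight_sum = [0.0] * len(O), 0.0
    for _ in range(n_samples):
        O_adj = [o + rng.gauss(0.0, sigma) for o in O]    # O' = O + noise
        adv = reward_fn(O_adj) - base                     # R(O') - R(O)
        if adv > 0:                                       # sum over O' : R(O') > R(O)
            weight_sum += adv
            for j in range(len(O)):
                # adv * grad_O log p(O'|O) for a Gaussian policy
                grad[j] += adv * (O_adj[j] - O[j]) / sigma ** 2
    return [g / (eps + weight_sum) for g in grad]         # normalized by eps + sum
```

With a reward peaked at the origin, the estimated gradient points back toward it, as expected for an ascent direction on $R$.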
As one example, the computing device, if one or more L-th specific regions are selected while the re-detection process has been performed iteratively, instructs the RL agent to generate L-th magnification information corresponding to the L-th specific regions, generates L-th specific upsampled images by applying at least one upsampling operation to the L-th specific regions by referring to the L-th magnification information, and instructs the CNN to apply the CNN operation to the L-th specific upsampled images, to thereby generate L-th adjusted object information and L-th adjusted confidence information, wherein L is an integer from 1 to N.
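The upsampling step can be illustrated as below. Nearest-neighbour interpolation is an assumed choice (the text only requires "at least one upsampling operation"), and a patch is modeled as a plain list of pixel rows rather than an image tensor.

```python
def upsample_region(patch, factor):
    """Nearest-neighbour upsampling of a 2-D patch (list of rows) by `factor`,
    standing in for magnifying an L-th specific region before re-detection."""
    out = []
    for row in patch:
        wide = [v for v in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend([wide] * factor)                     # repeat each row vertically
    return out
```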
As one example, at the step of (c), the computing device, while generating an integrated image including a plurality of circumstance images as its partial images, generates integrated final object information on the integrated image by using each of pieces of initial object information and each of pieces of initial confidence information on each of the circumstance images, wherein the integrated final object information is generated by performing following steps of: (c4) the computing device (i) generating first integrated confidence information on the integrated image by referring to said each of pieces of the initial confidence information, (ii) then instructing the RL agent to select each of one or more first specific regions, among one or more regions in the integrated image, to which a re-detection process is to be applied, by referring to the first integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate each of pieces of first adjusted object information and each of pieces of first adjusted confidence information on the integrated image; (c5) the computing device iteratively (i) generating K-th adjusted integrated confidence information on the integrated image by referring to each of pieces of (K−1)-th adjusted confidence information, (ii) then instructing the RL agent to select each of one or more K-th specific regions, among the regions in the integrated image, to which the re-detection process is to be applied, by referring to the K-th adjusted integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate each of pieces of K-th adjusted object information and each of pieces of K-th adjusted confidence information on the integrated image; and (c6) the computing device, if the re-detection process is performed N times 
so that each of pieces of N-th adjusted object information and each of pieces of N-th adjusted confidence information are generated, generating the integrated final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
As one example, (i) on condition that one or more first circumstance images are acquired through one or more first panorama view sensors, among a plurality of panorama view sensors, corresponding to a camera, the computing device instructs a first CNN, which is optimized to a 3-channel image and whose number of input nodes is larger than or same as 3, to apply at least one first CNN operation to the first circumstance images, and (ii) on condition that one or more second circumstance images are acquired through one or more second panorama view sensors corresponding to a depth sensor, the computing device instructs a second CNN, which is optimized to a depth image and whose number of input nodes is larger than or same as 1, to apply at least one second CNN operation to the second circumstance images.
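A minimal sketch of routing each circumstance image to the CNN matched to its panorama view sensor, per the paragraph above: a 3-channel-input first CNN for camera images and a 1-channel-input second CNN for depth images. The dispatch-table structure and field names are assumptions for illustration.

```python
def pick_cnn(sensor_type):
    """Select the CNN whose input layer matches the sensor's channel count."""
    cnn_for = {
        "camera": {"name": "first_cnn", "input_channels": 3},   # 3-channel image
        "depth":  {"name": "second_cnn", "input_channels": 1},  # depth image
    }
    return cnn_for[sensor_type]
```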
As one example, before the step of (a), further including steps of: (a1) a learning device, if a training image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the training image; (a2) the learning device, during a process of generating estimated Regions-Of-Interest(ROIs) on the training image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores; (a3) the learning device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and (a4) the learning device instructing a loss layer to generate at least one RPN loss and at least one CNN loss by referring to the RPN confidence map, the CNN confidence map, the estimated object detection result and the GT object 
detection result, and perform backpropagation by using the RPN loss and the CNN loss to learn at least part of parameters in the CNN and the RPN.
As one example, at the step of (a4), the learning device instructs the loss layer to generate the RPN loss by using a following formula:
$$L_{RPN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i>0.5)}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i>0.5)}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$
wherein $N_{reg}$ denotes a constant corresponding to a size of the convolutional feature map generated by performing the anchor operation, $N_{cls}$ denotes a constant corresponding to the training image, $c_i$ denotes an i-th RPN confidence score corresponding to an i-th pixel of the convolutional feature map, among the RPN confidence scores, $p_i$ denotes an i-th estimated RPN classification result corresponding to the i-th pixel, $p_i^*$ denotes an i-th GT RPN classification result corresponding to the i-th pixel, $t_i$ denotes an i-th estimated RPN regression result corresponding to the i-th pixel, and $t_i^*$ denotes an i-th GT RPN regression result corresponding to the i-th pixel, and wherein the i-th GT RPN classification result and the i-th GT RPN regression result correspond to the GT object detection result.
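A hedged numeric reading of the RPN loss above, with binary cross-entropy for $L_{cls}$ and smooth-L1 for $L_{reg}$ as conventional stand-ins (the disclosure does not fix either), treating $p_i$ as the foreground probability and $p_i^*$ as a 0/1 label:

```python
import math

def smooth_l1(t, t_star):
    """Conventional smooth-L1 stand-in for L_reg."""
    d = abs(t - t_star)
    return 0.5 * d * d if d < 1 else d - 0.5

def rpn_loss(p, p_star, t, t_star, c, n_cls, n_reg, lam=1.0):
    cls_term = 0.0
    for pi, pi_star, ci in zip(p, p_star, c):
        l_cls = -math.log(pi if pi_star == 1 else 1.0 - pi)  # cross-entropy L_cls
        if pi_star == (1 if pi > 0.5 else 0):   # i : p_i* = I(p_i > 0.5), correct
            cls_term += (1.1 - ci) * l_cls      # confident & correct -> lower weight
        else:                                   # i : p_i* != I(p_i > 0.5), wrong
            cls_term += (0.1 + ci) * l_cls      # confident & wrong -> higher weight
    entropy = sum(ci * math.log(ci) for ci in c)          # sum_i c_i log c_i
    reg_term = sum(ps * smooth_l1(ti, ts)
                   for ps, ti, ts in zip(p_star, t, t_star))  # p_i* L_reg(t_i, t_i*)
    return (cls_term + entropy) / n_cls + lam * reg_term / n_reg
```

The asymmetric weights (1.1 − c_i vs. 0.1 + c_i) push the confidence score c_i up only where the classification is correct, which is what lets the confidence map double as a credibility estimate.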
As one example, at the step of (a4), the learning device instructs the loss layer to generate the CNN loss by using a following formula:
$$L_{CNN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i=\text{largest score among class scores})}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i=\text{largest score among class scores})}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$
wherein $N_{reg}$ and $N_{cls}$ denote the number of the estimated ROIs, $c_i$ denotes an i-th CNN confidence score, corresponding to an i-th estimated ROI selected from the estimated ROIs, among the CNN confidence scores, $p_i$ denotes an i-th estimated CNN classification result corresponding to the i-th estimated ROI, $p_i^*$ denotes an i-th GT CNN classification result corresponding to the i-th estimated ROI, $t_i$ denotes an i-th estimated CNN regression result corresponding to the i-th estimated ROI, and $t_i^*$ denotes an i-th GT CNN regression result corresponding to the i-th estimated ROI, and wherein the i-th estimated CNN classification result and the i-th estimated CNN regression result correspond to the estimated object detection result.
As one example, after the step of (a3), the learning device instructs a confidence layer to generate an integrated confidence map including information on each of integrated confidence scores for each of pixels in the training image by referring to the RPN confidence map and the CNN confidence map.
As one example, the learning device instructs the confidence layer to perform (i-1) a process of acquiring a Non-Maximum Suppression(NMS) result on the estimated ROIs generated during a process of generating the estimated object detection result, from the CNN, (i-2) a process of generating a resized RPN confidence map by applying at least one resizing operation to the RPN confidence map, and (ii) a process of generating the integrated confidence map by referring to the NMS result and the resized RPN confidence map.
As one example, the learning device instructs the confidence layer to generate an (X_Y)-th integrated confidence score, corresponding to a coordinate (x,y) on the training image, among the integrated confidence scores, by using a following formula:
$$c'_{xy}=\max\left(c_{xy},\;\max_{r_i:(x,y)\in r_i}c_{r_i}\right)$$
wherein $c'_{xy}$ denotes the (X_Y)-th integrated confidence score, $c_{xy}$ denotes an (X_Y)-th resized RPN confidence score corresponding to a coordinate (x,y) on the resized RPN confidence map, and $c_{r_i}$ denotes an i-th CNN confidence score for an i-th estimated ROI, denoted as $r_i$, including the coordinate (x,y), which has been determined in the NMS result.
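The formula can be read numerically as follows: the integrated score at (x, y) is the larger of the resized RPN confidence there and the best CNN confidence among the NMS-surviving ROIs containing (x, y). The representation of an ROI as an (x1, y1, x2, y2, score) tuple is an assumption for illustration.

```python
def integrated_score(x, y, rpn_conf, rois):
    """rpn_conf: {(x, y): resized RPN score}; rois: NMS-surviving ROIs as
    (x1, y1, x2, y2, cnn_score) tuples. Returns c'_xy per the formula above."""
    covering = [c for (x1, y1, x2, y2, c) in rois
                if x1 <= x <= x2 and y1 <= y <= y2]   # ROIs r_i with (x, y) in r_i
    best_roi = max(covering, default=0.0)             # max over covering ROIs
    return max(rpn_conf.get((x, y), 0.0), best_roi)
```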
As one example, the step of (b) includes steps of: (b1) the computing device, if the circumstance image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the circumstance image; (b2) the computing device, during a process of generating estimated Regions-Of-Interest(ROIs) on the circumstance image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores; (b3) the computing device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and (b4) the computing device instructing a confidence layer interworking with the CNN to generate an integrated confidence map by referring to the RPN confidence map and the CNN confidence map, wherein the computing device instructs 
the CNN to output the initial object information including the estimated object detection result and the initial confidence information including the integrated confidence map.
In accordance with another aspect of the present disclosure, there is provided a computing device for achieving better performance in autonomous driving while saving computing power, by using confidence scores representing a credibility of an object detection which are generated in parallel with the object detection process, including: at least one memory that stores instructions; and at least one processor configured to execute the instructions to perform processes of: (I) acquiring at least one circumstance image of surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle; (II) instructing a Convolutional Neural Network (CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and (III) generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with support of a Reinforcement Learning (RL) agent.
As one example, the process of (III) further includes processes of: (III-1) (i) first instructing the RL agent to select each of one or more first specific regions, to which a re-detection process is to be applied, among one or more regions in the circumstance image by referring to the initial confidence information and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate first adjusted object information and first adjusted confidence information on the circumstance image; (III-2) iteratively, (i) instructing the RL agent to select each of one or more K-th specific regions, to which the re-detection process is to be applied, among the regions in the circumstance image by referring to (K−1)-th adjusted confidence information generated beforehand and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate K-th adjusted object information and K-th adjusted confidence information on the circumstance image; and (III-3) if the re-detection process is performed N times so that N-th adjusted object information and N-th adjusted confidence information are generated, generating the final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
As one example, the RL agent generates M-th re-detection probability and one or more M-th specific regions by referring to (M−1)-th adjusted confidence information, and the processor instructs the CNN to apply the CNN operation to the M-th specific regions when the M-th re-detection probability is larger than a threshold, wherein M is an integer from 1 to N, and wherein, if M is 1, the RL agent generates a first re-detection probability and the first specific regions by referring to the initial confidence information.
As one example, the RL agent, if the (M−1)-th adjusted confidence information is inputted thereto, (i) generates the M-th re-detection probability and the M-th specific regions by using its own parameters and the (M−1)-th adjusted confidence information, (ii) if the M-th re-detection probability is larger than the threshold, acquires M-th adjusted confidence information from the CNN, (iii) calculates an M-th original reward by referring to the (M−1)-th adjusted confidence information, the M-th adjusted confidence information and an M-th computational cost used for generating the M-th adjusted confidence information and its corresponding M-th adjusted object information, and (iv) trains said its own parameters by referring to the M-th original reward.
As one example, the RL agent trains said its own parameters by using a gradient generated by using a following formula:
$$\frac{dR}{dO}=\frac{1}{\epsilon+\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)}\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)\nabla_{O}\log p(O'\mid O)$$
wherein $O$ is an M-th original vector including information on the M-th re-detection probability and the M-th specific regions, $O'$ is an M-th adjusted vector which has been generated by applying some noises to the M-th original vector, $R(O)$ and $R(O')$ are the M-th original reward and an M-th adjusted reward corresponding to the M-th adjusted vector, respectively, and $\epsilon$ is a constant preventing a divergence of the gradient.
As one example, the processor, if one or more L-th specific regions are selected while the re-detection process has been performed iteratively, instructs the RL agent to generate L-th magnification information corresponding to the L-th specific regions, generates L-th specific upsampled images by applying at least one upsampling operation to the L-th specific regions by referring to the L-th magnification information, and instructs the CNN to apply the CNN operation to the L-th specific upsampled images, to thereby generate L-th adjusted object information and L-th adjusted confidence information, wherein L is an integer from 1 to N.
As one example, at the process of (III), the processor, while generating an integrated image including a plurality of circumstance images as its partial images, generates integrated final object information on the integrated image by using each of pieces of initial object information and each of pieces of initial confidence information on each of the circumstance images, wherein the integrated final object information is generated by performing following processes of: (III-4) (i) generating first integrated confidence information on the integrated image by referring to said each of pieces of the initial confidence information, (ii) then instructing the RL agent to select each of one or more first specific regions, among one or more regions in the integrated image, to which a re-detection process is to be applied, by referring to the first integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate each of pieces of first adjusted object information and each of pieces of first adjusted confidence information on the integrated image; (III-5) iteratively (i) generating K-th adjusted integrated confidence information on the integrated image by referring to each of pieces of (K−1)-th adjusted confidence information, (ii) then instructing the RL agent to select each of one or more K-th specific regions, among the regions in the integrated image, to which the re-detection process is to be applied, by referring to the K-th adjusted integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate each of pieces of K-th adjusted object information and each of pieces of K-th adjusted confidence information on the integrated image; and (III-6) if the re-detection process is performed N times so that each of pieces of N-th adjusted object 
information and each of pieces of N-th adjusted confidence information are generated, generating the integrated final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
As one example, (i) on condition that one or more first circumstance images are acquired through one or more first panorama view sensors, among a plurality of panorama view sensors, corresponding to a camera, the processor instructs a first CNN, which is optimized to a 3-channel image and whose number of input nodes is larger than or same as 3, to apply at least one first CNN operation to the first circumstance images, and (ii) on condition that one or more second circumstance images are acquired through one or more second panorama view sensors corresponding to a depth sensor, the processor instructs a second CNN, which is optimized to a depth image and whose number of input nodes is larger than or same as 1, to apply at least one second CNN operation to the second circumstance images.
As one example, before the process of (I), wherein the CNN is trained by performing processes of: (I-1) a learning device, if a training image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the training image; (I-2) the learning device, during a process of generating estimated Regions-Of-Interest(ROIs) on the training image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores; (I-3) the learning device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and (I-4) the learning device instructing a loss layer to generate at least one RPN loss and at least one CNN loss by referring to the RPN confidence map, the CNN confidence map, the estimated object 
detection result and the GT object detection result, and perform backpropagation by using the RPN loss and the CNN loss to learn at least part of parameters in the CNN and the RPN.
As one example, at the process of (I-4), the learning device instructs the loss layer to generate the RPN loss by using a following formula:
L_{RPN}(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\Bigl(\sum_{i:\,p_i^*=I(p_i>0.5)}(1.1-c_i)\,L_{cls}(p_i,p_i^*) + \sum_{i:\,p_i^*\neq I(p_i>0.5)}(0.1+c_i)\,L_{cls}(p_i,p_i^*) + \sum_i c_i\log c_i\Bigr) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*)
wherein N_{reg} denotes a constant corresponding to a size of the convolutional feature map generated by performing the anchor operation, N_{cls} denotes a constant corresponding to the training image, c_i denotes an i-th RPN confidence score corresponding to an i-th pixel of the convolutional feature map, among the RPN confidence scores, p_i denotes an i-th estimated RPN classification result corresponding to the i-th pixel, p_i^* denotes an i-th GT RPN classification result corresponding to the i-th pixel, t_i denotes an i-th estimated RPN regression result corresponding to the i-th pixel, and t_i^* denotes an i-th GT RPN regression result corresponding to the i-th pixel, and wherein the i-th GT RPN classification result and the i-th GT RPN regression result correspond to the GT object detection result.
As one example, at the process of (I-4), the learning device instructs the loss layer to generate the CNN loss by using a following formula:
L_{CNN}(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\Bigl(\sum_{i:\,p_i^*=I(p_i=\text{largest score among class scores})}(1.1-c_i)\,L_{cls}(p_i,p_i^*) + \sum_{i:\,p_i^*\neq I(p_i=\text{largest score among class scores})}(0.1+c_i)\,L_{cls}(p_i,p_i^*) + \sum_i c_i\log c_i\Bigr) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*)
wherein N_{reg} and N_{cls} denote the number of the estimated ROIs, c_i denotes an i-th CNN confidence score, corresponding to an i-th estimated ROI selected from the estimated ROIs, among the CNN confidence scores, p_i denotes an i-th estimated CNN classification result corresponding to the i-th estimated ROI, p_i^* denotes an i-th GT CNN classification result corresponding to the i-th pixel, t_i denotes an i-th estimated CNN regression result corresponding to the i-th estimated ROI, and t_i^* denotes an i-th GT CNN regression result corresponding to the i-th pixel, and wherein the i-th estimated CNN classification result and the i-th estimated CNN regression result correspond to the estimated object detection result.
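Both loss formulas share the same confidence-weighted classification scheme: correctly classified anchors or ROIs are down-weighted as their confidence score grows, misclassified ones are up-weighted, and an entropy-like term regularizes the confidence scores. The following is a minimal NumPy sketch of that classification term only; the function name is illustrative and binary cross-entropy is merely assumed as a stand-in for L_cls, which the disclosure does not fix.

```python
import numpy as np

def confidence_weighted_cls_loss(p, p_star, c):
    """Confidence-weighted classification term shared by the RPN and CNN
    losses (sketch). p: estimated probabilities, p_star: GT labels (0/1),
    c: confidence scores; all 1-D arrays of equal length."""
    eps = 1e-8
    # binary cross-entropy standing in for L_cls
    l_cls = -(p_star * np.log(p + eps) + (1.0 - p_star) * np.log(1.0 - p + eps))
    correct = (p_star == (p > 0.5).astype(float))   # indicator I(p_i > 0.5)
    # correct predictions: weight (1.1 - c_i); wrong ones: weight (0.1 + c_i)
    weighted = np.where(correct, (1.1 - c) * l_cls, (0.1 + c) * l_cls).sum()
    weighted += (c * np.log(c + eps)).sum()         # entropy-like regularizer
    return weighted / len(p)
```

Note that the entropy-like term is minimized when the confidence scores are pushed away from uniform values, so the network cannot trivially lower the loss by reporting identical confidences everywhere.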
As one example, after the process of (I-3), the learning device instructs a confidence layer to generate an integrated confidence map including information on each of integrated confidence scores for each of pixels in the training image by referring to the RPN confidence map and the CNN confidence map.
As one example, the learning device instructs the confidence layer to perform (i-1) a process of acquiring a Non-Maximum Suppression(NMS) result on the estimated ROIs generated during a process of generating the estimated object detection result, from the CNN, (i-2) a process of generating a resized RPN confidence map by applying at least one resizing operation to the RPN confidence map, and (ii) a process of generating the integrated confidence map by referring to the NMS result and the resized RPN confidence map.
As one example, the learning device instructs the confidence layer to generate an (X_Y)-th integrated confidence score, corresponding to a coordinate (x,y) on the training image, among the integrated confidence scores, by using a following formula:
c_{xy}' = \max\Bigl(c_{xy},\ \max_{r_i:\,(x,y)\in r_i} c_{r_i}\Bigr)
wherein c_{xy}' denotes the (X_Y)-th integrated confidence score, c_{xy} denotes an (X_Y)-th resized RPN confidence score corresponding to a coordinate (x,y) on the resized RPN confidence map, and c_{r_i} denotes an i-th CNN confidence score for an i-th estimated ROI, denoted as r_i, including the coordinate (x,y), which has been determined in the NMS result.
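The formula above reduces to taking, at each pixel, the maximum of the resized RPN confidence score and the CNN confidence scores of every NMS-surviving ROI covering that pixel. A small illustrative sketch, with hypothetical names:

```python
def integrated_confidence(x, y, rpn_score, rois):
    """Integrated confidence score at pixel (x, y): the maximum of the
    resized RPN confidence score and the CNN confidence scores of all
    NMS-surviving ROIs that contain the pixel.
    rois: list of (x1, y1, x2, y2, cnn_score) tuples."""
    roi_scores = [s for (x1, y1, x2, y2, s) in rois
                  if x1 <= x <= x2 and y1 <= y <= y2]
    return max([rpn_score] + roi_scores)
```

A pixel covered by no ROI simply keeps its resized RPN confidence score.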
As one example, the process of (II) includes processes of: (II-1) if the circumstance image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the circumstance image; (II-2) during a process of generating estimated Regions-Of-Interest(ROIs) on the circumstance image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores; (II-3) if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and (II-4) instructing a confidence layer interworking with the CNN to generate an integrated confidence map by referring to the RPN confidence map and the CNN confidence map, wherein the computing device instructs the CNN to output the initial object information including the 
estimated object detection result and the initial confidence information including the integrated confidence map.
In addition, recordable media that are readable by a computer for storing a computer program to execute the method of the present disclosure are further provided.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects and features of the present disclosure will become apparent from the following description of preferred embodiments given in conjunction with the accompanying drawings.
The following drawings to be used to explain example embodiments of the present disclosure are only part of example embodiments of the present disclosure and other drawings can be obtained based on the drawings by those skilled in the art of the present disclosure without inventive work.
FIG. 1 is a drawing schematically illustrating a configuration of a computing device performing a method for achieving better performance in an autonomous driving while saving computing powers by performing a Reinforcement Learning(RL) algorithm using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process in accordance with one example embodiment of the present disclosure.
FIG. 2 is a drawing schematically illustrating a flow of the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
FIG. 3 is a drawing schematically illustrating an example of an integrated image and its corresponding integrated final object information to be used for performing the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
FIG. 4 is a drawing schematically illustrating a flow of a training process of a Convolutional Neural Network(CNN) to be used for performing the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
FIG. 5 is a drawing schematically illustrating a learning device including a Convolutional Neural Network(CNN), a Region Proposal Network(RPN), a loss layer and a confidence layer to be used for performing the learning method for switching the modes of the autonomous vehicle based on the on-device standalone prediction to thereby achieve the safety of the autonomous driving in accordance with one example embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The detailed explanation of the present disclosure below refers to the attached drawings and diagrams, which illustrate specific embodiments in which the present disclosure may be implemented, to make clear the purposes, technical solutions, and advantages of the present disclosure. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure.
Besides, in the detailed description and claims of the present disclosure, a term “include” and its variations are not intended to exclude other technical features, additions, components or steps. Other objects, benefits and features of the present disclosure will be revealed to one skilled in the art, partially from the specification and partially from the implementation of the present disclosure. The following examples and drawings will be provided as examples but they are not intended to limit the present disclosure.
Moreover, the present disclosure covers all possible combinations of example embodiments indicated in this specification. It is to be understood that the various embodiments of the present disclosure, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented within other embodiments without departing from the spirit and scope of the present disclosure. In addition, it is to be understood that the position or arrangement of individual elements within each disclosed embodiment may be modified without departing from the spirit and scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled. In the drawings, like numerals refer to the same or similar functionality throughout the several views.
Any images referred to in the present disclosure may include images related to any roads paved or unpaved, in which case the objects on the roads or near the roads may include vehicles, persons, animals, plants, buildings, flying objects like planes or drones, or any other obstacles which may appear in a road-related scene, but the scope of the present disclosure is not limited thereto. As another example, said any images referred to in the present disclosure may include images not related to any roads, such as images related to alleyway, land lots, sea, lakes, rivers, mountains, forests, deserts, sky, or any indoor space, in which case the objects in said any images may include vehicles, persons, animals, plants, buildings, flying objects like planes or drones, ships, amphibious planes or ships, or any other obstacles which may appear in a scene related to alleyway, land lots, sea, lakes, rivers, mountains, forests, deserts, sky, or any indoor space, but the scope of the present disclosure is not limited thereto.
To allow those skilled in the art to carry out the present disclosure easily, the example embodiments of the present disclosure by referring to attached diagrams will be explained in detail as shown below.
FIG. 1 is a drawing schematically illustrating a configuration of a computing device performing a method for achieving better performance in an autonomous driving while saving computing powers by performing a Reinforcement Learning(RL) algorithm using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process in accordance with one example embodiment of the present disclosure.
By referring to FIG. 1, the computing device 100 may include a Convolutional Neural Network (CNN) 130, a Region Proposal Network (RPN) 140, a confidence layer 150 and an RL agent 160 to be described later. Processes of input/output and computations of the CNN 130, the RPN 140, the confidence layer 150 and the RL agent 160 may be respectively performed by at least one communication part 110 and at least one processor 120. However, detailed communication schematics between the communication part 110 and the processor 120 are omitted in FIG. 1. Herein, a memory 115 may have stored various instructions to be described later, and the processor 120 may execute the instructions stored in the memory 115 and may perform processes of the present disclosure by executing the instructions to be disclosed later. Such description of the computing device 100 does not exclude an integrated device including any combination of a processor, a memory, a medium, or any other computing components.
The computing device 100 may interwork with a subject vehicle, and may acquire images photographed by at least one panorama view sensor installed on the subject vehicle. Such a panorama view sensor may be built as a camera or a depth sensor, i.e., a radar or a LiDAR.
So far the configuration of the computing device 100 to be used for performing the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure has been explained. Below, the method itself will be explained by referring to FIG. 2.
FIG. 2 is a drawing schematically illustrating a flow of the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
By referring to FIG. 2, at a step of S100, the computing device 100 may acquire at least one circumstance image on surroundings of the subject vehicle, through the panorama view sensor installed on the subject vehicle. Thereafter, at a step of S200, the computing device 100 may instruct the CNN 130 to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image. And, at a step of S300, the computing device 100 may generate final object information on the circumstance image by referring to the initial object information and the initial confidence information with a support of the RL agent 160.
Brief steps of the method are shown above. Below, it will be explained more specifically. First, an example embodiment corresponding to a singular circumstance image will be explained.
As the circumstance image is acquired through the panorama view sensor, the computing device 100 may instruct the CNN 130 to apply the CNN operation to the circumstance image. Herein, if the circumstance image is acquired through a first panorama view sensor which has been built as the camera, the computing device 100 may instruct a first CNN, which is optimized to a 3-channel image and whose number of input nodes is larger than or same as 3, to apply at least one first CNN operation to the circumstance image, to thereby apply the CNN operation thereto. Otherwise, if the circumstance image is acquired through a second panorama view sensor which has been built as the depth sensor, the computing device 100 may instruct a second CNN, which is optimized to a depth image and whose number of input nodes is larger than or same as 1, to apply at least one second CNN operation to the circumstance image, to thereby apply the CNN operation thereto. Herein, notations of the "first CNN" and the "second CNN" were used for distinguishing CNNs which can function as the CNN 130 in specific cases as shown above. Similarly, notations of the "first panorama view sensor" and the "second panorama view sensor" were used for distinguishing panorama view sensors which can function as the panorama view sensor in the specific cases as shown above.
After the initial object information and the initial confidence information are acquired through the CNN operation, the computing device 100 may perform a re-detection process iteratively to generate the final object information, to be explained later. Herein, the computing device 100 may be supported by the RL agent 160. Below, a configuration of the RL agent 160 and its processes will be explained.
First, the RL agent 160 may be a kind of Neural Network, including multiple layers, and including multiple virtual neurons in each of the layers. Herein, each of the neurons may perform a process of applying an operation, e.g., a convolutional operation, to its inputted value by using its own parameter, and then delivering its outputted value to its next neuron. Such RL agent 160 may acquire, as its input, at least one of the initial confidence information and first adjusted confidence information to N-th adjusted confidence information to be explained later, and may apply some operations to its input by using said neurons, to thereby generate information representing a necessity of the re-detection process on its input and regions in its input where the re-detection process should be applied.
Specifically, assume that the RL agent 160 acquired (M−1)-th adjusted confidence information. Herein, M may be an integer from 1 to N, and if M is 1, it may be assumed that the initial confidence information is acquired by the RL agent 160. Then, the RL agent 160 may apply the operations by using said neurons, using their own parameters, to thereby generate an M-th re-detection probability representing whether the re-detection process should be performed for its corresponding iteration or not, and M-th specific regions where the re-detection process should be applied. Herein, the computing device 100 may instruct the CNN 130 to apply the re-detection process to the M-th specific regions when the M-th re-detection probability is larger than a threshold. Accordingly, N may denote the number of the re-detection probabilities, generated by the RL agent 160, which are larger than the threshold.
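A toy sketch of this per-iteration decision is shown below. The policy here is a hand-written stand-in (the fraction of low-confidence pixels, and one bounding region over them), not the learned RL agent 160 of the disclosure; all names and thresholds are assumptions for illustration.

```python
import numpy as np

def select_redetection(confidence_map, prob_threshold=0.5, conf_threshold=0.6):
    """Hand-written stand-in for the agent's per-iteration decision:
    the re-detection 'probability' is the fraction of low-confidence
    pixels, and a single bounding region covers them all."""
    low = confidence_map < conf_threshold
    redetect_prob = float(low.mean())
    if redetect_prob <= prob_threshold:
        return redetect_prob, []          # skip this iteration
    ys, xs = np.nonzero(low)
    region = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return redetect_prob, [region]
```

A learned agent would instead emit the probability and several region proposals from its final layer, but the control flow, i.e., re-detect only when the probability clears the threshold, is the same.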
Since the RL agent 160 performs the RL algorithm, it can train its own parameters whenever it generates its outputs by processing its inputted confidence information. That is, if the M-th re-detection probability is larger than the threshold, the RL agent 160 may acquire M-th adjusted confidence information from the CNN 130, and may calculate an M-th original reward by referring to (i) the (M−1)-th adjusted confidence information, (ii) the M-th adjusted confidence information and (iii) an M-th computational cost used for generating the M-th adjusted confidence information and its corresponding M-th adjusted object information generated by the CNN 130, to be explained later. Then, the RL agent 160 may train its own parameters by referring to the M-th original reward.
Herein, the M-th original reward may correspond to a value calculated by subtracting the M-th computational cost from an increase in confidence scores owing to the re-detection process. As will be explained soon, the RL agent 160 trains said parameters to make its rewards larger, thus the RL agent 160 may be trained to make the confidence scores increase through the re-detection process without requiring too many computational resources. Such M-th original reward can be calculated by using a following formula:
R(O) = \frac{1}{WH}\sum_{x,y}^{W,H}\max\bigl(0,\ I(O)_{xy}-I_{xy}\bigr) - c\times\frac{1}{WH}\,S_M
In the formula, W and H may denote a width and a height of the circumstance image respectively, and I(O)_{xy} may denote one of M-th confidence scores, on a coordinate (x,y) in the circumstance image, included in the M-th adjusted confidence information. Also, I_{xy} may denote one of the (M−1)-th confidence scores, on the coordinate (x,y) therein, included in the (M−1)-th adjusted confidence information. And, S_M may denote a sum of areas of the M-th specific regions, and c may be a prescribed constant. Herein, the summation term calculates said increase of the confidence scores, and the final term calculates said M-th computational cost, since a size of an area where the re-detection process is applied is proportional to its usage of computational resources.
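The reward formula above can be computed directly; the function and argument names in this NumPy sketch are hypothetical.

```python
import numpy as np

def original_reward(prev_conf, new_conf, region_area_sum, c=0.01):
    """M-th original reward (sketch): average confidence gain from the
    re-detection minus a cost proportional to the re-detected area.
    prev_conf / new_conf: H x W confidence maps before / after."""
    h, w = prev_conf.shape
    gain = np.maximum(0.0, new_conf - prev_conf).sum() / (w * h)
    cost = c * region_area_sum / (w * h)
    return gain - cost
```

The max(0, ·) clamp means that pixels whose confidence dropped after re-detection contribute nothing, so the agent is only rewarded for genuine improvements.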
After the M-th original reward is calculated, the RL agent 160 may slightly adjust the M-th re-detection probability and the M-th specific regions, and may calculate an M-th adjusted reward corresponding thereto. This allows the RL agent 160 to perform an exploration; thus, if the M-th adjusted reward is larger than the M-th original reward, the parameters may be trained correspondingly to the M-th adjusted reward.
Specifically, if a four-channel coordinate determining one of the M-th specific regions is (x1,y1,x2,y2), each of values corresponding to a probability distribution, e.g., a normal distribution, may be added to each of components thereof, to thereby adjust one of the M-th specific regions as (x1+N(0,1), y1+N(0,1), x2+N(0,1), y2+N(0,1)). Also, if the M-th re-detection probability is p, by using a binomial operation, the M-th re-detection probability may be adjusted as Binomial(p). Thereafter, the computing device 100 may instruct the CNN 130 to perform the re-detection process by referring to the adjusted results, and may instruct the RL agent 160 to calculate the M-th adjusted reward by using outputs of the CNN 130. Then, the RL agent 160 may train its parameters by using a gradient generated by performing an operation corresponding to a following formula:
\frac{dR}{dO} = \frac{1}{\epsilon + \sum_{O':\,R(O')>R(O)}\bigl(R(O')-R(O)\bigr)}\sum_{O':\,R(O')>R(O)}\bigl(R(O')-R(O)\bigr)\,\nabla_O\log p(O'|O)
p(O'|O) = N(x_1'|x_1,1)\,N(y_1'|y_1,1)\cdots N(y_2'|y_2,1)\,\mathrm{Binomial}(p'|p)
Herein, O may denote an M-th original vector including information on the M-th re-detection probability and the M-th specific regions, and O′ may denote an M-th adjusted vector which has been generated by applying some noises to the M-th original vector. Also, R(O) and R(O′) may respectively denote the M-th original reward and the M-th adjusted reward corresponding to the M-th adjusted vector, and ε may denote a constant preventing a divergence of the gradient.
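The exploration step described above, i.e., jittering each region corner with unit Gaussian noise and resampling the re-detection decision from a Bernoulli draw, might look like the following sketch; the function name and the fixed seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def perturb(region, prob):
    """Exploration step (sketch): add unit Gaussian noise to each corner
    coordinate of a (x1, y1, x2, y2) region, and resample the
    re-detection decision from a Bernoulli draw with probability prob."""
    jittered = tuple(v + rng.normal(0.0, 1.0) for v in region)
    resampled = float(rng.binomial(1, prob))
    return jittered, resampled
```

The perturbed vector plays the role of O′ in the gradient formula: only perturbations whose reward exceeds the original reward contribute to the update.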
Based on such RL agent 160, the computing device 100 may first instruct the RL agent 160 to select each of one or more first specific regions, among one or more regions in the circumstance image, whose corresponding confidence score is smaller than a first threshold, to which the re-detection process is applied, by referring to the initial confidence information, and then instruct the CNN 130 to apply the CNN operation to the first specific regions, to thereby generate first adjusted object information and first adjusted confidence information on the circumstance image. Simply put, regions where the CNN 130 is not confident about its result of an object detection, i.e., the first specific regions, are detected by the RL agent 160, and an object detection is applied again to such regions. As said before, the RL agent 160 may determine whether it is appropriate to apply the object detection again with regard to the usage of the computational resources. The initial confidence information, which supports the computing device 100 in performing such process, is generated by the CNN 130 while it performs the object detection. How the CNN 130 can be trained to perform as shown above will be explained later.
Thereafter, the computing device 100 may perform the re-detection process iteratively. That is, the computing device 100 may instruct the RL agent 160 to select each of the K-th specific regions, among the regions in the circumstance image, whose corresponding confidence score is smaller than the first threshold, to which the re-detection process is to be applied, by referring to (K−1)-th adjusted confidence information generated beforehand, and then instruct, as the re-detection process, the CNN 130 to apply the CNN operation to the K-th specific regions, to thereby generate K-th adjusted object information and K-th adjusted confidence information on the circumstance image. The K-th adjusted object information may include information on one or more objects in the K-th specific regions, and such information may not have been detected in a (K−P)-th re-detection process performed beforehand but may have been detected this time. Herein, P may be an integer from 1 to K−1. As said before, the re-detection process is applied to the regions where the CNN 130 is not confident, thus such relationship between the pieces of the information may be natural.
In order to perform the re-detection process more efficiently to detect objects in the specific regions whose corresponding confidence scores are small, a selective process, i.e., an upsampling, may be further performed. That is, if one or more L-th specific regions are selected while an L-th re-detection process has been performed, the computing device 100 may instruct the RL agent 160 to generate L-th magnification information corresponding to the L-th specific regions, then may generate L-th specific upsampled images by applying at least one upsampling operation to the L-th specific regions by referring to the L-th magnification information, and may instruct the CNN 130 to apply the CNN operation to the L-th specific upsampled images, to thereby generate L-th adjusted object information and L-th adjusted confidence information. The L-th magnification information may denote a magnification ratio which has been prescribed for the L-th re-detection process.
In order to build the RL agent 160 to output the magnification information along with information on the re-detection probabilities and the specific regions, some final output nodes may be added to its final layer. Herein, the L-th magnification information may be determined by selecting one among multiple candidate magnification ratios whose probabilities have been outputted from said final output nodes, similarly to a classification. For example, the candidate magnification ratios may be 2, 3 and 4, and as each of probabilities for each of the candidate magnification ratios is outputted from the final output nodes, the largest one may be found, and its corresponding candidate magnification ratio may be used for performing the upsampling operation. In this case, a term for a conditional probability in the above formula may be slightly changed to reflect the magnification information thereon, since new final output nodes have been added.
p(O'|O) = N(x_1'|x_1,1)\cdots N(y_2'|y_2,1)\,\mathrm{Binomial}(p'|p)\,\mathrm{Multinomial}(d_m',t_m',q_m'\,|\,d_m,t_m,q_m)
In the formula, each of d_m, t_m and q_m may denote each of said probabilities of each of the candidate magnification ratios being appropriate for the re-detection process, and each of d_m′, t_m′ and q_m′ may denote each of adjusted probabilities corresponding thereto. Adjusting processes of the probabilities may be similar to those of the re-detection probabilities and the specific regions. In this way, the RL agent 160 may become capable of generating the L-th magnification information.
Also, any upsampling scheme known in the art, such as a Nearest Neighbor algorithm, can be used for performing the upsampling operation.
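A sketch combining both steps, choosing the candidate magnification ratio with the largest probability and then applying Nearest Neighbor upsampling by integer pixel repetition; the function and argument names are illustrative.

```python
import numpy as np

def upsample_region(patch, ratio_probs, ratios=(2, 3, 4)):
    """Pick the candidate magnification ratio with the largest probability,
    then nearest-neighbor upsample the cropped region by that factor.
    patch: 2-D array for the cropped specific region;
    ratio_probs: probabilities emitted for each candidate ratio."""
    ratio = ratios[int(np.argmax(ratio_probs))]
    # nearest-neighbor upsampling via integer pixel repetition
    upsampled = np.repeat(np.repeat(patch, ratio, axis=0), ratio, axis=1)
    return ratio, upsampled
```

The enlarged patch is what would be fed back to the CNN 130 for the L-th re-detection.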
If the re-detection process is performed N times so that N-th adjusted object information and N-th adjusted confidence information are generated, the computing device 100 may generate the final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information.
However, there may be a case where a sufficient result of the object detection is derived even though the re-detection process has not been performed the prescribed number of times, i.e., N times. For such a case, the iteration may be stopped, and the final object information may be generated by using object information generated until that time. That is, if a sum of one or more areas of one or more M-th specific regions, which have been selected when an M-th re-detection process has been performed, is smaller than the first threshold, the computing device 100 may stop iteratively performing the re-detection process, and may generate the final object information by referring to the initial object information and at least part of the first adjusted object information to (M−1)-th adjusted object information.
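The stopping criterion above can be sketched as a simple area check; the helper name and the threshold value are assumptions.

```python
def should_stop(regions, area_threshold):
    """Early-stop check (sketch): halt the iteration when the total area
    still flagged for re-detection falls below a threshold.
    regions: list of (x1, y1, x2, y2) tuples for the M-th specific regions."""
    total_area = sum((x2 - x1) * (y2 - y1) for (x1, y1, x2, y2) in regions)
    return total_area < area_threshold
```

When the check fires, only the object information accumulated up to the (M−1)-th iteration is merged into the final result.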
In order to generate the final object information, the computing device 100 may perform at least one of a Non-Maximum Suppression algorithm and a Box Voting algorithm by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information. Such algorithms are well-known to a person skilled in the art, thus the explanation above should suffice.
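For completeness, a minimal sketch of the Non-Maximum Suppression step over the accumulated detections follows; this is a standard greedy NMS, not the disclosure's exact implementation.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression (sketch): keep the highest-scoring
    box first, then drop any remaining box that overlaps a kept box by
    more than the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In this context the candidate boxes would be the union of detections from the initial pass and every re-detection iteration, so duplicates of the same object found in different iterations collapse into one.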
After the final object information is generated as shown above, the computing device 100 may deliver the final object information to an autonomous driving module corresponding to the subject vehicle, to thereby support the autonomous driving module to perform an autonomous driving of the subject vehicle by using the final object information.
So far the example embodiment corresponding to the singular circumstance image has been explained. Below, another example embodiment corresponding to (i) multiple circumstance images to be used for generating an integrated image which is panoramic, and (ii) integrated final object information corresponding thereto will be explained. Said another example embodiment is similar to the example embodiment corresponding to the singular one, thus mainly the differences between the two will be explained.
The computing device 100 may generate the integrated final object information on the integrated image by using each of pieces of initial object information and each of pieces of initial confidence information on each of the circumstance images while generating the integrated image including a plurality of the circumstance images as its partial images.
Specifically, (i) on condition that one or more first circumstance images among the circumstance images are acquired through one or more first panorama view sensors, among a plurality of panorama view sensors, corresponding to the camera, the computing device 100 may instruct the first CNN, which is optimized to the 3-channel image and whose number of the input nodes is larger than or same as 3, to apply the first CNN operation to the first circumstance images, and (ii) on condition that one or more second circumstance images among the circumstance images are acquired through one or more second panorama view sensors, among the panorama view sensors, corresponding to the depth sensor, the computing device 100 may instruct the second CNN, which is optimized to the depth image and whose number of the input nodes is larger than or same as 1, to apply the second CNN operation to the second circumstance images. Then, each of pieces of initial confidence information and each of pieces of initial object information on each of the circumstance images can be acquired.
Thereafter, as an initial iteration of the re-detection process, the computing device 100 may (i) generate first integrated confidence information on the integrated image by referring to said each of pieces of the initial confidence information, (ii) then may instruct the RL agent 160 to select each of one or more first specific regions, among one or more regions in the integrated image, whose corresponding confidence score is smaller than the first threshold, to which the re-detection process is to be applied, by referring to the first integrated confidence information, and (iii) may instruct the CNN 130 to apply the CNN operation to the first specific regions, to thereby generate each of pieces of first adjusted object information and each of pieces of first adjusted confidence information on the integrated image.
Then, the computing device 100 may perform the re-detection process iteratively. That is, the computing device 100 (i) may generate K-th adjusted integrated confidence information on the integrated image by referring to each of pieces of (K−1)-th adjusted confidence information, (ii) may then instruct the RL agent 160 to select each of one or more K-th specific regions, among the regions in the integrated image, whose corresponding confidence score is smaller than the first threshold, to which the re-detection process is to be applied, by referring to the K-th adjusted integrated confidence information, and (iii) may instruct the CNN 130 to apply the CNN operation to the K-th specific regions, to thereby generate each of pieces of K-th adjusted object information and each of pieces of K-th adjusted confidence information on the integrated image.
Finally, if the re-detection process is performed N times, where N is the number of times the RL agent 160 has generated re-detection probabilities larger than the threshold, so that each of pieces of N-th adjusted object information and each of pieces of N-th adjusted confidence information are generated, the computing device 100 may generate the integrated final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information. As shown above, said another example embodiment is almost the same as the original example embodiment, except that each of the confidence information and each of the object information are integrated in each of the iterations of the re-detection process.
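For illustration only, the iterative re-detection flow described above may be sketched in simplified Python. The names cnn and rl_agent and their return structures below are hypothetical placeholders standing in for the CNN 130 and the RL agent 160, not the actual implementation of the disclosure:

```python
def redetect(image, cnn, rl_agent, max_iters):
    """Attention-driven re-detection loop (simplified sketch).

    cnn(image, regions) -> (object_info, confidence_map)
    rl_agent(confidence_map) -> (redetect_prob, low_confidence_regions)
    """
    # Initial pass over the whole image.
    object_info, confidence = cnn(image, regions=None)
    collected = [object_info]
    for _ in range(max_iters):
        prob, regions = rl_agent(confidence)
        if prob <= 0.5 or not regions:  # RL agent decides whether to continue
            break
        # Re-apply the CNN operation only to the unconfident regions.
        object_info, confidence = cnn(image, regions=regions)
        collected.append(object_info)
    # The final object information merges the initial and adjusted results.
    return collected
```

The 0.5 cutoff and the merge-by-list strategy are assumptions made for the sketch; the disclosure only requires that the RL agent's re-detection probability be compared against a threshold.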
In order to take a look at an example of the integrated image and the integrated final object information, FIG. 3 will be referred to.
FIG. 3 is a drawing schematically illustrating an example of an integrated image and its corresponding integrated final object information to be used for performing the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
By referring to FIG. 3, it can be seen that the integrated image 300 is the panorama image which has been acquired by photographing the surroundings of the subject vehicle 200. Such a panorama image is hard to acquire by using only one sensor; thus, it is usually acquired by integrating multiple images photographed through multiple sensors. In this case, information on the objects in the integrated image 300, i.e., information on a first vehicle object 310 and information on a second vehicle object 320, can be acquired as the integrated final object information, by using the method of the present disclosure.
So far, two example embodiments have been explained. Below, how the CNN 130, to be used for performing the two example embodiments, can be trained will be explained by referring to FIG. 4.
FIG. 4 is a drawing schematically illustrating a flow of a training process of the CNN to be used for performing the method for achieving the better performance in the autonomous driving while saving the computing powers by performing the RL algorithm using the confidence scores representing the credibility of the object detection which is generated in parallel with the object detection process in accordance with one example embodiment of the present disclosure.
By referring to FIG. 4, a learning method for generating each of the confidence scores, which are parameters representing degrees of the credibility of the object detection during a process of the object detection, can be seen briefly.
For reference, in the description below, the phrase “for training” or “training” is added for terms related to the learning process, and the phrase “for testing” or “testing” is added for terms related to the testing process, to avoid possible confusion.
First, at a step of S10, if a training image is acquired, a learning device may instruct a convolutional layer included in the CNN 130 to apply at least one convolutional operation to the training image, to thereby generate at least one convolutional feature map. Herein, in case the geometric size of the training image is H×W, and the format of the training image is an RGB format so that the channel size of the training image is 3, the data size of the training image may be H×W×3. Since the convolutional operation allows the geometric size of its outputted feature map to be smaller than that of its inputted image, and the channel size of its outputted feature map to be larger than that of its inputted image, h and w, denoting the geometric size of the convolutional feature map, may be smaller than H and W respectively, and c, denoting the channel size thereof, may be larger than 3, if the size of the convolutional feature map is h×w×c.
Thereafter, at a step of S20, the learning device may instruct an anchor layer in the RPN 140 to output estimated Regions-Of-Interest (ROIs), which are regions expected to include objects in the training image, by applying an anchor operation to values included in the convolutional feature map. Specifically, the RPN 140 may generate an anchor feature map whose data size is h×w×(5A+1), to be used for generating the estimated ROIs on the training image, by applying the anchor operation to the convolutional feature map whose data size is h×w×c. According to a prior art, the data size of the anchor feature map would have been h×w×5A, but the anchor feature map in accordance with the present disclosure may include one more channel to include the RPN confidence scores to be explained later. A more specific explanation on this will be presented below.
The learning device may instruct the anchor layer of the RPN 140 to generate the anchor feature map whose data size is h×w×5A by applying the anchor operation to the convolutional feature map, where the anchor operation uses sliding-windows each of which corresponds to each set of anchors. Herein, the anchors may be a sort of grids for selecting values of the convolutional feature map corresponding to the objects, and the above-mentioned A, used for representing the data size of the anchor feature map, may denote the number of the anchors used by the RPN 140. By performing this process, whether each of the values included in the convolutional feature map corresponds to the objects or not may be determined, and a result of this process, e.g., information on the estimated ROIs, may be stored in the anchor feature map. Since the anchor operation is a well-known prior art, a more specific explanation will be omitted.
During the process of generating the anchor feature map, the learning device may instruct the RPN 140 to generate each of the RPN confidence scores for each of the pixels included in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth (GT) ROIs. To be simple, the RPN confidence scores are parameters indicating degrees of the credibility of the results, e.g., the estimated ROIs, of the process performed by the RPN 140.
Above, the process of determining the estimated ROIs and the process of generating the RPN confidence scores have been explained separately for convenience, but the two processes may be performed at the same time. That is, as the convolutional feature map is inputted to the RPN 140, each of the RPN neurons in the RPN 140 may perform its operations and deliver its operated value to its next RPN neuron, to thereby output the anchor feature map from a final layer of the RPN 140. Thus, the two processes may be performed, dependently affecting each other. However, those two processes may also be performed not at the same time. For example, the process of determining the estimated ROIs may be performed first.
After the anchor feature map is generated, the learning device may deliver values of the 5A channels therein, including the information on the estimated ROIs, to the CNN 130, and deliver values of the remaining one channel therein to the confidence layer 150 to be explained later. First, how the values of the 5A channels delivered to the CNN 130 are processed will be explained below.
After said values of the 5A channels are delivered, the learning device may instruct an ROI pooling layer in the CNN 130 to apply at least one ROI pooling operation to the convolutional feature map in order to generate an ROI-pooled feature map by referring to the information on the estimated ROIs, and, at a step of S30, may instruct an FC layer included in the CNN 130 to apply at least one FC operation to the ROI-pooled feature map, to thereby generate an initial object detection result and the CNN confidence scores.
Herein, the initial object detection result may include each of initial estimated CNN regression results on each of estimated coordinates of each of bounding boxes including each of the objects, and each of initial estimated CNN classification results on each of class scores of each of the objects, representing each of probabilities of each of the objects being included in each of classes. Such process of generating the initial object detection result is a well-known prior art.
And, the CNN confidence scores may include information on degrees of an estimated object detection result being same as a GT object detection result, to be explained later. Herein, the estimated object detection result may have been generated by applying a Non-Maximum Suppression (NMS) operation to the initial object detection result. Specifically, each of the CNN confidence scores may represent each of one or more probabilities of each of the one or more estimated CNN classification results and each of the one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in the GT object detection result. To be simple, the CNN confidence scores may represent degrees of credibility on the results generated by the FC layer of the CNN 130.
The reason why the CNN confidence scores and the RPN confidence scores are both used is that the estimated ROIs initially determined by the RPN 140 are processed by the CNN 130 later, in order to generate the estimated object detection result. Since the two networks are both involved in the process of generating the estimated object detection result, the two networks should be evaluated differently.
For example, even in case a specific estimated ROI is wrongly determined by the RPN 140, the FC layer of the CNN 130 may be able to filter out the specific estimated ROI which has been wrongly determined. Or, even in case the specific estimated ROI is determined properly by the RPN 140, the FC layer of the CNN 130 may wrongly determine that the specific estimated ROI does not include any objects. As can be seen in such cases, the RPN 140 and the CNN 130 may each perform wrongly or properly, thus such cases should be considered when evaluating the object detection process.
Similar to the RPN 140, (i) the process of generating the initial object detection result and its corresponding estimated object detection result and (ii) the process of generating the CNN confidence map have been explained separately for convenience, but those two processes may be performed by the FC layer at the same time, dependently on each other. However, those two processes may also be performed independently.
After such CNN confidence scores are generated for the estimated ROIs, a CNN confidence map including the CNN confidence scores may be generated.
Thereafter, the learning device may instruct the CNN 130 to integrate bounding boxes corresponding to overlapped estimated ROIs by applying the NMS to the initial object detection result, to thereby generate the estimated object detection result. As the overlapped ones are integrated, specific CNN confidence scores corresponding to specific estimated ROIs, which have not been overlapped with other estimated ROIs, may be selected. Herein, such usage of the NMS is a well-known prior art, thus a more specific explanation is omitted.
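Although the NMS is a well-known prior art, a minimal NumPy sketch of the box-integration step is given below for illustration. The greedy, IoU-based variant shown here is an assumption made for the sketch, since the disclosure does not fix a particular NMS variant:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS sketch: keep the highest-scoring box, drop boxes
    overlapping it beyond the IoU threshold, and repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```

The surviving indices correspond to the specific estimated ROIs whose CNN confidence scores are selected after integration.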
Meanwhile, the learning device may instruct the confidence layer 150 to acquire values of the one channel in the anchor feature map, which correspond to the RPN confidence map, and generate a resized RPN confidence map whose geometric size is H×W, same as the training image, by using the RPN confidence map. In order to generate the resized RPN confidence map, any of the resizing operations, e.g., Nearest-Neighbor Resize, Bilinear Resize, Bicubic Resize or Lanczos Resize, may be applied to the RPN confidence map, to pair each of the RPN confidence scores with each of the pixels in the training image.
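As one option among the resizing operations mentioned above, a nearest-neighbor resize of an h×w confidence map to the H×W training-image size can be sketched with NumPy as follows (a minimal illustration, not the patented implementation; the floor-division index mapping is an assumption):

```python
import numpy as np

def nearest_neighbor_resize(conf_map, H, W):
    """Upsample an (h, w) RPN confidence map to (H, W) so that each
    RPN confidence score is paired with a pixel of the training image."""
    h, w = conf_map.shape
    # Map every target pixel back to its nearest source cell.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return conf_map[np.ix_(rows, cols)]
```

Bilinear, bicubic, or Lanczos kernels would replace the index lookup with a weighted interpolation, but the pairing of scores to pixels is the same.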
After the estimated object detection result and the resized RPN confidence map are acquired, the learning device may instruct the confidence layer 150 to generate an integrated confidence map by referring thereto. Herein, the integrated confidence scores included in the integrated confidence map may be calculated by using the following formula:
$$c'_{xy} = \max\Bigl(c_{xy},\ \max_{r_i:(x,y)\in r_i} c^{r_i}\Bigr)$$
Herein, c′_xy may denote an (X_Y)-th integrated confidence score and c_xy may denote an (X_Y)-th resized RPN confidence score corresponding to a coordinate (x,y) on the resized RPN confidence map. Also, c^{r_i} may denote an i-th CNN confidence score for an i-th estimated ROI, denoted as r_i, including the coordinate (x,y), which has been determined in the NMS result. The learning device may instruct the confidence layer 150 to generate the integrated confidence map by using the integrated confidence scores generated as shown above.
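The integration formula above can be illustrated with a small NumPy sketch. The (x1, y1, x2, y2) box representation for the ROIs is an assumption made for the illustration:

```python
import numpy as np

def integrate_confidence(rpn_conf, rois, cnn_scores):
    """c'_xy = max(c_xy, max over ROIs r_i containing (x, y) of c^{r_i}).

    rpn_conf   : (H, W) resized RPN confidence map
    rois       : list of (x1, y1, x2, y2) boxes from the NMS result
    cnn_scores : one CNN confidence score per ROI
    """
    integrated = rpn_conf.copy()
    for (x1, y1, x2, y2), score in zip(rois, cnn_scores):
        region = integrated[y1:y2, x1:x2]
        # Pixel-wise max between the RPN score and the ROI's CNN score.
        np.maximum(region, score, out=region)
    return integrated
```

Pixels covered by no ROI simply keep their resized RPN confidence score, as the formula prescribes.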
In order to generate the integrated confidence map properly, the CNN 130 and the RPN 140 should be trained. Below, how the two networks can be trained will be presented.
That is, at a step of S40, the learning device may instruct the loss layer to generate at least one RPN loss and at least one CNN loss by referring to the RPN confidence map, the CNN confidence map, the estimated object detection result and the GT object detection result, and perform backpropagation by using the RPN loss and the CNN loss to learn at least part of the parameters in the CNN 130 and the RPN 140. Below, how to generate the RPN loss and the CNN loss will be explained.
First, the RPN loss can be generated by using the following formula:
$$L_{RPN}(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i>0.5)}(1.1-c_i)\,L_{cls}(p_i,p_i^*) + \sum_{i:p_i^*\neq I(p_i>0.5)}(0.1+c_i)\,L_{cls}(p_i,p_i^*) + \sum_i c_i\log c_i\right) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*)$$
Herein, N_reg may denote a constant corresponding to the size of the convolutional feature map generated by performing the anchor operation, and N_cls may denote a constant corresponding to the training image. Also, c_i may denote an i-th RPN confidence score corresponding to an i-th pixel of the convolutional feature map, among the RPN confidence scores, p_i may denote an i-th estimated RPN classification result corresponding to the i-th pixel, and p_i* may denote an i-th GT RPN classification result corresponding to the i-th pixel. And, t_i may denote an i-th estimated RPN regression result corresponding to the i-th pixel, and t_i* may denote an i-th GT RPN regression result corresponding to the i-th pixel. The i-th GT RPN classification result and the i-th GT RPN regression result may correspond to the GT object detection result. Also, L_cls and L_reg may be implemented by using any of the prior arts for generating losses, e.g., smooth-L1 loss.
In the formula, the first term may denote a classification loss, and the second one may denote a regression loss. The regression loss term, i.e., the second term, is a well-known prior art, thus further explanation is omitted. Below, the classification loss term, i.e., the first term, will be explained.
The classification loss term may include three sub-terms in parentheses as shown above. In the first sub-term thereof, a condition i: p_i*=I(p_i>0.5) represents a case that the i-th estimated RPN classification result, determined by the RPN 140, corresponding to the i-th pixel of the anchor feature map, is same as the i-th GT RPN classification result. To be simple, it represents a case that the RPN 140 has analyzed the i-th pixel of the convolutional feature map correctly. Herein, it is assumed that the RPN 140 has analyzed the i-th pixel "correctly" if the RPN 140 has determined the i-th estimated RPN classification result to be same as the i-th GT RPN classification result with a probability larger than a threshold probability, herein 0.5 for convenience. In the first sub-term, (1.1−c_i) allows the i-th RPN confidence score to become larger when the RPN 140 has analyzed the i-th pixel correctly, since the parameters of the RPN 140 are adjusted to make the RPN loss smaller. Specifically, the classification loss term including the first sub-term generated as shown above is referred to, in order to generate gradients for allowing the parameters of the RPN 140 to be adjusted, to thereby allow a future RPN loss to be smaller. Thus, by using the classification loss term including the first sub-term, the RPN 140 may generate larger RPN confidence scores in case the RPN 140 analyzes the values of the convolutional feature map correctly.
In the second sub-term of the classification loss term, a condition i: p_i*≠I(p_i>0.5) represents a case that the RPN 140 has analyzed the i-th pixel incorrectly. In the second sub-term, (0.1+c_i) allows the i-th RPN confidence score to become smaller when the RPN 140 has analyzed the i-th pixel incorrectly. A mechanism of such training processes may be same as that of the first sub-term.
In the third sub-term, c_i log c_i allows the distribution of the RPN confidence scores not to be similar to that of a step function. If the classification loss term only included the first sub-term and the second sub-term, the RPN confidence scores may be overfitted so that they would include only extreme values close to 0 or 1. Herein, c_i log c_i is the smallest when c_i is 1/e. Thus, the distribution of the RPN confidence scores in accordance with the present disclosure may be more diverse, including various values between 0 and 1. Since various situations can be dealt with by using the RPN confidence scores including the various values as shown above, this term is necessary.
Since the RPN loss in accordance with the present disclosure has been explained above, the CNN loss will be presented below:
$$L_{CNN}(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i=\text{largest score among class scores})}(1.1-c_i)\,L_{cls}(p_i,p_i^*) + \sum_{i:p_i^*\neq I(p_i=\text{largest score among class scores})}(0.1+c_i)\,L_{cls}(p_i,p_i^*) + \sum_i c_i\log c_i\right) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*)$$
Herein, N_reg and N_cls may denote the number of the estimated ROIs, and c_i may denote an i-th CNN confidence score, corresponding to an i-th estimated ROI among the estimated ROIs, among the CNN confidence scores. Also, p_i may denote an i-th estimated CNN classification result corresponding to the i-th estimated ROI, and p_i* may denote an i-th GT CNN classification result corresponding to the i-th estimated ROI. And, t_i may denote an i-th estimated CNN regression result corresponding to the i-th estimated ROI, and t_i* may denote an i-th GT CNN regression result corresponding to the i-th estimated ROI. The i-th estimated CNN classification result and the i-th estimated CNN regression result may correspond to the estimated object detection result. In this formula, the notations are the same as those of the RPN loss, but their meanings differ as shown above.
As can be seen in the formula above, the CNN loss may be similar to the RPN loss, except for the conditions of their sigma functions. Herein, the conditions on p_i include p_i = largest score among class scores, not p_i>0.5 as in the RPN loss. Such conditions may represent that p_i is related to the i-th estimated CNN classification result on a specific class whose probability of including a specific object in the i-th estimated ROI is the largest. Similar to the RPN loss, the first sub-term of the classification loss term in the CNN loss may represent a case that the CNN 130 has analyzed the i-th estimated ROI correctly, and the second sub-term may represent another case that the CNN 130 has analyzed the i-th estimated ROI incorrectly. The CNN loss and the RPN loss are similar to each other, thus further explanation is omitted.
After the CNN loss and the RPN loss are generated as shown above, the learning device may instruct the loss layer to learn at least part of the parameters included in the CNN 130 and the RPN 140 by performing backpropagation using the CNN loss and the RPN loss respectively. Thereafter, while the CNN 130 and the RPN 140 perform processes of detecting objects included in an image to be inputted, the CNN 130 and the RPN 140 may more accurately generate the CNN confidence scores and the RPN confidence scores corresponding to said image to be inputted.
After the training process is done, the learning device may deliver the CNN 130 to the computing device 100. The RPN 140 and the confidence layer 150 are subordinate to the CNN 130, thus those will also be delivered to the computing device 100. The CNN 130, the RPN 140 and the confidence layer 150 may perform operations similar to those performed during the training process by using the fully trained parameters. That is, as the CNN 130 acquires the circumstance image, the CNN 130, the RPN 140 and the confidence layer 150 may generate the initial confidence information in a way same as the way of generating the integrated confidence map during the training process, along with the initial object information.
Herein, for reference, an additional explanation on how the CNN 130, the RPN 140, the loss layer and the confidence layer 150 interwork to generate the confidence scores while performing the object detection will be given by referring to FIG. 5.
FIG. 5 is a drawing schematically illustrating the learning device including the CNN, the RPN, the loss layer and the confidence layer to be used for performing the learning method for switching the modes of the autonomous vehicle based on the on-device standalone prediction to thereby achieve the safety of the autonomous driving in accordance with one example embodiment of the present disclosure.
By referring to FIG. 5, it can be seen that the training image is inputted to the convolutional layer 131 in the CNN 130, and the convolutional feature map with a size of h×w×c is outputted therefrom. Then, the convolutional feature map may be processed by the RPN 140, and the RPN 140 may output a feature map with a size of h×w×(5A+1), in which locations of the ROIs, shown as (dx,dy,dw,dh), and their RPN confidence scores are included. The ROIs may be delivered to the ROI pooling layer (not illustrated) in the CNN 130, thus the convolutional feature map may be pooled by referring to the ROIs accordingly. Thereafter, the FC layer 132 in the CNN 130 may process the ROI-pooled feature map with a size of R×d to generate the initial object detection result with a size of R×(5C+1), in which estimated locations of the objects, shown as (x1,y1,x2,y2), their probabilities of being included in each of the classes, and the CNN confidence scores for each of the pixels corresponding to the objects are included. Finally, the CNN may apply the NMS to the initial object detection result to generate the estimated object detection result with a size of R′×6, in which the estimated locations of the objects, shown as (x1,y1,x2,y2), their estimated classes, and their probabilities of being included in said estimated classes are included. Herein, R′ may denote the number of the integrated ROIs generated by the NMS. The CNN confidence map may be outputted along with the estimated object detection result, and integrated with the resized RPN confidence map generated from the RPN confidence map with a size of h×w×1, to generate the integrated confidence map with a size of H×W×1. Such a process is in accord with the above-explained learning method.
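The data sizes traversed by this pipeline can be traced with a small bookkeeping helper (symbolic illustration only; the concrete values of A, C, R, R′, h, w, c and d are chosen by the implementation, not fixed by the disclosure):

```python
def pipeline_shapes(H, W, A, C, R, R_prime, h, w, c, d):
    """Trace the data sizes through the detection pipeline of FIG. 5."""
    return [
        ("training image",        (H, W, 3)),
        ("convolutional map",     (h, w, c)),
        ("anchor feature map",    (h, w, 5 * A + 1)),  # +1 channel: RPN confidence
        ("ROI-pooled features",   (R, d)),
        ("initial detection",     (R, 5 * C + 1)),     # +1: CNN confidence
        ("estimated detection",   (R_prime, 6)),       # box, class, probability
        ("integrated confidence", (H, W, 1)),
    ]
```

For example, with A=9 anchors the anchor feature map carries 5·9+1=46 channels, the extra channel holding the RPN confidence scores.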
By supporting the autonomous driving module to use additional information acquired by applying object detection again to parts where degrees of the credibility are low, a safer autonomous driving may be achieved.
The present disclosure has an effect of providing a method for an attention-driven resource allocation by using an AVM and a reinforcement learning algorithm to thereby achieve a safety of an autonomous driving.
The present disclosure has another effect of providing a method for generating confidence scores, which are parameters representing a credibility of an object detection, in parallel with a process of the object detection.
The present disclosure has still another effect of providing a method for performing more accurate object detection by performing the reinforcement learning algorithm using the confidence scores, to thereby perform the object detection again on unconfident regions.
The embodiments of the present disclosure as explained above can be implemented in a form of executable program command through a variety of computer means recordable to computer readable media. The computer readable media may include solely or in combination, program commands, data files, and data structures. The program commands recorded to the media may be components specially designed for the present disclosure or may be usable to a skilled human in a field of computer software. Computer readable media include magnetic media such as hard disk, floppy disk, and magnetic tape, optical media such as CD-ROM and DVD, magneto-optical media such as floptical disk and hardware devices such as ROM, RAM, and flash memory specially designed to store and carry out program commands. Program commands include not only a machine language code made by a compiler but also a high-level code that can be used by an interpreter etc., which is executed by a computer. The aforementioned hardware device can work as more than a software module to perform the action of the present disclosure and they can do the same in the opposite case.
As seen above, the present disclosure has been explained by specific matters such as detailed components, limited embodiments, and drawings. They have been provided only to help more general understanding of the present disclosure. It, however, will be understood by those skilled in the art that various changes and modification may be made from the description without departing from the spirit and scope of the disclosure as defined in the following claims.
Accordingly, the thought of the present disclosure must not be confined to the explained embodiments, and the following patent claims as well as everything including variations equal or equivalent to the patent claims pertain to the category of the thought of the present disclosure.

Claims (28)

What is claimed is:
1. A method for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising steps of:
(a) a computing device acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(b) the computing device instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(c) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein the step of (c) includes steps of:
(c1) the computing device (i) first instructing the RL agent to select each of one or more first specific regions, to which a re-detection process is to be applied, among one or more regions in the circumstance image by referring to the initial confidence information and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate first adjusted object information and first adjusted confidence information on the circumstance image;
(c2) the computing device, iteratively, (i) instructing the RL agent to select each of one or more K-th specific regions, to which the re-detection process is to be applied, among the regions in the circumstance image by referring to (K−1)-th adjusted confidence information generated beforehand and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate K-th adjusted object information and K-th adjusted confidence information on the circumstance image; and
(c3) the computing device, if the re-detection process is performed N times so that N-th adjusted object information and N-th adjusted confidence information are generated, generating the final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and
wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
2. The method of claim 1, wherein the RL agent generates M-th re-detection probability and one or more M-th specific regions by referring to (M−1)-th adjusted confidence information, and the computing device instructs the CNN to apply the CNN operation to the M-th specific regions when the M-th re-detection probability is larger than a threshold, wherein M is an integer from 1 to N, and wherein, if M is 1, the RL agent generates a first re-detection probability and the first specific regions by referring to the initial confidence information.
3. The method of claim 2, wherein the RL agent, if the (M−1)-th adjusted confidence information is inputted thereto, (i) generates the M-th re-detection probability and the M-th specific regions by using its own parameters and the (M−1)-th adjusted confidence information, (ii) if the M-th re-detection probability is larger than the threshold, acquires M-th adjusted confidence information from the CNN, (iii) calculates an M-th original reward by referring to the (M−1)-th adjusted confidence information, the M-th adjusted confidence information and an M-th computational cost used for generating the M-th adjusted confidence information and its corresponding M-th adjusted object information, and (iv) trains said its own parameters by referring to the M-th original reward.
4. The method of claim 3, wherein the RL agent trains said its own parameters by using a gradient generated by using a following formula:
$$\frac{dR}{dO} = \frac{1}{\epsilon + \sum_{O':R(O')>R(O)}\bigl(R(O')-R(O)\bigr)}\sum_{O':R(O')>R(O)}\bigl(R(O')-R(O)\bigr)\nabla_O \log p(O'\mid O)$$
wherein O is an M-th original vector including information on the M-th re-detection probability and the M-th specific regions, O′ is an M-th adjusted vector which has been generated by applying some noises to the M-th original vector, R(O) and R(O′) are respectively the M-th original reward and an M-th adjusted reward corresponding to the M-th adjusted vector, and ε is a constant preventing a divergence of the gradient.
5. The method of claim 1, wherein the computing device, if one or more L-th specific regions are selected while the re-detection process has been performed iteratively, instructs the RL agent to generate L-th magnification information corresponding to the L-th specific regions, generates L-th specific upsampled images by applying at least one upsampling operation to the L-th specific regions by referring to the L-th magnification information, and instructs the CNN to apply the CNN operation to the L-th specific upsampled images, to thereby generate L-th adjusted object information and L-th adjusted confidence information, wherein L is an integer from 1 to N.
6. A method for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising steps of:
(a) a computing device acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(b) the computing device instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(c) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein, at the step of (c), the computing device, while generating an integrated image including a plurality of circumstance images as its partial images, generates integrated final object information on the integrated image by using each of pieces of initial object information and each of pieces of initial confidence information on each of the circumstance images; and
wherein the integrated final object information is generated by performing following processes of:
(c4) the computing device (i) generating first integrated confidence information on the integrated image by referring to said each of pieces of the initial confidence information, (ii) then instructing the RL agent to select each of one or more first specific regions, among one or more regions in the integrated image, to which a re-detection process is to be applied, by referring to the first integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate each of pieces of first adjusted object information and each of pieces of first adjusted confidence information on the integrated image;
(c5) the computing device iteratively (i) generating K-th adjusted integrated confidence information on the integrated image by referring to each of pieces of (K−1)-th adjusted confidence information, (ii) then instructing the RL agent to select each of one or more K-th specific regions, among the regions in the integrated image, to which the re-detection process is to be applied, by referring to the K-th adjusted integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate each of pieces of K-th adjusted object information and each of pieces of K-th adjusted confidence information on the integrated image; and
(c6) the computing device, if the re-detection process is performed N times so that each of pieces of N-th adjusted object information and each of pieces of N-th adjusted confidence information are generated, generating the integrated final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and
wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
7. The method of claim 6, wherein (i) on condition that one or more first circumstance images are acquired through one or more first panorama view sensors, among a plurality of panorama view sensors, corresponding to a camera, the computing device instructs a first CNN, which is optimized to a 3-channel image and whose number of input nodes is larger than or same as 3, to apply at least one first CNN operation to the first circumstance images, and (ii) on condition that one or more second circumstance images are acquired through one or more second panorama view sensors corresponding to a depth sensor, the computing device instructs a second CNN, which is optimized to a depth image and whose number of input nodes is larger than or same as 1, to apply at least one second CNN operation to the second circumstance images.
8. A method for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising steps of:
(a) a learning device, if a training image is acquired, instructing at least one convolutional layer included in a Convolutional Neural Network(CNN) to generate at least one convolutional feature map by applying at least one convolutional operation to the training image;
(b) the learning device, during a process of generating estimated Regions-Of-Interest(ROIs) on the training image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores;
(c) the learning device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores;
(d) the learning device instructing a loss layer to generate at least one RPN loss and at least one CNN loss by referring to the RPN confidence map, the CNN confidence map, the estimated object detection result and the GT object detection result, and perform backpropagation by using the RPN loss and the CNN loss to learn at least part of parameters in the CNN and the RPN;
(e) a computing device acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(f) the computing device instructing the CNN to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(g) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent.
9. The method of claim 8, wherein, at the step of (d), the learning device instructs the loss layer to generate the RPN loss by using a following formula:

$$L^{RPN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i>0.5)}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i>0.5)}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$

wherein N_reg denotes a constant corresponding to a size of the convolutional feature map generated by performing the anchor operation, N_cls denotes a constant corresponding to the training image, c_i denotes an i-th RPN confidence score corresponding to an i-th pixel of the convolutional feature map, among the RPN confidence scores, p_i denotes an i-th estimated RPN classification result corresponding to the i-th pixel, p_i* denotes an i-th GT RPN classification result corresponding to the i-th pixel, t_i denotes an i-th estimated RPN regression result corresponding to the i-th pixel, and t_i* denotes an i-th GT RPN regression result corresponding to the i-th pixel, and
wherein the i-th GT RPN classification result and the i-th GT RPN regression result correspond to the GT object detection result.
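The confidence-weighted loss of claim 9 can be sketched with concrete choices for the unspecified terms. In the snippet, binary cross-entropy stands in for L_cls and smooth-L1 for L_reg (the claim fixes neither), and `rpn_loss` and its argument names are hypothetical. Anchors whose GT label agrees with the indicator I(p_i > 0.5) are down-weighted by (1.1 − c_i), disagreeing ones are up-weighted by (0.1 + c_i), and the Σ c_i log c_i term regularizes the confidence scores.

```python
import numpy as np

def rpn_loss(p, p_star, c, t, t_star, lam=1.0, N_cls=None, N_reg=None):
    """Sketch of the claim-9 RPN loss with assumed L_cls (binary
    cross-entropy) and L_reg (smooth-L1). Correctly classified anchors
    are weighted (1.1 - c_i); misclassified ones (0.1 + c_i)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    c = np.clip(c, 1e-7, 1.0)
    N_cls = N_cls or len(p)
    N_reg = N_reg or len(p)
    pred = (p > 0.5).astype(float)               # indicator I(p_i > 0.5)
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p))
    w = np.where(p_star == pred, 1.1 - c, 0.1 + c)
    cls_term = (np.sum(w * l_cls) + np.sum(c * np.log(c))) / N_cls
    diff = np.abs(t - t_star)                    # smooth-L1 regression loss
    l_reg = np.where(diff < 1, 0.5 * diff**2, diff - 0.5).sum(axis=-1)
    reg_term = lam * np.sum(p_star * l_reg) / N_reg
    return cls_term + reg_term
```

The regression term only counts anchors with positive GT labels (the p_i* factor), mirroring the standard RPN loss that this formula extends with the confidence weights.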
10. The method of claim 8, wherein, at the step of (d), the learning device instructs the loss layer to generate the CNN loss by using a following formula:

$$L^{CNN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i=\text{largest score among class scores})}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i=\text{largest score among class scores})}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$

wherein N_reg and N_cls denote the number of the estimated ROIs, c_i denotes an i-th CNN confidence score, corresponding to an i-th estimated ROI selected from the estimated ROIs, among the CNN confidence scores, p_i denotes an i-th estimated CNN classification result corresponding to the i-th estimated ROI, p_i* denotes an i-th GT CNN classification result corresponding to the i-th estimated ROI, t_i denotes an i-th estimated CNN regression result corresponding to the i-th estimated ROI, and t_i* denotes an i-th GT CNN regression result corresponding to the i-th estimated ROI, and
wherein the i-th estimated CNN classification result and the i-th estimated CNN regression result correspond to the estimated object detection result.
11. The method of claim 8, wherein, after the step of (c), the learning device instructs a confidence layer to generate an integrated confidence map including information on each of integrated confidence scores for each of pixels in the training image by referring to the RPN confidence map and the CNN confidence map.
12. The method of claim 11, wherein the learning device instructs the confidence layer to perform (i-1) a process of acquiring a Non-Maximum Suppression(NMS) result on the estimated ROIs generated during a process of generating the estimated object detection result, from the CNN, (i-2) a process of generating a resized RPN confidence map by applying at least one resizing operation to the RPN confidence map, and (ii) a process of generating the integrated confidence map by referring to the NMS result and the resized RPN confidence map.
13. The method of claim 12, wherein the learning device instructs the confidence layer to generate an (X_Y)-th integrated confidence score, corresponding to a coordinate (x,y) on the training image, among the integrated confidence scores, by using a following formula:

$$c'_{xy}=\max\left(c_{xy},\;\max_{r_i:(x,y)\in r_i}c^{r_i}\right)$$

wherein c_xy' denotes the (X_Y)-th integrated confidence score, c_xy denotes an (X_Y)-th resized RPN confidence score corresponding to a coordinate (x,y) on the resized RPN confidence map, and c^{r_i} denotes an i-th CNN confidence score for an i-th estimated ROI, denoted as r_i, including the coordinate (x,y), which has been determined in the NMS result.
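The pixel-wise max of claim 13 amounts to painting each NMS-surviving ROI's CNN confidence over the resized RPN confidence map wherever it is larger. A minimal sketch, assuming axis-aligned integer boxes (x1, y1, x2, y2) and a hypothetical function name:

```python
import numpy as np

def integrated_confidence_map(rpn_conf, rois, roi_conf):
    """Claim-13 integration: each pixel's final confidence is the max of
    its resized RPN confidence c_xy and the CNN confidence c^{r_i} of
    every NMS-surviving ROI r_i = (x1, y1, x2, y2) covering that pixel."""
    out = rpn_conf.copy()
    for (x1, y1, x2, y2), cr in zip(rois, roi_conf):
        region = out[y1:y2, x1:x2]
        np.maximum(region, cr, out=region)   # in-place elementwise max on the view
    return out
```

Pixels outside every ROI keep their RPN confidence, so low-confidence areas remain visible to the RL agent as candidates for re-detection.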
14. A method for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising steps of:
(a) a computing device acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(b) the computing device instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(c) the computing device generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein the step of (b) includes steps of:
(b1) the computing device, if the circumstance image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the circumstance image;
(b2) the computing device, during a process of generating estimated Regions-Of-Interest(ROIs) on the circumstance image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores;
(b3) the computing device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and
(b4) the computing device instructing a confidence layer interworking with the CNN to generate an integrated confidence map by referring to the RPN confidence map and the CNN confidence map,
wherein the computing device instructs the CNN to output the initial object information including the estimated object detection result and the initial confidence information including the integrated confidence map.
15. A computing device for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising:
at least one memory that stores instructions; and
at least one processor configured to execute the instructions to perform processes of:
(I) acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(II) instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(III) generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein the process of (III) includes processes of:
(III-1) (i) first instructing the RL agent to select each of one or more first specific regions, to which a re-detection process is to be applied, among one or more regions in the circumstance image by referring to the initial confidence information and (ii) then instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate first adjusted object information and first adjusted confidence information on the circumstance image;
(III-2) iteratively, (i) instructing, as the re-detection process, the RL agent to select each of one or more K-th specific regions, to which the re-detection process is to be applied, among the regions in the circumstance image by referring to (K−1)-th adjusted confidence information generated beforehand and (ii) then instructing the CNN to apply the CNN operation to the K-th specific regions, to thereby generate K-th adjusted object information and K-th adjusted confidence information on the circumstance image; and
(III-3) if the re-detection process is performed N times so that N-th adjusted object information and N-th adjusted confidence information are generated, generating the final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
16. The device of claim 15, wherein the RL agent generates M-th re-detection probability and one or more M-th specific regions by referring to (M−1)-th adjusted confidence information, and the processor instructs the CNN to apply the CNN operation to the M-th specific regions when the M-th re-detection probability is larger than a threshold, wherein M is an integer from 1 to N, and wherein, if M is 1, the RL agent generates a first re-detection probability and the first specific regions by referring to the initial confidence information.
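The control flow of claims 15 and 16 is essentially a confidence-driven loop: the agent reads the latest confidence information, emits a re-detection probability and the regions to revisit, and the CNN is re-applied only while that probability clears the threshold. The sketch below is an assumption-laden illustration: `cnn` and `rl_agent` are hypothetical callables standing in for the claimed networks, and `max_iters` bounds N, which the claims leave to the agent's determination.

```python
def redetect_until_stop(cnn, rl_agent, image, initial_conf,
                        threshold=0.5, max_iters=10):
    """Iterative re-detection of claims 15-16. rl_agent(conf) returns
    (re-detection probability, specific regions); cnn(image, regions)
    returns (adjusted object information, adjusted confidence
    information). The loop stops as soon as the probability falls to
    the threshold or below, saving computation on confident frames."""
    conf = initial_conf
    collected = []                   # adjusted object information, in order
    for _ in range(max_iters):       # N is decided by the agent, bounded here
        prob, regions = rl_agent(conf)
        if prob <= threshold or not regions:
            break
        obj_info, conf = cnn(image, regions)   # re-detect the regions only
        collected.append(obj_info)
    return collected
```

The final object information of claim 15 would then merge the initial detection with the pieces in `collected`.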
17. The device of claim 16, wherein the RL agent, if the (M−1)-th adjusted confidence information is inputted thereto, (i) generates the M-th re-detection probability and the M-th specific regions by using its own parameters and the (M−1)-th adjusted confidence information, (ii) if the M-th re-detection probability is larger than the threshold, acquires M-th adjusted confidence information from the CNN, (iii) calculates an M-th original reward by referring to the (M−1)-th adjusted confidence information, the M-th adjusted confidence information and an M-th computational cost used for generating the M-th adjusted confidence information and its corresponding M-th adjusted object information, and (iv) trains said its own parameters by referring to the M-th original reward.
18. The device of claim 17, wherein the RL agent trains said its own parameters by using a gradient generated by using a following formula:

$$\frac{dR}{dO}=\frac{1}{\epsilon+\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)}\sum_{O':R(O')>R(O)}\left(R(O')-R(O)\right)\nabla_{O}\log p(O'\mid O)$$

wherein O is an M-th original vector including information on the M-th re-detection probability and the M-th specific regions, O′ is an M-th adjusted vector which has been generated by applying some noises to the M-th original vector, R(O) and R(O′) are respectively the M-th original reward and an M-th adjusted reward corresponding to the M-th adjusted vector, and ε is a constant preventing a divergence of the gradient.
19. The device of claim 15, wherein the processor, if one or more L-th specific regions are selected while the re-detection process has been performed iteratively, instructs the RL agent to generate L-th magnification information corresponding to the L-th specific regions, generates L-th specific upsampled images by applying at least one upsampling operation to the L-th specific regions by referring to the L-th magnification information, and instructs the CNN to apply the CNN operation to the L-th specific upsampled images, to thereby generate L-th adjusted object information and L-th adjusted confidence information, wherein L is an integer from 1 to N.
20. A computing device for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising:
at least one memory that stores instructions; and
at least one processor configured to execute the instructions to perform processes of:
(I) acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(II) instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(III) generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein, at the process of (III), the processor, while generating an integrated image including a plurality of circumstance images as its partial images, generates integrated final object information on the integrated image by using each of pieces of initial object information and each of pieces of initial confidence information on each of the circumstance images; and
wherein the integrated final object information is generated by performing following processes of:
(III-4) (i) generating first integrated confidence information on the integrated image by referring to said each of pieces of the initial confidence information, (ii) then instructing the RL agent to select each of one or more first specific regions, among one or more regions in the integrated image, to which a re-detection process is to be applied, by referring to the first integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the first specific regions, to thereby generate each of pieces of first adjusted object information and each of pieces of first adjusted confidence information on the integrated image;
(III-5) iteratively (i) generating K-th adjusted integrated confidence information on the integrated image by referring to each of pieces of (K−1)-th adjusted confidence information, (ii) then instructing the RL agent to select each of one or more K-th specific regions, among the regions in the integrated image, to which the re-detection process is to be applied, by referring to the K-th adjusted integrated confidence information, and (iii) finally instructing, as the re-detection process, the CNN to apply the CNN operation to the K-th specific regions, to thereby generate each of pieces of K-th adjusted object information and each of pieces of K-th adjusted confidence information on the integrated image; and
(III-6) if the re-detection process is performed N times so that each of pieces of N-th adjusted object information and each of pieces of N-th adjusted confidence information are generated, generating the integrated final object information by referring to the initial object information and at least part of the first adjusted object information to the N-th adjusted object information, and wherein K is an integer from 2 to N, and N is the number of the re-detection process to be performed by determination of the RL agent.
21. The device of claim 20, wherein (i) on condition that one or more first circumstance images are acquired through one or more first panorama view sensors, among a plurality of panorama view sensors, corresponding to a camera, the processor instructs a first CNN, which is optimized to a 3-channel image and whose number of input nodes is larger than or same as 3, to apply at least one first CNN operation to the first circumstance images, and (ii) on condition that one or more second circumstance images are acquired through one or more second panorama view sensors corresponding to a depth sensor, the processor instructs a second CNN, which is optimized to a depth image and whose number of input nodes is larger than or same as 1, to apply at least one second CNN operation to the second circumstance images.
22. A computing device for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising:
at least one memory that stores instructions; and
at least one processor configured to execute the instructions to perform processes of:
(I) acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(II) instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(III) generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein before the process of (I), the CNN is trained by performing processes of:
(I-1) a learning device, if a training image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the training image;
(I-2) the learning device, during a process of generating estimated Regions-Of-Interest(ROIs) on the training image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores;
(I-3) the learning device, if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and
(I-4) the learning device instructing a loss layer to generate at least one RPN loss and at least one CNN loss by referring to the RPN confidence map, the CNN confidence map, the estimated object detection result and the GT object detection result, and perform backpropagation by using the RPN loss and the CNN loss to learn at least part of parameters in the CNN and the RPN.
23. The device of claim 22, wherein, at the process of (I-4), the learning device instructs the loss layer to generate the RPN loss by using a following formula:

$$L^{RPN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i>0.5)}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i>0.5)}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$

wherein N_reg denotes a constant corresponding to a size of the convolutional feature map generated by performing the anchor operation, N_cls denotes a constant corresponding to the training image, c_i denotes an i-th RPN confidence score corresponding to an i-th pixel of the convolutional feature map, among the RPN confidence scores, p_i denotes an i-th estimated RPN classification result corresponding to the i-th pixel, p_i* denotes an i-th GT RPN classification result corresponding to the i-th pixel, t_i denotes an i-th estimated RPN regression result corresponding to the i-th pixel, and t_i* denotes an i-th GT RPN regression result corresponding to the i-th pixel, and
wherein the i-th GT RPN classification result and the i-th GT RPN regression result correspond to the GT object detection result.
24. The device of claim 22, wherein, at the process of (I-4), the learning device instructs the loss layer to generate the CNN loss by using a following formula:

$$L^{CNN}(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\left(\sum_{i:p_i^*=I(p_i=\text{largest score among class scores})}(1.1-c_i)L_{cls}(p_i,p_i^*)+\sum_{i:p_i^*\neq I(p_i=\text{largest score among class scores})}(0.1+c_i)L_{cls}(p_i,p_i^*)+\sum_{i}c_i\log c_i\right)+\lambda\frac{1}{N_{reg}}\sum_{i}p_i^*L_{reg}(t_i,t_i^*)$$

wherein N_reg and N_cls denote the number of the estimated ROIs, c_i denotes an i-th CNN confidence score, corresponding to an i-th estimated ROI selected from the estimated ROIs, among the CNN confidence scores, p_i denotes an i-th estimated CNN classification result corresponding to the i-th estimated ROI, p_i* denotes an i-th GT CNN classification result corresponding to the i-th estimated ROI, t_i denotes an i-th estimated CNN regression result corresponding to the i-th estimated ROI, and t_i* denotes an i-th GT CNN regression result corresponding to the i-th estimated ROI, and
wherein the i-th estimated CNN classification result and the i-th estimated CNN regression result correspond to the estimated object detection result.
25. The device of claim 22, wherein, after the process of (I-3), the learning device instructs a confidence layer to generate an integrated confidence map including information on each of integrated confidence scores for each of pixels in the training image by referring to the RPN confidence map and the CNN confidence map.
26. The device of claim 25, wherein the learning device instructs the confidence layer to perform (i-1) a process of acquiring a Non-Maximum Suppression(NMS) result on the estimated ROIs generated during a process of generating the estimated object detection result, from the CNN, (i-2) a process of generating a resized RPN confidence map by applying at least one resizing operation to the RPN confidence map, and (ii) a process of generating the integrated confidence map by referring to the NMS result and the resized RPN confidence map.
27. The device of claim 26, wherein the learning device instructs the confidence layer to generate an (X_Y)-th integrated confidence score, corresponding to a coordinate (x,y) on the training image, among the integrated confidence scores, by using a following formula:

$$c'_{xy}=\max\left(c_{xy},\;\max_{r_i:(x,y)\in r_i}c^{r_i}\right)$$

wherein c_xy' denotes the (X_Y)-th integrated confidence score, c_xy denotes an (X_Y)-th resized RPN confidence score corresponding to a coordinate (x,y) on the resized RPN confidence map, and c^{r_i} denotes an i-th CNN confidence score for an i-th estimated ROI, denoted as r_i, including the coordinate (x,y), which has been determined in the NMS result.
28. A computing device for achieving better performance in an autonomous driving while saving computing powers, by using confidence scores representing a credibility of an object detection which is generated in parallel with an object detection process, comprising:
at least one memory that stores instructions; and
at least one processor configured to execute the instructions to perform processes of:
(I) acquiring at least one circumstance image on surroundings of a subject vehicle, through at least one panorama view sensor installed on the subject vehicle;
(II) instructing a Convolutional Neural Network(CNN) to apply at least one CNN operation to the circumstance image, to thereby generate initial object information and initial confidence information on the circumstance image; and
(III) generating final object information on the circumstance image by referring to the initial object information and the initial confidence information, with a support of a Reinforcement Learning(RL) agent;
wherein the process of (II) includes processes of:
(II-1) if the circumstance image is acquired, instructing at least one convolutional layer included in the CNN to generate at least one convolutional feature map by applying at least one convolutional operation to the circumstance image;
(II-2) during a process of generating estimated Regions-Of-Interest(ROIs) on the circumstance image by applying at least one anchor operation to the convolutional feature map, instructing at least one anchor layer included in a Region Proposal Network(RPN) to generate each of one or more RPN confidence scores for each of pixels in the convolutional feature map, representing each of one or more probabilities of the estimated ROIs being same as Ground-Truth(GT) ROIs, to thereby generate an RPN confidence map including the RPN confidence scores;
(II-3) if at least one ROI-Pooled feature map is acquired, which has been generated by using the convolutional feature map and the estimated ROIs through an ROI pooling layer included in the CNN, during a process of generating an estimated object detection result by using the ROI-Pooled feature map, instructing an FC layer included in the CNN to generate each of CNN confidence scores for each of the estimated ROIs, representing each of one or more probabilities of each of one or more estimated CNN classification results and each of one or more estimated CNN regression results included in the estimated object detection result being same as each of one or more GT CNN classification results and each of one or more GT CNN regression results included in a GT object detection result, to thereby generate a CNN confidence map including the CNN confidence scores; and
(II-4) instructing a confidence layer interworking with the CNN to generate an integrated confidence map by referring to the RPN confidence map and the CNN confidence map, wherein the processor instructs the CNN to output the initial object information including the estimated object detection result and the initial confidence information including the integrated confidence map.
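The confidence pipeline of processes (II-1) through (II-4) ends with the RPN confidence map (one score per feature-map pixel) being fused with the CNN confidence scores (one per estimated ROI) into an integrated confidence map. The sketch below is a minimal, hypothetical NumPy illustration of one way such a fusion could work; the function name `integrate_confidence`, the geometric-mean fusion rule, and all variable names are assumptions for illustration only, not the formula claimed in the patent.

```python
import numpy as np

def integrate_confidence(rpn_conf, rois, cnn_conf):
    """Fuse a per-pixel RPN confidence map with per-ROI CNN confidence scores.

    rpn_conf : (H, W) array; probability that each pixel's proposal matches a GT ROI
    rois     : list of (x1, y1, x2, y2) estimated ROIs in pixel coordinates
    cnn_conf : list of floats; one CNN confidence score per ROI
    Returns an (H, W) integrated confidence map.
    """
    integrated = rpn_conf.astype(float).copy()
    for (x1, y1, x2, y2), c in zip(rois, cnn_conf):
        region = integrated[y1:y2, x1:x2]
        # Inside an ROI, combine the two confidence sources.  The geometric
        # mean keeps the result in [0, 1] and penalises disagreement between
        # the RPN and the CNN detector head.
        integrated[y1:y2, x1:x2] = np.sqrt(region * c)
    return integrated
```

Pixels outside every estimated ROI keep their raw RPN confidence, while pixels inside an ROI reflect both the proposal-level and the detection-level certainty, which is the behaviour the integrated confidence map in process (II-4) is meant to capture.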
US16/739,767 | 2019-01-31 | 2020-01-10 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving | Active | US10726279B1 (en)

Priority Applications (5)

Application Number | Priority Date | Filing Date | Title
US16/739,767 (US10726279B1) | 2019-01-31 | 2020-01-10 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving
KR1020200007629A (KR102396272B1) | 2019-01-31 | 2020-01-20 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving
EP20152682.9A (EP3690719A1) | 2019-01-31 | 2020-01-20 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving
CN202010073994.5A (CN111507167B) | 2019-01-31 | 2020-01-22 | Method and device for realizing attention-driven resource allocation by AVM and reinforcement learning
JP2020008883A (JP6895694B2) | 2019-01-31 | 2020-01-23 | Method and device for attention-driven resource allocation to achieve safety of autonomous driving by using AVM and reinforcement learning

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US201962799321P | 2019-01-31 | 2019-01-31 |
US16/739,767 (US10726279B1) | 2019-01-31 | 2020-01-10 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving

Publications (2)

Publication Number | Publication Date
US10726279B1 (en) | 2020-07-28
US20200250442A1 (en) | 2020-08-06

Family

ID=69185406

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US16/739,767 | Active | US10726279B1 (en) | 2019-01-31 | 2020-01-10 | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving

Country Status (5)

Country | Link
US (1) | US10726279B1 (en)
EP (1) | EP3690719A1 (en)
JP (1) | JP6895694B2 (en)
KR (1) | KR102396272B1 (en)
CN (1) | CN111507167B (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US12243240B2 | 2020-11-13 | 2025-03-04 | Samsung Electronics Co., Ltd. | Apparatus and method with multi-image panoptic segmentation
US12008816B2 (en) * | 2020-12-03 | 2024-06-11 | Hl Klemove Corp. | Method and system for real time object detection
JP7361342B2 (en) * | 2021-03-24 | 2023-10-16 | パナソニックIpマネジメント株式会社 | Learning methods, learning devices, and programs
KR102839049B1 (en) * | 2021-11-25 | 2025-07-28 | 한국전자기술연구원 | System for detecting object based on depth image and method for generating and applying deep learning model thereof
CN115546768B (en) * | 2022-12-01 | 2023-04-07 | 四川蜀道新能源科技发展有限公司 | Pavement marking identification method and system based on multi-scale mechanism and attention mechanism

Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images
US20180068198A1 (en) * | 2016-09-06 | 2018-03-08 | Carnegie Mellon University | Methods and Software for Detecting Objects in an Image Using Contextual Multiscale Fast Region-Based Convolutional Neural Network
US20180089505A1 (en) * | 2016-09-23 | 2018-03-29 | Samsung Electronics Co., Ltd. | System and method for deep network fusion for fast and robust object detection
US20180129974A1 (en) * | 2016-11-04 | 2018-05-10 | United Technologies Corporation | Control systems using deep reinforcement learning
US20180267558A1 (en) * | 2017-03-14 | 2018-09-20 | Starsky Robotics, Inc. | Vehicle sensor system and method of use
US20180268292A1 (en) * | 2017-03-17 | 2018-09-20 | Nec Laboratories America, Inc. | Learning efficient object detection models with knowledge distillation
US20190258878A1 (en) * | 2018-02-18 | 2019-08-22 | Nvidia Corporation | Object detection and detection confidence suitable for autonomous driving
US20190354782A1 (en) * | 2018-05-17 | 2019-11-21 | Uber Technologies, Inc. | Object Detection and Property Determination for Autonomous Vehicles

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5961571A (en) * | 1994-12-27 | 1999-10-05 | Siemens Corporated Research, Inc | Method and apparatus for automatically tracking the location of vehicles
KR101449295B1 (en) * | 2013-06-04 | 2014-10-13 | 현대자동차주식회사 | Apparatus and method for tracking parking-lot
US10037469B2 (en) * | 2013-12-10 | 2018-07-31 | Google Llc | Image location through large object detection
US9767381B2 (en) * | 2015-09-22 | 2017-09-19 | Xerox Corporation | Similarity-based detection of prominent objects using deep CNN pooling layers as features
US9965719B2 (en) * | 2015-11-04 | 2018-05-08 | Nec Corporation | Subcategory-aware convolutional neural networks for object detection
JP2018005739A (en) * | 2016-07-06 | 2018-01-11 | 株式会社デンソー | Method for reinforcement learning of neural network and reinforcement learning device
US10380741B2 (en) * | 2016-12-07 | 2019-08-13 | Samsung Electronics Co., Ltd | System and method for a deep learning machine for object detection
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network
US20180211403A1 (en) * | 2017-01-20 | 2018-07-26 | Ford Global Technologies, Llc | Recurrent Deep Convolutional Neural Network For Object Detection
CN108052881A (en) * | 2017-11-30 | 2018-05-18 | 华中科技大学 | Method and apparatus for real-time detection of multi-class objects in construction-site images
CN108596329B (en) * | 2018-05-11 | 2020-08-07 | 北方民族大学 | Three-dimensional model classification method based on end-to-end deep ensemble learning network
CN109145898A (en) * | 2018-07-26 | 2019-01-04 | 清华大学深圳研究生院 | Object detection method based on convolutional neural networks and an iterator mechanism


Cited By (40)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems
US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit
US12216610B2 (en) | 2017-07-24 | 2025-02-04 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting
US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting
US12307350B2 (en) | 2018-01-04 | 2025-05-20 | Tesla, Inc. | Systems and methods for hardware-based pooling
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array
US11797304B2 (en) | 2018-02-01 | 2023-10-24 | Tesla, Inc. | Instruction set architecture for a vector computational unit
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems
US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices
US12346816B2 (en) | 2018-09-03 | 2025-07-01 | Tesla, Inc. | Neural networks for embedded devices
US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices
US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles
US12367405B2 (en) | 2018-12-03 | 2025-07-22 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US12198396B2 (en) | 2018-12-04 | 2025-01-14 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform
US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform
US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving
US12223428B2 (en) | 2019-02-01 | 2025-02-11 | Tesla, Inc. | Generating ground truth for machine learning from time series elements
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target
US12164310B2 (en) | 2019-02-11 | 2024-12-10 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target
US12236689B2 (en) | 2019-02-19 | 2025-02-25 | Tesla, Inc. | Estimating object properties using visual image data
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data
US12056917B2 (en) | 2019-10-02 | 2024-08-06 | Intelligent Dynamics, Llc | Distributed management and control in autonomous conveyances
CN113096175A (en) * | 2021-03-24 | 2021-07-09 | 苏州中科广视文化科技有限公司 | Depth map confidence estimation method based on convolutional neural network
CN113096175B (en) * | 2021-03-24 | 2023-10-24 | 苏州中科广视文化科技有限公司 | Depth map confidence estimation method based on convolutional neural network
US20230008356A1 (en) * | 2021-07-09 | 2023-01-12 | Milestone Systems A/S | Video processing apparatus, method and computer program
EP4125002A3 (en) * | 2021-07-09 | 2023-06-21 | Milestone Systems A/S | A video processing apparatus, method and computer program
CN119649280A (en) * | 2025-02-18 | 2025-03-18 | 东莞市三奕电子科技股份有限公司 | A data processing method and system for video surveillance AI intelligent recognition equipment

Also Published As

Publication number | Publication date
KR102396272B1 (en) | 2022-05-10
CN111507167A (en) | 2020-08-07
EP3690719A1 (en) | 2020-08-05
JP2020126633A (en) | 2020-08-20
CN111507167B (en) | 2023-09-29
US20200250442A1 (en) | 2020-08-06
KR20200095376A (en) | 2020-08-10
JP6895694B2 (en) | 2021-06-30

Similar Documents

Publication | Title
US10726279B1 (en) | Method and device for attention-driven resource allocation by using AVM and reinforcement learning to thereby achieve safety of autonomous driving
US11010668B2 (en) | Method and device for attention-driven resource allocation by using reinforcement learning and V2X communication to thereby achieve safety of autonomous driving
US10692002B1 (en) | Learning method and learning device of pedestrian detector for robust surveillance based on image analysis by using GAN and testing method and testing device using the same
EP3690742B1 (en) | Method for auto-labeling training images for use in deep learning network to analyze images with high precision, and auto-labeling device using the same
US10776673B2 (en) | Learning method and learning device for sensor fusion to integrate information acquired by radar capable of distance estimation and information acquired by camera to thereby improve neural network for supporting autonomous driving, and testing method and testing device using the same
US10373027B1 (en) | Method for acquiring sample images for inspecting label among auto-labeled images to be used for learning of neural network and sample image acquiring device using the same
US10776647B2 (en) | Method and device for attention-driven resource allocation by using AVM to thereby achieve safety of autonomous driving
US10410120B1 (en) | Learning method and testing method of object detector to be used for surveillance based on R-CNN capable of converting modes according to aspect ratios or scales of objects, and learning device and testing device using the same
US10509987B1 (en) | Learning method and learning device for object detector based on reconfigurable network for optimizing customers' requirements such as key performance index using target object estimating network and target object merging network, and testing method and testing device using the same
US10726303B1 (en) | Learning method and learning device for switching modes of autonomous vehicle based on on-device standalone prediction to thereby achieve safety of autonomous driving, and testing method and testing device using the same
US10650279B1 (en) | Learning method and learning device for heterogeneous sensor fusion by using merging network which learns non-maximum suppression
EP3686837B1 (en) | Learning method and learning device for reducing distortion occurred in warped image generated in process of stabilizing jittered image by using gan to enhance fault tolerance and fluctuation robustness in extreme situations
US10402686B1 (en) | Learning method and learning device for object detector to be used for surveillance based on convolutional neural network capable of converting modes according to scales of objects, and testing method and testing device using the same

Legal Events

Code | Title | Description
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY
AS | Assignment | Owner name: STRADVISION, INC., KOREA, REPUBLIC OF; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIM, KYE-HYEON;KIM, YONGJOONG;KIM, HAK-KYOUNG;AND OTHERS;REEL/FRAME:051609/0107; Effective date: 20191213
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY
STCF | Information on status: patent grant | Free format text: PATENTED CASE
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 4

