US20190126472A1 - Reinforcement and imitation learning for a task - Google Patents

Reinforcement and imitation learning for a task

Info

Publication number
US20190126472A1
Authority
US
United States
Prior art keywords
neural network
task
agent
data
commands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/174,112
Inventor
Saran Tunyasuvunakool
Yuke Zhu
Joshua Merel
Janos Kramar
Ziyu Wang
Nicolas Manfred Otto Heess
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gdm Holding LLC
Original Assignee
DeepMind Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DeepMind Technologies Ltd
Priority to US16/174,112
Assigned to DEEPMIND TECHNOLOGIES LIMITED. Assignment of assignors interest (see document for details). Assignors: MEREL, JOSHUA; HEESS, NICOLAS MANFRED OTTO; ZHU, YUKE; KRAMAR, JANOS; TUNYASUVUNAKOOL, SARAN; WANG, ZIYU
Publication of US20190126472A1
Priority to US18/306,711 (published as US12343874B2)
Assigned to GDM HOLDING LLC. Assignment of assignors interest (see document for details). Assignor: DEEPMIND TECHNOLOGIES LIMITED
Legal status: Abandoned (current)

Abstract

A neural network control system for controlling an agent to perform a task in a real-world environment operates based on both image data and proprioceptive data describing the configuration of the agent. The training of the control system includes both imitation learning, using datasets generated from previous performances of the task, and reinforcement learning, based on rewards calculated from control data output by the control system.
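
The abstract and claims 2-3 describe a hybrid objective that combines an imitation term, derived from a discriminator trained on the demonstration datasets, with a task reward term computed from the outcome of executing the generated commands. Below is a minimal sketch of how such a hybrid reward could be composed, assuming a GAIL-style discriminator over state-action pairs; the class and function names, the mixing weight lam, and all layer sizes are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores how demonstration-like a (state, action) pair looks (cf. claim 3)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def hybrid_reward(disc: Discriminator,
                  state: torch.Tensor,
                  action: torch.Tensor,
                  task_reward: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
    """Mix a discriminator-based imitation reward with a task reward (cf. claim 2).

    `lam` is a hypothetical mixing weight; the patent only requires that both
    terms contribute to the hybrid function used to adjust the network.
    """
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))          # probability the pair is demonstration-like
        imitation_reward = -torch.log(1.0 - d + 1e-8)   # GAIL-style surrogate imitation reward
    return lam * imitation_reward.squeeze(-1) + (1.0 - lam) * task_reward
```

In such a scheme the discriminator would be updated in alternation with the policy, using the demonstration datasets as positive examples and the policy's own state-action pairs as negatives; the combined reward then drives the reinforcement-learning update of the control network.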


Claims (20)

What is claimed is:
1. A computer-implemented method of training a neural network to generate commands for controlling an agent to perform a task in an environment, the method comprising:
obtaining, for each of a plurality of performances of the task, a respective dataset characterizing the corresponding performance of the task; and
using the dataset, training a neural network to generate commands for controlling the agent based on image data encoding captured images of the environment and proprioceptive data comprising one or more variables describing configurations of the agent;
wherein training the neural network comprises:
using the neural network to generate a plurality of sets of one or more commands,
for each set of commands generating at least one corresponding reward value indicative of how successfully the task is carried out upon implementation of the set of commands by the agent, and
adjusting one or more parameters of the neural network based on the datasets, the sets of commands and the corresponding reward values.
2. The method of claim 1 in which adjusting the one or more parameters of the neural network comprises adjusting the neural network based on a hybrid energy function, the hybrid energy function including both an imitation reward value derived using the datasets and the generated sets of commands, and a task reward term calculated using the generated reward values.
3. The method of claim 2 including using the datasets to generate a discriminator network, and deriving the imitation reward value using the discriminator network and the sets of one or more commands.
4. The method of claim 3 in which the discriminator network receives data characterizing the positions of objects in the environment.
5. The method of claim 1, in which the reward value is generated by computationally simulating a process carried out by the agent in the environment based on the corresponding set of commands to generate a final state of the environment, and calculating an initial reward value based at least on the final state of the environment.
6. The method of claim 5, in which updates to the neural network are calculated using an advantage function estimator obtained by subtracting a value function from the initial reward value, and the initial reward value is calculated according to a task reward function based on the final state of the environment.
7. The method of claim 6 in which the value function is calculated using data characterizing the positions of objects in the environment.
8. The method of claim 6 in which the value function is calculated by an adaptive model.
9. The method of claim 1, in which the neural network comprises a convolutional neural network which receives the image data and from it generates convolved data, the neural network further comprising at least one adaptive component which receives the output of the convolutional neural network and the proprioceptive data.
10. The method according to claim 9 in which the adaptive component is a perceptron.
11. The method of claim 9 in which the neural network further comprises a recursive neural network, which receives input data generated both from the image data and the proprioceptive data.
12. The method of claim 9, further including defining at least one auxiliary task, and training the convolutional network as part of an adaptive system which is trained to perform the auxiliary task based on image data.
13. The method of claim 1, in which the training of the neural network is performed in parallel with the training of a plurality of additional instances of the neural network by respective workers, the adjustment of the parameters of the neural network being additionally based on reward values indicative of how successfully the task is carried out by simulated agents based on sets of commands generated by the additional neural networks.
14. The method of claim 1, in which the step of using the neural network to generate a plurality of sets of commands is performed at least once by supplying to the neural network image data and proprioceptive data which characterizes a state associated with one of the performances of the task.
15. The method of claim 1, further comprising, prior to training the neural network, defining a plurality of stages of the task, and for each stage of the task defining a respective plurality of initial states,
the step of using the neural network to generate a plurality of sets of commands being performed at least once, for each task stage, by supplying to the neural network image data and proprioceptive data which characterizes one of the corresponding plurality of initial states.
16. A method of performing a task, the method comprising:
training a neural network to generate commands for controlling an agent to perform the task in an environment, by a method according to any preceding claim; and
a plurality of times performing the steps of:
(i) capturing images of an environment and generating image data encoding the images;
(ii) capturing proprioceptive data comprising one or more variables describing configurations of the agent;
(iii) transmitting the image data and the proprioceptive data to the neural network, the neural network generating at least one command based on the image data and the proprioceptive data; and
(iv) transmitting the command to the agent, the agent being operative to perform the command within the environment;
whereby the neural network successively generates a sequence of commands to control the agent to perform the task.
17. The method of claim 16 in which the step of obtaining, for each of a plurality of performances of the task, a respective dataset characterizing the corresponding performance of the task, is performed by controlling the agent to perform the task a plurality of times, and for each performance generating a respective dataset characterizing the performance.
18. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
obtaining, for each of a plurality of performances of the task, a respective dataset characterizing the corresponding performance of the task; and
using the dataset, training a neural network to generate commands for controlling the agent based on image data encoding captured images of the environment and proprioceptive data comprising one or more variables describing configurations of the agent;
wherein training the neural network comprises:
using the neural network to generate a plurality of sets of one or more commands,
for each set of commands generating at least one corresponding reward value indicative of how successfully the task is carried out upon implementation of the set of commands by the agent, and
adjusting one or more parameters of the neural network based on the datasets, the sets of commands and the corresponding reward values.
19. The system of claim 18 further including: an agent operative to perform commands generated by the neural network; at least one image capture device operative to capture images of an environment and generate image data encoding the images; and at least one device operative to capture proprioceptive data comprising the one or more variables describing configurations of the agent.
20. (canceled)
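
Claims 9-11 recite a control network in which a convolutional neural network encodes the captured images, an adaptive component (e.g. a perceptron) combines the resulting features with the proprioceptive variables, and a further recursive/recurrent component processes the combined input. A minimal PyTorch-style sketch of one layout consistent with those claims is given below; the 84x84 input resolution, the layer sizes, the use of an LSTM, and the name AgentPolicy are assumptions made for illustration, not details from the patent.

```python
import torch
import torch.nn as nn


class AgentPolicy(nn.Module):
    """Illustrative command-generating network combining image and proprioceptive inputs."""

    def __init__(self, proprio_dim: int, num_commands: int):
        super().__init__()
        # Convolutional network over captured images (claim 9).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        cnn_out = 32 * 9 * 9  # feature size for the assumed 84x84 RGB inputs
        # Adaptive component combining convolved features with proprioceptive data (claims 9-10).
        self.mlp = nn.Sequential(nn.Linear(cnn_out + proprio_dim, 128), nn.ReLU())
        # Recurrent core over the combined features (one reading of claim 11).
        self.rnn = nn.LSTM(128, 128, batch_first=True)
        self.command_head = nn.Linear(128, num_commands)

    def forward(self, images: torch.Tensor, proprio: torch.Tensor, hidden=None):
        # images: (batch, time, 3, 84, 84); proprio: (batch, time, proprio_dim)
        b, t = images.shape[:2]
        feats = self.cnn(images.reshape(b * t, *images.shape[2:]))
        x = torch.cat([feats, proprio.reshape(b * t, -1)], dim=-1)
        x = self.mlp(x).reshape(b, t, -1)
        out, hidden = self.rnn(x, hidden)
        return self.command_head(out), hidden
```

During control (claim 16), images and proprioceptive readings captured at each step would be fed through such a network to produce the next command, with the recurrent state carried forward between steps.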

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US16/174,112 (US20190126472A1) | 2017-10-27 | 2018-10-29 | Reinforcement and imitation learning for a task
US18/306,711 (US12343874B2) | 2017-10-27 | 2023-04-25 | Reinforcement and imitation learning for a task

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US201762578368P | 2017-10-27 | 2017-10-27
US16/174,112 (US20190126472A1) | 2017-10-27 | 2018-10-29 | Reinforcement and imitation learning for a task

Related Child Applications (1)

Application Number | Relation | Priority Date | Filing Date | Title
US18/306,711 (US12343874B2) | Continuation | 2017-10-27 | 2023-04-25 | Reinforcement and imitation learning for a task

Publications (1)

Publication Number | Publication Date
US20190126472A1 (en) | 2019-05-02

Family

ID=64082957

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US16/174,112 | Abandoned | US20190126472A1 (en) | 2017-10-27 | 2018-10-29 | Reinforcement and imitation learning for a task
US18/306,711 | Active (anticipated expiration 2039-07-14) | US12343874B2 (en) | 2017-10-27 | 2023-04-25 | Reinforcement and imitation learning for a task

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US18/306,711 | Active (anticipated expiration 2039-07-14) | US12343874B2 (en) | 2017-10-27 | 2023-04-25 | Reinforcement and imitation learning for a task

Country Status (3)

Country | Link
US (2) | US20190126472A1 (en)
EP (1) | EP3480741B1 (en)
CN (1) | CN109726813A (en)

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20180253512A1 (en)*2017-02-152018-09-06Michael Alexander GreenNovel system and method for achieving functional coverage closure for electronic system verification
US20190182069A1 (en)*2017-12-122019-06-13Distech Controls Inc.Environment controller and method for inferring one or more commands for controlling an appliance taking into account room characteristics
US20190179270A1 (en)*2017-12-122019-06-13Distech Controls Inc.Inference server and environment controller for inferring one or more commands for controlling an appliance taking into account room characteristics
US20190179268A1 (en)*2017-12-122019-06-13Distech Controls Inc.Inference server and environment controller for inferring via a neural network one or more commands for controlling an appliance
US20190308315A1 (en)*2018-04-042019-10-10Kuka Deutschland GmbhProcess Control Using Deep Learning Training Model
US20200164517A1 (en)*2018-11-272020-05-28Kindred Systems Inc.Systems and methods for robotic grasp verification
US10845768B2 (en)*2017-12-122020-11-24Distech Controls Inc.Environment controller and method for inferring via a neural network one or more commands for controlling an appliance
EP3742344A1 (en)*2019-05-212020-11-25Robert Bosch GmbHComputer-implemented method of and apparatus for training a neural network
CN112016678A (en)*2019-09-232020-12-01南京地平线机器人技术有限公司Training method and device for strategy generation network for reinforcement learning and electronic equipment
WO2021004435A1 (en)*2019-07-062021-01-14Huawei Technologies Co., Ltd.Method and system for training reinforcement learning agent using adversarial sampling
WO2021009293A1 (en)*2019-07-172021-01-21Deepmind Technologies LimitedTraining a neural network to control an agent using task-relevant adversarial imitation learning
CN112338921A (en)*2020-11-162021-02-09西华师范大学 A fast training method for intelligent control of robotic arm based on deep reinforcement learning
US20210073674A1 (en)*2019-09-112021-03-11International Business Machines CorporationAutomated explainer of reinforcement learning actions using occupation measures
US20210122038A1 (en)*2018-06-282021-04-29Siemens AktiengesellschaftMethod and device for ascertaining control parameters in a computer-assisted manner for a favorable action of a technical system
US20210146549A1 (en)*2019-11-192021-05-20Industrial Technology Research InstituteGripping device and gripping method
US20210174209A1 (en)*2019-09-292021-06-10Huawei Technologies Co., Ltd.Neural network obtaining method and related device
CN113467515A (en)*2021-07-222021-10-01南京大学Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning
CN113825171A (en)*2021-09-302021-12-21新华三技术有限公司Network congestion control method, device, equipment and medium
CN113935232A (en)*2021-09-172022-01-14北京控制工程研究所 A system and method for learning and training strategies for getting out of trouble in dangerous scenes of extraterrestrial surfaces
US11227090B2 (en)*2017-02-152022-01-18Michael Alexander GreenSystem and method for achieving functional coverage closure for electronic system verification
CN113962012A (en)*2021-07-232022-01-21中国科学院自动化研究所Unmanned aerial vehicle countermeasure strategy optimization method and device
US20220051106A1 (en)*2020-08-122022-02-17Inventec (Pudong) Technology CorporationMethod for training virtual animal to move based on control parameters
US11294891B2 (en)*2019-04-252022-04-05Adobe Inc.Interactive search experience using machine learning
CN114800515A (en)*2022-05-122022-07-29四川大学Robot assembly motion planning method based on demonstration track
CN114897058A (en)*2022-04-222022-08-12清华大学 Auxiliary learning method, device and storage medium for joint selection of tasks and data
US20220305646A1 (en)*2021-03-272022-09-29Mitsubishi Electric Research Laboratories, Inc.Simulation-in-the-loop Tuning of Robot Parameters for System Modeling and Control
US11460209B2 (en)*2019-08-262022-10-04Distech Controls Inc.Environment controller and method for generating a predictive model of a neural network through distributed reinforcement learning
US11491650B2 (en)2018-12-192022-11-08Abb Schweiz AgDistributed inference multi-models for industrial applications
US20220379476A1 (en)*2019-12-032022-12-01Siemens AktiengesellschaftComputerized engineering tool and methodology to develop neural skills for a robotics system
US11604941B1 (en)*2017-10-272023-03-14Deepmind Technologies LimitedTraining action-selection neural networks from demonstrations using multiple losses
US11615695B2 (en)2018-06-122023-03-28Intergraph CorporationCoverage agent for computer-aided dispatch systems
US20230107460A1 (en)*2021-10-052023-04-06Deepmind Technologies LimitedCompositional generalization for reinforcement learning
EP4276708A1 (en)*2022-05-132023-11-15Robert Bosch GmbHApparatus and computer-implemented method for providing a trained policy configured to control a device, apparatus and method for controlling a device, and vehicle
CN117172280A (en)*2023-11-012023-12-05四川酷盼科技有限公司Multisource data processing method applied to bionic animal
US20230409903A1 (en)*2019-10-072023-12-21Waymo LlcMulti-agent simulations
US11861482B2 (en)2019-08-262024-01-02Distech Controls Inc.Training server and method for generating a predictive model of a neural network through distributed reinforcement learning
US12091042B2 (en)2021-08-022024-09-17Ford Global Technologies, LlcMethod and system for training an autonomous vehicle motion planning model
US12124537B2 (en)2022-01-032024-10-22International Business Machines CorporationTraining an environment generator of a generative adversarial network (GAN) to generate realistic environments that incorporate reinforcement learning (RL) algorithm feedback
US12140917B2 (en)2018-03-072024-11-12Distech Controls Inc.Training server and method for generating a predictive model for controlling an appliance
WO2024256330A1 (en)*2023-06-162024-12-19Robert Bosch GmbhMethod for training a control strategy for a technical system
US12208521B1 (en)*2021-01-202025-01-28University Of Southern CaliforniaSystem and method for robot learning from human demonstrations with formal logic
US12217153B2 (en)2019-08-262025-02-04Distech Controls Inc.Training server and method for generating a predictive model of a neural network through distributed reinforcement learning

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
GB201906551D0 (en)*2019-05-092019-06-26Microsoft Technology Licensing LlcTraining behavior of an agent
US20220343141A1 (en)*2019-05-282022-10-27Telefonaktiebolaget Lm Ericsson (Publ)Cavity filter tuning using imitation and reinforcement learning
CN113939779B (en)*2019-06-282024-06-14欧姆龙株式会社 Method and device for operating an automation system, automation system and computer-readable storage medium
CN114051444B (en)*2019-07-012024-04-26库卡德国有限公司 Executing an application with the aid of at least one robot
US11373108B2 (en)*2019-07-102022-06-28Microsoft Technology Licensing, LlcReinforcement learning in real-time communications
CN110991027A (en)*2019-11-272020-04-10华南理工大学Robot simulation learning method based on virtual scene training
CN111687840B (en)*2020-06-112021-10-29清华大学 A method, device and storage medium for capturing space targets
CN112784958B (en)*2020-12-312023-05-23中电海康集团有限公司Household service type robot based on continuous learning method
CN113379027A (en)*2021-02-242021-09-10中国海洋大学Method, system, storage medium and application for generating confrontation interactive simulation learning
DE112022003562T5 (en)*2021-12-152024-05-29Nvidia Corporation MACHINE LEARNING THROUGH DIFFERENTIABLE SIMULATION
CN114240144B (en)*2021-12-162024-12-24国网宁夏电力有限公司 Power system dynamic economic dispatch system and method based on generative adversarial imitation learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9679258B2 (en)*2013-10-082017-06-13Google Inc.Methods and apparatus for reinforcement learning
EP3326114B1 (en)*2015-07-242024-09-04DeepMind Technologies LimitedContinuous control with deep reinforcement learning
US10089576B2 (en)2015-07-282018-10-02Microsoft Technology Licensing, LlcRepresentation learning using multi-task deep neural networks
DE202016004627U1 (en)*2016-07-272016-09-23Google Inc. Training a neural value network
CN116992917A (en)*2016-10-102023-11-03渊慧科技有限公司System and method for selecting actions
US20180165602A1 (en)2016-12-142018-06-14Microsoft Technology Licensing, LlcScalability of reinforcement learning by separation of concerns
US11651208B2 (en)*2017-05-192023-05-16Deepmind Technologies LimitedTraining action selection neural networks using a differentiable credit function
CN110799992B (en)*2017-09-202023-09-12谷歌有限责任公司 Using simulation and domain adaptation for robot control
US10926408B1 (en)*2018-01-122021-02-23Amazon Technologies, Inc.Artificial intelligence system for efficiently learning robotic control policies
JP7551895B2 (en)*2020-07-282024-09-17ディープマインド テクノロジーズ リミテッド Offline Learning for Robot Control Using Reward Prediction Models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Nair et al. Overcoming Exploration in Reinforcement Learning with Demonstrations. 28 September 2017. arXiv. [retrieved from the internet on 2022-11-19] <URL: https://arxiv.org/abs/1709.10089v1> (Year: 2017)*
HM Biu et al. Using grayscale images for object recognition with convolutional-recursive neural network. July 2016. [retrieved from internet on 2022-11-19] <URL: https://ieeexplore.ieee.org/abstract/document/7562656> (Year: 2016)*

Cited By (65)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10699046B2 (en)*2017-02-152020-06-30Michael Alexander GreenSystem and method for achieving functional coverage closure for electronic system verification
US20180253512A1 (en)*2017-02-152018-09-06Michael Alexander GreenNovel system and method for achieving functional coverage closure for electronic system verification
US11227090B2 (en)*2017-02-152022-01-18Michael Alexander GreenSystem and method for achieving functional coverage closure for electronic system verification
US12008077B1 (en)*2017-10-272024-06-11Deepmind Technologies LimitedTraining action-selection neural networks from demonstrations using multiple losses
US11604941B1 (en)*2017-10-272023-03-14Deepmind Technologies LimitedTraining action-selection neural networks from demonstrations using multiple losses
US12228891B2 (en)2017-12-122025-02-18Distech Controls Inc.Environment controller and method for inferring via a neural network one or more commands for controlling an appliance
US20190182069A1 (en)*2017-12-122019-06-13Distech Controls Inc.Environment controller and method for inferring one or more commands for controlling an appliance taking into account room characteristics
US10838375B2 (en)*2017-12-122020-11-17Distech Controls Inc.Inference server and environment controller for inferring via a neural network one or more commands for controlling an appliance
US10845768B2 (en)*2017-12-122020-11-24Distech Controls Inc.Environment controller and method for inferring via a neural network one or more commands for controlling an appliance
US11543786B2 (en)2017-12-122023-01-03Distech Controls Inc.Inference server and environment controller for inferring via a neural network one or more commands for controlling an appliance
US11526138B2 (en)2017-12-122022-12-13Distech Controls Inc.Environment controller and method for inferring via a neural network one or more commands for controlling an appliance
US11747771B2 (en)2017-12-122023-09-05Distech Controls Inc.Inference server and environment controller for inferring one or more commands for controlling an appliance taking into account room characteristics
US11754983B2 (en)2017-12-122023-09-12Distech Controls Inc.Environment controller and method for inferring one or more commands for controlling an appliance taking into account room characteristics
US10895853B2 (en)*2017-12-122021-01-19Distech Controls Inc.Inference server and environment controller for inferring one or more commands for controlling an appliance taking into account room characteristics
US20190179270A1 (en)*2017-12-122019-06-13Distech Controls Inc.Inference server and environment controller for inferring one or more commands for controlling an appliance taking into account room characteristics
US10908561B2 (en)*2017-12-122021-02-02Distech Controls Inc.Environment controller and method for inferring one or more commands for controlling an appliance taking into account room characteristics
US12259696B2 (en)2017-12-122025-03-25Distech Controls Inc.Environment controller and method for inferring one or more commands for controlling an appliance taking into account room characteristics
US12242232B2 (en)2017-12-122025-03-04Distech Controls Inc.Inference server and environment controller for inferring via a neural network one or more commands for controlling an appliance
US20190179268A1 (en)*2017-12-122019-06-13Distech Controls Inc.Inference server and environment controller for inferring via a neural network one or more commands for controlling an appliance
US12140917B2 (en)2018-03-072024-11-12Distech Controls Inc.Training server and method for generating a predictive model for controlling an appliance
US10875176B2 (en)*2018-04-042020-12-29Kuka Systems North America LlcProcess control using deep learning training model
US20190308315A1 (en)*2018-04-042019-10-10Kuka Deutschland GmbhProcess Control Using Deep Learning Training Model
US12106657B2 (en)2018-06-122024-10-01Intergraph CorporationSimilarity agent for computer-aided dispatch systems
US11615695B2 (en)2018-06-122023-03-28Intergraph CorporationCoverage agent for computer-aided dispatch systems
US11735028B2 (en)2018-06-122023-08-22Intergraph CorporationArtificial intelligence applications for computer-aided dispatch systems
US12125368B2 (en)2018-06-122024-10-22Intergraph CorporationStatistic agent for computer-aided dispatch systems
US20210122038A1 (en)*2018-06-282021-04-29Siemens AktiengesellschaftMethod and device for ascertaining control parameters in a computer-assisted manner for a favorable action of a technical system
US12285866B2 (en)*2018-06-282025-04-29Siemens AktiengesellschaftMethod and device for ascertaining control parameters in a computer-assisted manner for a favorable action of a technical system
US20200164517A1 (en)*2018-11-272020-05-28Kindred Systems Inc.Systems and methods for robotic grasp verification
US11839983B2 (en)*2018-11-272023-12-12Ocado Innovation LimitedSystems and methods for robotic grasp verification
US11491650B2 (en)2018-12-192022-11-08Abb Schweiz AgDistributed inference multi-models for industrial applications
US11294891B2 (en)*2019-04-252022-04-05Adobe Inc.Interactive search experience using machine learning
US11971884B2 (en)2019-04-252024-04-30Adobe Inc.Interactive search experience using machine learning
EP3742344A1 (en)*2019-05-212020-11-25Robert Bosch GmbHComputer-implemented method of and apparatus for training a neural network
WO2021004435A1 (en)*2019-07-062021-01-14Huawei Technologies Co., Ltd.Method and system for training reinforcement learning agent using adversarial sampling
US11994862B2 (en)2019-07-062024-05-28Huawei Technologies Co., Ltd.Method and system for training reinforcement learning agent using adversarial sampling
WO2021009293A1 (en)*2019-07-172021-01-21Deepmind Technologies LimitedTraining a neural network to control an agent using task-relevant adversarial imitation learning
US11861482B2 (en)2019-08-262024-01-02Distech Controls Inc.Training server and method for generating a predictive model of a neural network through distributed reinforcement learning
US12217153B2 (en)2019-08-262025-02-04Distech Controls Inc.Training server and method for generating a predictive model of a neural network through distributed reinforcement learning
US11460209B2 (en)*2019-08-262022-10-04Distech Controls Inc.Environment controller and method for generating a predictive model of a neural network through distributed reinforcement learning
US20210073674A1 (en)*2019-09-112021-03-11International Business Machines CorporationAutomated explainer of reinforcement learning actions using occupation measures
CN112488307A (en)*2019-09-112021-03-12国际商业机器公司Automated interpretation of reinforcement learning actions using occupancy measures
CN112016678A (en)*2019-09-232020-12-01南京地平线机器人技术有限公司Training method and device for strategy generation network for reinforcement learning and electronic equipment
US20210174209A1 (en)*2019-09-292021-06-10Huawei Technologies Co., Ltd.Neural network obtaining method and related device
US20230409903A1 (en)*2019-10-072023-12-21Waymo LlcMulti-agent simulations
US20210146549A1 (en)*2019-11-192021-05-20Industrial Technology Research InstituteGripping device and gripping method
US12115680B2 (en)*2019-12-032024-10-15Siemens AktiengesellschaftComputerized engineering tool and methodology to develop neural skills for a robotics system
US20220379476A1 (en)*2019-12-032022-12-01Siemens AktiengesellschaftComputerized engineering tool and methodology to develop neural skills for a robotics system
US20220051106A1 (en)*2020-08-122022-02-17Inventec (Pudong) Technology CorporationMethod for training virtual animal to move based on control parameters
CN112338921A (en)*2020-11-162021-02-09西华师范大学 A fast training method for intelligent control of robotic arm based on deep reinforcement learning
US12208521B1 (en)*2021-01-202025-01-28University Of Southern CaliforniaSystem and method for robot learning from human demonstrations with formal logic
US11975451B2 (en)*2021-03-272024-05-07Mitsubishi Electric Research Laboratories, Inc.Simulation-in-the-loop tuning of robot parameters for system modeling and control
US20220305646A1 (en)*2021-03-272022-09-29Mitsubishi Electric Research Laboratories, Inc.Simulation-in-the-loop Tuning of Robot Parameters for System Modeling and Control
CN113467515A (en)*2021-07-222021-10-01南京大学Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning
CN113962012A (en)*2021-07-232022-01-21中国科学院自动化研究所Unmanned aerial vehicle countermeasure strategy optimization method and device
US12091042B2 (en)2021-08-022024-09-17Ford Global Technologies, LlcMethod and system for training an autonomous vehicle motion planning model
CN113935232A (en)*2021-09-172022-01-14北京控制工程研究所 A system and method for learning and training strategies for getting out of trouble in dangerous scenes of extraterrestrial surfaces
CN113825171A (en)*2021-09-302021-12-21新华三技术有限公司Network congestion control method, device, equipment and medium
US20230107460A1 (en)*2021-10-052023-04-06Deepmind Technologies LimitedCompositional generalization for reinforcement learning
US12124537B2 (en)2022-01-032024-10-22International Business Machines CorporationTraining an environment generator of a generative adversarial network (GAN) to generate realistic environments that incorporate reinforcement learning (RL) algorithm feedback
CN114897058A (en)*2022-04-222022-08-12清华大学 Auxiliary learning method, device and storage medium for joint selection of tasks and data
CN114800515A (en)*2022-05-122022-07-29四川大学Robot assembly motion planning method based on demonstration track
EP4276708A1 (en)*2022-05-132023-11-15Robert Bosch GmbHApparatus and computer-implemented method for providing a trained policy configured to control a device, apparatus and method for controlling a device, and vehicle
WO2024256330A1 (en)*2023-06-162024-12-19Robert Bosch GmbhMethod for training a control strategy for a technical system
CN117172280A (en)*2023-11-012023-12-05四川酷盼科技有限公司Multisource data processing method applied to bionic animal

Also Published As

Publication number | Publication date
EP3480741A1 (en) | 2019-05-08
US20230330848A1 (en) | 2023-10-19
CN109726813A (en) | 2019-05-07
EP3480741B1 (en) | 2024-07-17
US12343874B2 (en) | 2025-07-01

Similar Documents

Publication | Title
US12343874B2 (en) | Reinforcement and imitation learning for a task
US11341364B2 (en) | Using simulation and domain adaptation for robotic control
US10635944B2 (en) | Self-supervised robotic object interaction
US12353993B2 (en) | Domain adaptation for robotic control using self-supervised learning
US20240394540A1 (en) | Neural networks for scalable continual learning in domains with sequentially learned tasks
US20230256593A1 (en) | Off-line learning for robot control using a reward prediction model
US11951622B2 (en) | Domain adaptation using simulation to simulation transfer
US20230330846A1 (en) | Cross-domain imitation learning using goal conditioned policies
US12325130B2 (en) | Data-driven robot control
EP3610418A1 (en) | Distributional reinforcement learning
EP3888014A1 (en) | Controlling robots using entropy constraints
EP3935566B1 (en) | Unsupervised learning of object keypoint locations in images through temporal transport or spatio-temporal transport
US12061964B2 (en) | Modulating agent behavior to optimize learning progress
WO2024163992A1 (en) | Controlling agents using q-transformer neural networks
US20240412063A1 (en) | Demonstration-driven reinforcement learning
WO2024242207A1 (en) | System and method for controlling a mechanical system with multiple degrees of freedom

Legal Events

Date | Code | Title | Description

AS: Assignment
Owner name: DEEPMIND TECHNOLOGIES LIMITED, UNITED KINGDOM
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: TUNYASUVUNAKOOL, SARAN; ZHU, YUKE; MEREL, JOSHUA; AND OTHERS; SIGNING DATES FROM 20171122 TO 20171214; REEL/FRAME: 047375/0099

STPP: Information on status: patent application and granting procedure in general
Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP: Information on status: patent application and granting procedure in general
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP: Information on status: patent application and granting procedure in general
Free format text: NON FINAL ACTION MAILED

STPP: Information on status: patent application and granting procedure in general
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP: Information on status: patent application and granting procedure in general
Free format text: FINAL REJECTION MAILED

STPP: Information on status: patent application and granting procedure in general
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP: Information on status: patent application and granting procedure in general
Free format text: NON FINAL ACTION MAILED

STCB: Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS: Assignment
Owner name: GDM HOLDING LLC, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: DEEPMIND TECHNOLOGIES LIMITED; REEL/FRAME: 071550/0092
Effective date: 20250612

