US20170032245A1 - Systems and Methods for Providing Reinforcement Learning in a Deep Learning System - Google Patents


Publication number
US20170032245A1
Authority
US
United States
Prior art keywords
data
state
action
network
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/212,042
Inventor
Ian David Moffat Osband
Benjamin Van Roy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leland Stanford Junior University
Original Assignee
Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-07-01
Filing date
2016-07-15
Publication date
2017-02-02
Application filed by Leland Stanford Junior University
Priority to US15/212,042 (US20170032245A1)
Publication of US20170032245A1
Priority to US16/576,697 (US20200065672A1)
Assigned to THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY. Assignment of assignors interest (see document for details). Assignors: Osband, Ian David Moffat; Van Roy, Benjamin
Legal status: Abandoned (current)


Abstract

Systems and methods for providing reinforcement learning for a deep learning network are disclosed. The reinforcement learning process applies a bootstrap to a sample of observed and artificial data to facilitate deep exploration via a Thompson sampling approach.
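
As a rough illustration of the idea in the abstract, the Python sketch below resamples the union of observed and artificial data with replacement and acts greedily on the resulting estimate, which plays the role of a posterior sample in Thompson sampling. It uses a two-armed bandit instead of a deep network, and every name and number in it (bootstrap_mean, select_arm, run_bandit, the artificial rewards, the arm probabilities) is a hypothetical choice made for this example rather than something taken from the patent.

```python
import random


def bootstrap_mean(observed, artificial, rng):
    """Resample the union of observed and artificial rewards with
    replacement and return the mean of that resample."""
    pool = observed + artificial
    resample = [rng.choice(pool) for _ in pool]
    return sum(resample) / len(resample)


def select_arm(observed_by_arm, artificial_by_arm, rng):
    """Choose the arm whose bootstrapped estimate is largest."""
    estimates = [
        bootstrap_mean(obs, art, rng)
        for obs, art in zip(observed_by_arm, artificial_by_arm)
    ]
    return max(range(len(estimates)), key=estimates.__getitem__)


def run_bandit(success_probs, steps, seed=0):
    """Play a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    observed = [[] for _ in range(n_arms)]
    # Assumption: one artificial failure and one artificial success per arm,
    # so that every resample is random even before real data arrives.
    artificial = [[0.0, 1.0] for _ in range(n_arms)]
    pulls = [0] * n_arms
    for _ in range(steps):
        arm = select_arm(observed, artificial, rng)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        observed[arm].append(reward)  # update the observed data
        pulls[arm] += 1
    return pulls
```

Calling run_bandit([0.1, 0.9], steps=500) returns the pull counts per arm. The randomness of each resample is what drives exploration here; no separate exploration rate is used.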


Claims (24)

What is claimed is:
1. A deep learning system comprising:
at least one processor;
memory accessible by each of the at least one processor;
instructions that when read by the at least one processor direct the at least one processor to:
maintain a deep neural network; and
apply a reinforcement learning process to the deep neural network where the reinforcement learning process includes:
receive a set of observed data and a set of artificial data,
for each of one or more episodes:
sample from a set of data that is a union of the set of observed data and the set of artificial data to generate a set of training data;
determine a state-action value function for the set of training data using a bootstrap process and an approximator where the approximator estimates a state-action function for a dataset;
for each time step in each of the one or more episodes:
determine a state of the system for a current time step from the set of training data;
select an action based on the determined state of the system and a policy mapping actions to the state of the system;
determine results for the action including a reward and a transition state that result from the selected action; and
store result data for the current time step that includes the state, the action, the transition state, and
update the set of observed data with the result data from at least one time step for each of the one or more episodes.
2. The deep learning system of claim 1 wherein the instructions further direct the at least one processor to generate the set of artificial data from the set of observed data.
3. The deep learning system of claim 2 wherein the instructions to generate the artificial data include instructions that direct the at least one processor to:
sample the set of observed data with replacement to generate the set of artificial data.
4. The deep learning system of claim 2 wherein the instructions to generate the artificial data include instructions that direct the at least one processor to:
sample a plurality of state-action pairs from a diffusely mixed generative model; and
assign each of the plurality of sampled state-action pairs stochastically optimistic rewards and random state transitions.
5. The deep learning system of claim 1 wherein the instructions further direct the at least one processor to:
maintain a training mask that indicates the result data from each of the time periods in each episode to be used in training; and
wherein the updating of the set of observed data includes adding the result data from each time period of an episode indicated in the training mask.
6. The deep learning network of claim 1 where the instructions further direct the processor to:
receive the approximator as an input.
7. The deep learning network of claim 1 wherein the instructions further direct the processor to:
read the approximator from memory.
8. The deep learning network of claim 1 wherein the approximator is a neural network trained to fit a state-action value function to the data set via a least squared iteration.
9. The deep learning network of claim 1 wherein a plurality of reinforcement learning processes are applied to the deep neural network.
10. The deep learning network of claim 9 wherein each of the plurality of reinforcement learning processes independently maintains the set of observed data.
11. The deep learning network of claim 9 wherein the plurality of reinforcement learning processes cooperatively maintain the set of observed data.
12. The deep learning process of claim 9 wherein the instructions further direct the processor to:
maintain a bootstrap mask that indicates each element in the set of observed data that is available to each of the plurality of reinforcement learning processes.
13. A method performed by at least one processor executing instructions stored in memory to perform the method to provide reinforcement learning in a deep learning network, the method comprising:
receiving a set of observed data and a set of artificial data;
for each of one or more episodes:
sampling from a set of data that is a union of the set of observed data and the set of artificial data to generate a set of training data,
determining a state-action value function for the set of training data using a bootstrap process and an approximator where the approximator estimates a state-action function for a dataset,
for each time step in each of the one or more episodes:
determining a state of the system for a current time step from the set of training data;
selecting an action based on the determined state of the system and a policy mapping actions to the state of the system;
determining results for the action including a reward and a transition state that result from the selected action; and
storing result data for the current time step that includes the state, the action, the transition state, and
updating the set of observed data with the result data from at least one time step of each of the one or more episodes.
14. The method of claim 13 further comprising generating the set of artificial data from the set of observed data.
15. The method of claim 14 further comprising:
sampling the set of observed data with replacement to generate the set of artificial data.
16. The method of claim 14 further comprising:
sampling a plurality of state-action pairs from a diffusely mixed generative model; and
assigning each of the plurality of sampled state-action pairs stochastically optimistic rewards and random state transitions.
17. The method of claim 13 further comprising:
maintaining a training mask that indicates the result data from each of the time periods in each episode to be used in training; and
wherein the updating of the set of observed data includes adding the result data from each time period of an episode indicated in the training mask.
18. The method of claim 13 further comprising:
receiving the approximator as an input.
19. The method of claim 13 further comprising:
reading the approximator from memory.
20. The method of claim 13 wherein the approximator is a neural network trained to fit a state-action value function to the data set via a least squared iteration.
21. The method of claim 13 wherein a plurality of reinforcement learning methods are applied to the deep neural network.
22. The method of claim 21 wherein each of the plurality of reinforcement learning methods independently maintains the set of observed data.
23. The method of claim 21 wherein the plurality of reinforcement learning methods cooperatively maintain the set of observed data.
24. The method of claim 21 further comprising:
maintaining a bootstrap mask that indicates each element in the set of observed data that is available to each of the plurality of reinforcement learning processes.
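
The per-episode loop recited in claims 1 and 13 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: a small table stands in for the deep neural network approximator, the chain environment (step), the form of the artificial data (make_artificial), the constants, and all function names are hypothetical, and the stored tuple includes the reward even though the claims' list of stored fields does not spell that out.

```python
import random
from collections import defaultdict

N_STATES, ACTIONS, HORIZON, GAMMA = 5, (0, 1), 8, 0.9  # assumed constants


def step(state, action):
    """Hypothetical chain environment: action 1 moves right, action 0 moves
    left; a reward of 1.0 is paid whenever the next state is the last one."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return (1.0 if nxt == N_STATES - 1 else 0.0), nxt


def make_artificial(rng):
    """Assumed artificial data: one transition per state-action pair with an
    optimistic reward and a random next state."""
    return [(s, a, 1.0, rng.randrange(N_STATES))
            for s in range(N_STATES) for a in ACTIONS]


def fit_q(training, sweeps=30):
    """Tabular stand-in for the approximator: repeatedly fit Q(s, a) to
    r + GAMMA * max_b Q(s', b); for a table the least-squares fit is a mean."""
    q = defaultdict(float)
    for _ in range(sweeps):
        targets = defaultdict(list)
        for s, a, r, s2 in training:
            targets[(s, a)].append(r + GAMMA * max(q[(s2, b)] for b in ACTIONS))
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q


def run(episodes, seed=0):
    rng = random.Random(seed)
    observed, artificial = [], make_artificial(rng)
    returns = []
    for _ in range(episodes):
        # Sample the union of observed and artificial data with replacement.
        pool = observed + artificial
        training = [rng.choice(pool) for _ in pool]
        q = fit_q(training)  # state-action value function for this episode
        state, total, results = 0, 0.0, []
        for _ in range(HORIZON):
            # Greedy policy on the sampled value function, random tie-break.
            action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            reward, nxt = step(state, action)
            results.append((state, action, reward, nxt))
            state, total = nxt, total + reward
        observed.extend(results)  # update the observed data
        returns.append(total)
    return observed, returns
```

A call such as run(episodes=20) returns the accumulated transitions and the per-episode returns. Because a fresh bootstrap sample is drawn once per episode and then followed greedily for the whole episode, the exploration is committed across time steps rather than re-randomized at each step.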
US15/212,042 | 2015-07-01 | 2016-07-15 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System | Abandoned | US20170032245A1 (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
US15/212,042 | US20170032245A1 (en) | 2015-07-01 | 2016-07-15 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System
US16/576,697 | US20200065672A1 (en) | 2015-07-01 | 2019-09-19 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System

Applications Claiming Priority (3)

Application Number | Publication | Priority Date | Filing Date | Title
US201562187681P | | 2015-07-01 | 2015-07-01 |
US201615201284A | | 2016-07-01 | 2016-07-01 |
US15/212,042 | US20170032245A1 (en) | 2015-07-01 | 2016-07-15 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System

Related Parent Applications (1)

Application Number | Title | Priority Date | Filing Date
US201615201284A | Continuation-In-Part | 2015-07-01 | 2016-07-01

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US16/576,697 | Continuation | US20200065672A1 (en) | 2015-07-01 | 2019-09-19 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System

Publications (1)

Publication Number | Publication Date
US20170032245A1 (en) | 2017-02-02

Family

Family ID: 57882627

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/212,042 | Abandoned | US20170032245A1 (en) | 2015-07-01 | 2016-07-15 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System
US16/576,697 | Pending | US20200065672A1 (en) | 2015-07-01 | 2019-09-19 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US16/576,697 | Pending | US20200065672A1 (en) | 2015-07-01 | 2019-09-19 | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System

Country Status (1)

Country | Link
US (2) | US20170032245A1 (en)

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108460405A (en) * | 2018-02-02 | 2018-08-28 | 上海大学 | A kind of image latent writing analysis Ensemble classifier optimization method based on deeply study
JP2018142060A (en) * | 2017-02-27 | 2018-09-13 | 株式会社東芝 | Isolation management system and isolation management method
WO2018205778A1 (en) * | 2017-05-11 | 2018-11-15 | 苏州大学张家港工业技术研究院 | Large-range monitoring method based on deep weighted double-q learning and monitoring robot
WO2018210430A1 (en) * | 2017-05-19 | 2018-11-22 | Telefonaktiebolaget Lm Ericsson (Publ) | Training a software agent to control an environment
CN109344877A (en) * | 2018-08-31 | 2019-02-15 | 深圳先进技术研究院 | A sample data processing method, sample data processing device and electronic equipment
CN109347149A (en) * | 2018-09-20 | 2019-02-15 | 国网河南省电力公司电力科学研究院 | Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning
US10210860B1 (en) | 2018-07-27 | 2019-02-19 | Deepgram, Inc. | Augmented generalized deep learning with special vocabulary
CN109597876A (en) * | 2018-11-07 | 2019-04-09 | 中山大学 | A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
CN109711239A (en) * | 2018-09-11 | 2019-05-03 | 重庆邮电大学 | Visual Attention Detection Method Based on Improved Hybrid Incremental Dynamic Bayesian Network
CN109976909A (en) * | 2019-03-18 | 2019-07-05 | 中南大学 | Low delay method for scheduling task in edge calculations network based on study
US20190385091A1 (en) * | 2018-06-15 | 2019-12-19 | International Business Machines Corporation | Reinforcement learning exploration by exploiting past experiences for critical events
CN110598906A (en) * | 2019-08-15 | 2019-12-20 | 珠海米枣智能科技有限公司 | Method and system for controlling energy consumption of superstores in real time based on deep reinforcement learning
CN110753936A (en) * | 2017-08-25 | 2020-02-04 | 谷歌有限责任公司 | Batch reinforcement learning
CN111152227A (en) * | 2020-01-19 | 2020-05-15 | 聊城鑫泰机床有限公司 | Mechanical arm control method based on guided DQN control
CN111160755A (en) * | 2019-12-26 | 2020-05-15 | 西北工业大学 | DQN-based real-time scheduling method for aircraft overhaul workshop
CN111226235A (en) * | 2018-01-17 | 2020-06-02 | 华为技术有限公司 | Method for generating training data for training a neural network, method for training a neural network, and method for autonomous operation using a neural network
CN111258909A (en) * | 2020-02-07 | 2020-06-09 | 中国信息安全测评中心 | Test sample generation method and device
US10701439B2 (en) | 2018-01-04 | 2020-06-30 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method of thereof
US20200249675A1 (en) * | 2019-01-31 | 2020-08-06 | StradVision, Inc. | Method and device for providing personalized and calibrated adaptive deep learning model for the user of an autonomous vehicle
US10789511B2 (en) * | 2018-10-12 | 2020-09-29 | Deepmind Technologies Limited | Controlling agents over long time scales using temporal value transport
US10818019B2 (en) | 2017-08-14 | 2020-10-27 | Siemens Healthcare Gmbh | Dilated fully convolutional network for multi-agent 2D/3D medical image registration
CN111936988A (en) * | 2018-04-04 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Intelligent incentive distribution
CN111950873A (en) * | 2020-07-30 | 2020-11-17 | 上海卫星工程研究所 | Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN112084680A (en) * | 2020-09-02 | 2020-12-15 | 沈阳工程学院 | An energy internet optimization strategy method based on DQN algorithm
US10872294B2 (en) * | 2018-09-27 | 2020-12-22 | Deepmind Technologies Limited | Imitation learning using a generative predecessor neural network
CN112183762A (en) * | 2020-09-15 | 2021-01-05 | 上海交通大学 | Reinforced learning method based on mixed behavior space
CN112272831A (en) * | 2018-05-18 | 2021-01-26 | 渊慧科技有限公司 | Reinforcement learning system including a relationship network for generating data encoding relationships between entities in an environment
CN112334914A (en) * | 2018-09-27 | 2021-02-05 | 渊慧科技有限公司 | Mock learning using generative lead neural networks
US10984507B2 (en) | 2019-07-17 | 2021-04-20 | Harris Geospatial Solutions, Inc. | Image processing system including training model based upon iterative blurring of geospatial images and related methods
CN112734014A (en) * | 2021-01-12 | 2021-04-30 | 山东大学 | Experience playback sampling reinforcement learning method and system based on confidence upper bound thought
CN112836974A (en) * | 2021-02-05 | 2021-05-25 | 上海海事大学 | A Dynamic Scheduling Method for Multi-Field Bridges in Box Intervals Based on DQN and MCTS
US20210200743A1 (en) * | 2019-12-30 | 2021-07-01 | Ensemble Rcm, Llc | Validation of data in a database record using a reinforcement learning algorithm
US11068748B2 (en) | 2019-07-17 | 2021-07-20 | Harris Geospatial Solutions, Inc. | Image processing system including training model based upon iteratively biased loss function and related methods
US20210224685A1 (en) * | 2020-01-21 | 2021-07-22 | Walmart Apollo, Llc | Robust reinforcement learning in personalized content prediction
CN113162850A (en) * | 2021-01-13 | 2021-07-23 | 中国科学院计算技术研究所 | Artificial intelligence-based heterogeneous network multi-path scheduling method and system
CN113261016A (en) * | 2018-11-05 | 2021-08-13 | 诺基亚通信公司 | Single-shot multi-user multiple-input multiple-output (MU-MIMO) resource pairing using Deep Q Network (DQN) based reinforcement learning
CN113268933A (en) * | 2021-06-18 | 2021-08-17 | 大连理工大学 | Rapid structural parameter design method of S-shaped emergency robot based on reinforcement learning
CN113544703A (en) * | 2019-03-05 | 2021-10-22 | 易享信息技术有限公司 | Efficient off-policy credit allocation
US11164077B2 (en) * | 2017-11-02 | 2021-11-02 | Siemens Aktiengesellschaft | Randomized reinforcement learning for control of complex systems
US11182676B2 (en) | 2017-08-04 | 2021-11-23 | International Business Machines Corporation | Cooperative neural network deep reinforcement learning with partial input assistance
US11188797B2 (en) * | 2018-10-30 | 2021-11-30 | International Business Machines Corporation | Implementing artificial intelligence agents to perform machine learning tasks using predictive analytics to leverage ensemble policies for maximizing long-term returns
US11204761B2 (en) | 2018-12-03 | 2021-12-21 | International Business Machines Corporation | Data center including cognitive agents and related methods
CN113923308A (en) * | 2021-10-15 | 2022-01-11 | 浙江工业大学 | Predictive outbound call task assignment method and outbound call system based on deep reinforcement learning
CN114138416A (en) * | 2021-12-03 | 2022-03-04 | 福州大学 | Load-time window-oriented adaptive allocation method of cloud software resources based on DQN
CN114161419A (en) * | 2021-12-13 | 2022-03-11 | 大连理工大学 | Robot operation skill efficient learning method guided by scene memory
US20220083842A1 (en) * | 2020-08-28 | 2022-03-17 | Tata Consultancy Services Limited | Optimal policy learning and recommendation for distribution task using deep reinforcement learning model
CN114331754A (en) * | 2021-12-23 | 2022-04-12 | 重庆大学 | A cloud manufacturing service composition method based on multi-strategy deep reinforcement learning
US20220183748A1 (en) * | 2020-12-16 | 2022-06-16 | Biosense Webster (Israel) Ltd. | Accurate tissue proximity
US20220215269A1 (en) * | 2018-02-06 | 2022-07-07 | Cognizant Technology Solutions U.S. Corporation | Enhancing Evolutionary Optimization in Uncertain Environments By Allocating Evaluations Via Multi-Armed Bandit Algorithms
US11417087B2 (en) | 2019-07-17 | 2022-08-16 | Harris Geospatial Solutions, Inc. | Image processing system including iteratively biased training model probability distribution function and related methods
US11449763B2 (en) * | 2018-03-07 | 2022-09-20 | Adobe Inc. | Making resource-constrained sequential recommendations
US11461703B2 (en) | 2019-01-23 | 2022-10-04 | International Business Machines Corporation | Determinantal reinforced learning in artificial intelligence
US11557036B2 (en) | 2016-05-18 | 2023-01-17 | Siemens Healthcare Gmbh | Method and system for image registration using an intelligent artificial agent
US11568236B2 (en) | 2018-01-25 | 2023-01-31 | The Research Foundation For The State University Of New York | Framework and methods of diverse exploration for fast and safe policy improvement
US20230072777A1 (en) * | 2021-07-16 | 2023-03-09 | Tata Consultancy Services Limited | Budget constrained deep q-network for dynamic campaign allocation in computational advertising
US11656775B2 (en) | 2018-08-07 | 2023-05-23 | Marvell Asia Pte, Ltd. | Virtualizing isolation areas of solid-state storage media
WO2023111700A1 (en) * | 2021-12-15 | 2023-06-22 | International Business Machines Corporation | Reinforcement learning under constraints
US11693601B2 (en) | 2018-08-07 | 2023-07-04 | Marvell Asia Pte, Ltd. | Enabling virtual functions on storage media
US11790399B2 (en) | 2020-01-21 | 2023-10-17 | Walmart Apollo, Llc | Dynamic evaluation and use of global and contextual personas
US11823039B2 (en) | 2018-08-24 | 2023-11-21 | International Business Machines Corporation | Safe and fast exploration for reinforcement learning using constrained action manifolds
US11853901B2 (en) | 2019-07-26 | 2023-12-26 | Samsung Electronics Co., Ltd. | Learning method of AI model and electronic apparatus
CN118395829A (en) * | 2024-03-08 | 2024-07-26 | 南京邮电大学 | A two-level decision-making guidance method for electric vehicles in traffic electrification coupled systems
US12443678B2 (en) | 2021-12-15 | 2025-10-14 | International Business Machines Corporation | Stepwise uncertainty-aware offline reinforcement learning under constraints

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10396919B1 (en) | 2017-05-12 | 2019-08-27 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning
US11640516B2 (en) * | 2020-06-03 | 2023-05-02 | International Business Machines Corporation | Deep evolved strategies with reinforcement
WO2023288309A1 (en) * | 2021-07-15 | 2023-01-19 | Regents Of The University Of Minnesota | Systems and methods for controlling a medical device using bayesian preference model based optimization and validation

Cited By (90)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11557036B2 (en) | 2016-05-18 | 2023-01-17 | Siemens Healthcare Gmbh | Method and system for image registration using an intelligent artificial agent
US11741605B2 (en) | 2016-05-18 | 2023-08-29 | Siemens Healthcare Gmbh | Method and system for image registration using an intelligent artificial agent
US12094116B2 (en) | 2016-05-18 | 2024-09-17 | Siemens Healthineers Ag | Method and system for image registration using an intelligent artificial agent
JP2018142060A (en) * | 2017-02-27 | 2018-09-13 | 株式会社東芝 | Isolation management system and isolation management method
WO2018205778A1 (en) * | 2017-05-11 | 2018-11-15 | 苏州大学张家港工业技术研究院 | Large-range monitoring method based on deep weighted double-q learning and monitoring robot
US11224970B2 (en) * | 2017-05-11 | 2022-01-18 | Soochow University | Large area surveillance method and surveillance robot based on weighted double deep Q-learning
WO2018210430A1 (en) * | 2017-05-19 | 2018-11-22 | Telefonaktiebolaget Lm Ericsson (Publ) | Training a software agent to control an environment
US11182676B2 (en) | 2017-08-04 | 2021-11-23 | International Business Machines Corporation | Cooperative neural network deep reinforcement learning with partial input assistance
US11354813B2 (en) | 2017-08-14 | 2022-06-07 | Siemens Healthcare Gmbh | Dilated fully convolutional network for 2D/3D medical image registration
US10818019B2 (en) | 2017-08-14 | 2020-10-27 | Siemens Healthcare Gmbh | Dilated fully convolutional network for multi-agent 2D/3D medical image registration
CN110753936A (en) * | 2017-08-25 | 2020-02-04 | 谷歌有限责任公司 | Batch reinforcement learning
US11164077B2 (en) * | 2017-11-02 | 2021-11-02 | Siemens Aktiengesellschaft | Randomized reinforcement learning for control of complex systems
US10701439B2 (en) | 2018-01-04 | 2020-06-30 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method of thereof
CN111226235A (en) * | 2018-01-17 | 2020-06-02 | 华为技术有限公司 | Method for generating training data for training a neural network, method for training a neural network, and method for autonomous operation using a neural network
US11568236B2 (en) | 2018-01-25 | 2023-01-31 | The Research Foundation For The State University Of New York | Framework and methods of diverse exploration for fast and safe policy improvement
CN108460405A (en) * | 2018-02-02 | 2018-08-28 | 上海大学 | A kind of image latent writing analysis Ensemble classifier optimization method based on deeply study
US20220215269A1 (en) * | 2018-02-06 | 2022-07-07 | Cognizant Technology Solutions U.S. Corporation | Enhancing Evolutionary Optimization in Uncertain Environments By Allocating Evaluations Via Multi-Armed Bandit Algorithms
US11995559B2 (en) * | 2018-02-06 | 2024-05-28 | Cognizant Technology Solutions U.S. Corporation | Enhancing evolutionary optimization in uncertain environments by allocating evaluations via multi-armed bandit algorithms
US11449763B2 (en) * | 2018-03-07 | 2022-09-20 | Adobe Inc. | Making resource-constrained sequential recommendations
CN111936988A (en) * | 2018-04-04 | 2020-11-13 | 北京嘀嘀无限科技发展有限公司 | Intelligent incentive distribution
CN112272831A (en) * | 2018-05-18 | 2021-01-26 | 渊慧科技有限公司 | Reinforcement learning system including a relationship network for generating data encoding relationships between entities in an environment
US20190385091A1 (en) * | 2018-06-15 | 2019-12-19 | International Business Machines Corporation | Reinforcement learning exploration by exploiting past experiences for critical events
US20210035565A1 (en) * | 2018-07-27 | 2021-02-04 | Deepgram, Inc. | Deep learning internal state index-based search and classification
US11367433B2 (en) | 2018-07-27 | 2022-06-21 | Deepgram, Inc. | End-to-end neural networks for speech recognition and classification
US11676579B2 (en) * | 2018-07-27 | 2023-06-13 | Deepgram, Inc. | Deep learning internal state index-based search and classification
US10210860B1 (en) | 2018-07-27 | 2019-02-19 | Deepgram, Inc. | Augmented generalized deep learning with special vocabulary
US20200035224A1 (en) * | 2018-07-27 | 2020-01-30 | Deepgram, Inc. | Deep learning internal state index-based search and classification
US10720151B2 (en) | 2018-07-27 | 2020-07-21 | Deepgram, Inc. | End-to-end neural networks for speech recognition and classification
US10847138B2 (en) * | 2018-07-27 | 2020-11-24 | Deepgram, Inc. | Deep learning internal state index-based search and classification
US10540959B1 (en) | 2018-07-27 | 2020-01-21 | Deepgram, Inc. | Augmented generalized deep learning with special vocabulary
US10380997B1 (en) * | 2018-07-27 | 2019-08-13 | Deepgram, Inc. | Deep learning internal state index-based search and classification
US11693601B2 (en) | 2018-08-07 | 2023-07-04 | Marvell Asia Pte, Ltd. | Enabling virtual functions on storage media
US11656775B2 (en) | 2018-08-07 | 2023-05-23 | Marvell Asia Pte, Ltd. | Virtualizing isolation areas of solid-state storage media
US11823039B2 (en) | 2018-08-24 | 2023-11-21 | International Business Machines Corporation | Safe and fast exploration for reinforcement learning using constrained action manifolds
CN109344877A (en) * | 2018-08-31 | 2019-02-15 | 深圳先进技术研究院 | A sample data processing method, sample data processing device and electronic equipment
CN109711239A (en) * | 2018-09-11 | 2019-05-03 | 重庆邮电大学 | Visual Attention Detection Method Based on Improved Hybrid Incremental Dynamic Bayesian Network
CN109347149A (en) * | 2018-09-20 | 2019-02-15 | 国网河南省电力公司电力科学研究院 | Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning
US10872294B2 (en) * | 2018-09-27 | 2020-12-22 | Deepmind Technologies Limited | Imitation learning using a generative predecessor neural network
CN112334914A (en) * | 2018-09-27 | 2021-02-05 | 渊慧科技有限公司 | Mock learning using generative lead neural networks
US10789511B2 (en) * | 2018-10-12 | 2020-09-29 | Deepmind Technologies Limited | Controlling agents over long time scales using temporal value transport
JP2022504739A (en) * | 2018-10-12 | 2022-01-13 | ディープマインド テクノロジーズ リミテッド | Controlling agents over long timescales using time value transfer
KR102719425B1 (en) * | 2018-10-12 | 2024-10-21 | 딥마인드 테크놀로지스 리미티드 | Agent control over long time scales using temporal value transport (TVT)
US11769049B2 (en) | 2018-10-12 | 2023-09-26 | Deepmind Technologies Limited | Controlling agents over long time scales using temporal value transport
JP7139524B2 (en) | 2018-10-12 | 2022-09-20 | ディープマインド テクノロジーズ リミテッド | Control agents over long timescales using time value transfer
KR20210053970A (en) * | 2018-10-12 | 2021-05-12 | 딥마인드 테크놀로지스 리미티드 | Agent control for long time scale using temporal value transport (TVT)
CN112840359A (en) * | 2018-10-12 | 2021-05-25 | 渊慧科技有限公司 | Control agents on long time scales by using time value passing
US11188797B2 (en) * | 2018-10-30 | 2021-11-30 | International Business Machines Corporation | Implementing artificial intelligence agents to perform machine learning tasks using predictive analytics to leverage ensemble policies for maximizing long-term returns
US12040856B2 (en) | 2018-11-05 | 2024-07-16 | Nokia Solutions And Networks Oy | One shot multi-user multiple-input multiple-output (MU-MIMO) resource pairing using reinforcement learning based deep Q network (DQN)
CN113261016A (en) * | 2018-11-05 | 2021-08-13 | 诺基亚通信公司 | Single-shot multi-user multiple-input multiple-output (MU-MIMO) resource pairing using Deep Q Network (DQN) based reinforcement learning
CN109597876A (en) * | 2018-11-07 | 2019-04-09 | 中山大学 | A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
US11204761B2 (en) | 2018-12-03 | 2021-12-21 | International Business Machines Corporation | Data center including cognitive agents and related methods
US11461703B2 (en) | 2019-01-23 | 2022-10-04 | International Business Machines Corporation | Determinantal reinforced learning in artificial intelligence
US20200249675A1 (en) * | 2019-01-31 | 2020-08-06 | StradVision, Inc. | Method and device for providing personalized and calibrated adaptive deep learning model for the user of an autonomous vehicle
US10824151B2 (en) * | 2019-01-31 | 2020-11-03 | StradVision, Inc. | Method and device for providing personalized and calibrated adaptive deep learning model for the user of an autonomous vehicle
CN113544703A (en) * | 2019-03-05 | 2021-10-22 | 易享信息技术有限公司 | Efficient off-policy credit allocation
CN109976909A (en) * | 2019-03-18 | 2019-07-05 | 中南大学 | Low delay method for scheduling task in edge calculations network based on study
US10984507B2 (en) | 2019-07-17 | 2021-04-20 | Harris Geospatial Solutions, Inc. | Image processing system including training model based upon iterative blurring of geospatial images and related methods
US11068748B2 (en) | 2019-07-17 | 2021-07-20 | Harris Geospatial Solutions, Inc. | Image processing system including training model based upon iteratively biased loss function and related methods
US11417087B2 (en) | 2019-07-17 | 2022-08-16 | Harris Geospatial Solutions, Inc. | Image processing system including iteratively biased training model probability distribution function and related methods
US11853901B2 (en) | 2019-07-26 | 2023-12-26 | Samsung Electronics Co., Ltd. | Learning method of AI model and electronic apparatus
CN110598906A (en) * | 2019-08-15 | 2019-12-20 | 珠海米枣智能科技有限公司 | Method and system for controlling energy consumption of superstores in real time based on deep reinforcement learning
CN111160755A (en) * | 2019-12-26 | 2020-05-15 | 西北工业大学 | DQN-based real-time scheduling method for aircraft overhaul workshop
US20210200743A1 (en) * | 2019-12-30 | 2021-07-01 | Ensemble Rcm, Llc | Validation of data in a database record using a reinforcement learning algorithm
CN111152227A (en) * | 2020-01-19 | 2020-05-15 | 聊城鑫泰机床有限公司 | Mechanical arm control method based on guided DQN control
US12154135B2 (en) | 2020-01-21 | 2024-11-26 | Walmart Apollo, Llc | Dynamic evaluation and use of global and contextual personas
US20210224685A1 (en) * | 2020-01-21 | 2021-07-22 | Walmart Apollo, Llc | Robust reinforcement learning in personalized content prediction
US11790399B2 (en) | 2020-01-21 | 2023-10-17 | Walmart Apollo, Llc | Dynamic evaluation and use of global and contextual personas
US11645580B2 (en) * | 2020-01-21 | 2023-05-09 | Walmart Apollo, Llc | Robust reinforcement learning in personalized content prediction
CN111258909A (en) * | 2020-02-07 | 2020-06-09 | 中国信息安全测评中心 | Test sample generation method and device
CN111950873A (en) * | 2020-07-30 | 2020-11-17 | 上海卫星工程研究所 | Satellite real-time guiding task planning method and system based on deep reinforcement learning
US20220083842A1 (en) * | 2020-08-28 | 2022-03-17 | Tata Consultancy Services Limited | Optimal policy learning and recommendation for distribution task using deep reinforcement learning model
US12093805B2 (en) * | 2020-08-28 | 2024-09-17 | Tata Consultancy Services Limited | Optimal policy learning and recommendation for distribution task using deep reinforcement learning model
CN112084680A (en) * | 2020-09-02 | 2020-12-15 | 沈阳工程学院 | An energy internet optimization strategy method based on DQN algorithm
CN112183762A (en) * | 2020-09-15 | 2021-01-05 | 上海交通大学 | Reinforced learning method based on mixed behavior space
US20220183748A1 (en) * | 2020-12-16 | 2022-06-16 | Biosense Webster (Israel) Ltd. | Accurate tissue proximity
CN112734014A (en) * | 2021-01-12 | 2021-04-30 | 山东大学 | Experience playback sampling reinforcement learning method and system based on confidence upper bound thought
CN113162850A (en) * | 2021-01-13 | 2021-07-23 | 中国科学院计算技术研究所 | Artificial intelligence-based heterogeneous network multi-path scheduling method and system
CN112836974A (en) * | 2021-02-05 | 2021-05-25 | 上海海事大学 | A Dynamic Scheduling Method for Multi-Field Bridges in Box Intervals Based on DQN and MCTS
CN113268933A (en) * | 2021-06-18 | 2021-08-17 | 大连理工大学 | Rapid structural parameter design method of S-shaped emergency robot based on reinforcement learning
US20230072777A1 (en) * | 2021-07-16 | 2023-03-09 | Tata Consultancy Services Limited | Budget constrained deep q-network for dynamic campaign allocation in computational advertising
US11915262B2 (en) * | 2021-07-16 | 2024-02-27 | Tata Consultancy Services Limited | Budget constrained deep Q-network for dynamic campaign allocation in computational advertising
CN113923308A (en) * | 2021-10-15 | 2022-01-11 | 浙江工业大学 | Predictive outbound call task assignment method and outbound call system based on deep reinforcement learning
CN114138416A (en) * | 2021-12-03 | 2022-03-04 | 福州大学 | Load-time window-oriented adaptive allocation method of cloud software resources based on DQN
CN114161419A (en) * | 2021-12-13 | 2022-03-11 | 大连理工大学 | Robot operation skill efficient learning method guided by scene memory
GB2627895A (en) * | 2021-12-15 | 2024-09-04 | Ibm | Reinforcement learning under constraints
TWI822291B (en) * | 2021-12-15 | 2023-11-11 | 美商萬國商業機器公司 | Computer-implemented methods, computer program products, and computer processing systems for offline reinforcement learning with a dataset
WO2023111700A1 (en) * | 2021-12-15 | 2023-06-22 | International Business Machines Corporation | Reinforcement learning under constraints
US12443678B2 (en) | 2021-12-15 | 2025-10-14 | International Business Machines Corporation | Stepwise uncertainty-aware offline reinforcement learning under constraints
CN114331754A (en) * | 2021-12-23 | 2022-04-12 | 重庆大学 | A cloud manufacturing service composition method based on multi-strategy deep reinforcement learning
CN118395829A (en) * | 2024-03-08 | 2024-07-26 | 南京邮电大学 | A two-level decision-making guidance method for electric vehicles in traffic electrification coupled systems

Also Published As

Publication number | Publication date
US20200065672A1 (en) | 2020-02-27

Similar Documents

Publication | Publication Date | Title
US20170032245A1 (en) | | Systems and Methods for Providing Reinforcement Learning in a Deep Learning System
WO2017004626A1 (en) | | Systems and methods for providing reinforcement learning in a deep learning system
Osband et al. | | Deep exploration via bootstrapped DQN
US12053704B2 (en) | | Artificial intelligence (AI) model training to generate an AI model personalized to a user
Shao et al. | | A survey of deep reinforcement learning in video games
AU2016354558B2 (en) | | Asynchronous deep reinforcement learning
US11157316B1 (en) | | Determining action selection policies of an execution device
Wu et al. | | Deep ensemble reinforcement learning with multiple deep deterministic policy gradient algorithm
Dowe et al. | | Bayes not Bust! Why Simplicity is no Problem for Bayesians
Xin et al. | | Exploration entropy for reinforcement learning
US20210311777A1 (en) | | Determining action selection policies of an execution device
US9981190B2 (en) | | Telemetry based interactive content generation
CN112292699A (en) | | Determining action selection guidelines for an execution device
Chen et al. | | Balancing exploration and exploitation in episodic reinforcement learning
Yoon et al. | | Monte Carlo tree diffusion for system 2 planning
Freire et al. | | Sequential memory improves sample and memory efficiency in episodic control
Amhraoui et al. | | Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games
Sun et al. | | A unified framework for factorizing distributional value functions for multi-agent reinforcement learning
Bossens et al. | | Lifetime policy reuse and the importance of task capacity
Donâncio et al. | | Dynamic learning rate for deep reinforcement learning: a bandit approach
Yu et al. | | An improved artificial bee colony algorithm based on factor library and dynamic search balance
Olesen et al. | | Evolutionary planning in latent space
Raihen et al. | | Optimizing reinforcement learning in complex environments using neural networks
US20210138350A1 (en) | | Sensor statistics for ranking users in matchmaking systems
Naik et al. | | Exploration Exploitation Problem in Policy Based Deep Reinforcement Learning for Episodic and Continuous Environments

Legal Events

Date | Code | Title | Description
STPP | Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB | Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS | Assignment

Owner name: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: OSBAND, IAN DAVID MOFFAT; VAN ROY, BENJAMIN; REEL/FRAME: 052368/0934

Effective date: 20191217

