US20200250483A1 - Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization - Google Patents

Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization

Info

Publication number
US20200250483A1
Authority
US
United States
Prior art keywords
ann
task
neurons
training
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/774,343
Other versions
US11205097B2 (en)
Inventor
Nicolas Y. Masse
Gregory D. Grant
David J. Freedman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chicago
Original Assignee
University of Chicago
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chicago
Priority to US16/774,343 (granted as US11205097B2)
Assigned to THE UNIVERSITY OF CHICAGO. Assignment of assignors interest (see document for details). Assignors: FREEDMAN, DAVID; GRANT, GREGORY; MASSE, NICOLAS
Publication of US20200250483A1
Priority to US17/524,338 (published as US20220067442A1)
Application granted
Publication of US11205097B2
Legal status: Active (current)
Adjusted expiration


Abstract

A computing device may receive a first set of training data for training an ANN to predict output data for a first task, and may train the ANN with the first set of training data by only adjusting values of weights associated with a first subset of neurons, the first subset selected based on an identity of the first task. The computing device may receive a second, different set of training data for training the ANN to predict output data for a second task, and may train the ANN with the second set of training data by only adjusting values of weights associated with a second subset of neurons, the second subset selected based on an identity of the second task. During training, adjusting of the value of any weight may entail weight stabilization that depends on whether there has been any training for one or more previous tasks.
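
The procedure summarized above can be illustrated with a short sketch. The following Python/NumPy code is an illustrative assumption, not the patent's reference implementation: per-task gating masks over hidden neurons plus an importance-weighted stabilization term that biases weight changes only once earlier task types have been trained. The names (`masks`, `importance`, `anchor`) and the placeholder gradient are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_tasks = 10, 64, 3
lr, stab_coeff = 0.01, 0.1

W = rng.normal(0.0, 0.1, (n_in, n_hidden))                      # weights into one hidden layer
masks = (rng.random((n_tasks, n_hidden)) < 0.2).astype(float)   # ~20% of neurons active per task
importance = np.zeros_like(W)                                   # per-weight importance from previous tasks
anchor = W.copy()                                               # weight values learned on previous tasks

for task in range(n_tasks):
    for step in range(100):
        x = rng.normal(size=n_in)
        h = np.maximum(0.0, x @ W) * masks[task]    # context-dependent gating of hidden neurons
        # Placeholder task gradient; in practice it comes from the task loss.
        grad = np.outer(x, h - 0.5) * masks[task]   # only weights of the active subset move
        if task > 0:                                # previously trained: bias the adjustment
            grad += stab_coeff * importance * (W - anchor)
        W -= lr * grad
    # After each task, refresh importance and the anchor (the exact update rule is left abstract here).
    importance += np.abs(W - anchor)
    anchor = W.copy()
```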


Claims (20)

What is claimed is:
1. A computer-implemented method, carried out by a computing device, for computationally training an artificial neural network (ANN) implemented in the computing device, the method comprising:
at the computing device, receiving a first set of training data for training the ANN to predict output data for a first type of task;
at the computing device, training the ANN with the first set of training data by adjusting values of only those weights associated with a first subset of neurons of the ANN, wherein the first subset of neurons is selected based on an identity of the first type of task;
at the computing device, receiving a second set of training data for training the ANN to predict output data for a second type of task, wherein the second type of task is different from the first type of task; and
at the computing device, training the ANN with the second set of training data by adjusting values of only those weights associated with a second subset of neurons of the ANN, wherein the second subset of neurons is selected based on an identity of the second type of task;
wherein, during training of the ANN for any given type of task, adjusting of the value of any given weight associated with neurons of the ANN comprises:
if the ANN has been previously trained for one or more task types different from the given type, computationally biasing adjustment of the value of the given weight according to a respective importance of the given weight to a predictive capability of the ANN for the one or more task types,
and if the ANN has not been previously trained for any task types different from the given type, computationally adjusting the value of the given weight without bias.
2. The computer-implemented method of claim 1, wherein selecting the first subset of neurons based on the identity of the first type of task comprises consulting stored information that associates the first subset of neurons with the identity of the first type of task,
and wherein selecting the second subset of neurons based on the identity of the second type of task comprises consulting stored information that associates the second subset of neurons with the identity of the second type of task.
3. The computer-implemented method of claim 1, wherein the ANN comprises an input layer, an output layer, and one or more intermediate hidden layers,
wherein each neuron of the ANN resides in one of the layers of the ANN,
and wherein selecting either one of the first subset of neurons or the second subset of neurons comprises applying a gating table to the one or more intermediate hidden layers to pick out neurons according to either one of the first or second types of tasks,
wherein the gating table correlates neurons of the ANN with types of tasks,
and wherein each entry in the gating table is a binary assignment of whether a neuron associated with the entry should be either active or gated for a particular type of task during training of the ANN.
4. The computer-implemented method of claim 3, wherein the binary assignment of whether a neuron should be either active or gated for the particular type of task during training of the ANN is based in part on a predefined optimal percentage of neurons to gate in the ANN for the particular type of task.
5. The computer-implemented method of claim 4, wherein the predefined optimal percentage is determined according to at least one of: a size of the ANN, a number of layers in the ANN, or a number of task types upon which the ANN is trained.
6. The computer-implemented method of claim 3, further comprising determining the gating table prior to training the ANN for any task types, wherein determining the gating table comprises: for each type of task, randomly selecting neurons for gating.
7. The computer-implemented method of claim 3, wherein the gating table is a two-dimensional table,
and wherein: (i) each row of the gating table corresponds to a different one of multiple types of tasks, including the first and second types of tasks, and (ii) each column of the table corresponds to a different neuron from among the intermediate hidden layers of the ANN.
8. The computer-implemented method of claim 3, wherein the gating table is a three-dimensional table comprising a collection of like-sized two-dimensional gating matrices stacked in a third dimension that corresponds to types of tasks, including the first and second types of tasks,
and wherein, for each of the two-dimensional gating matrices: (i) each column corresponds to a different one of the one or more intermediate hidden layers of the ANN, and (ii) each row corresponds to a different neuron position within the intermediate hidden layers.
9. The computer-implemented method of claim 1, wherein adjusting the values of only those weights associated with the first subset of neurons of the ANN comprises gating all neurons of the ANN during training for the first type of task except those of the first subset,
and wherein adjusting the values of only those weights associated with the second subset of neurons of the ANN comprises gating all neurons of the ANN during training for the second type of task except those of the second subset,
wherein gating any given neuron during training comprises computationally suppressing adjustment of weights associated with the given neuron during training.
10. The computer-implemented method of claim 9, wherein computationally suppressing adjustment of the weights associated with the given neuron during training comprises at least one of: multiplying one or more inputs of the given neuron by zero, or multiplying one or more outputs of the given neuron by zero.
11. The computer-implemented method of claim 1, wherein computationally biasing adjustment of the value of the given weight according to the respective importance of the given weight to the predictive capability of the ANN for the one or more task types comprises applying a penalty that computationally inhibits changing the value, the penalty increasing with increasing respective importance of the given weight to the predictive capability of the ANN for the one or more task types,
and wherein computationally adjusting the value of the given weight without bias comprises adjusting the value without applying any computational penalty.
12. The computer-implemented method of claim 11, wherein applying the penalty that computationally inhibits changing the value comprises applying synaptic stabilization to the ANN during training.
13. The computer-implemented method of claim 1, further comprising:
subsequent to training the ANN with both the first set of training data and the second set of training data:
receiving runtime data associated with the first type of task;
applying the ANN to the runtime data associated with the first type of task to predict runtime output data for the first type of task, wherein only the first subset of neurons of the ANN are activated when applying the ANN to the runtime data associated with the first type of task;
receiving runtime data associated with the second type of task; and
applying the ANN to the runtime data associated with the second type of task to predict runtime output data for the second type of task, wherein only the second subset of neurons of the ANN are activated when applying the ANN to the runtime data associated with the second type of task.
14. The computer-implemented method of claim 1, wherein, subsequent to training the ANN with both the first set of training data and the second set of training data, the predictive capability of the ANN for the first type of task is higher than that of an alternatively-trained ANN, wherein alternative training comprises training for both the first type of task and the second type of task without selecting either the first or second subsets of neurons, and without biasing adjustment of any weights.
15. A computing device comprising:
one or more processors; and
memory configured to store computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out operations including:
receiving a first set of training data for training an artificial neural network (ANN) implemented on the computing device to predict output data for a first type of task;
training the ANN with the first set of training data by adjusting values of only those weights associated with a first subset of neurons of the ANN, wherein the first subset of neurons is selected based on an identity of the first type of task;
receiving a second set of training data for training the ANN to predict output data for a second type of task, wherein the second type of task is different from the first type of task; and
training the ANN with the second set of training data by adjusting values of only those weights associated with a second subset of neurons of the ANN, wherein the second subset of neurons is selected based on an identity of the second type of task;
wherein, during training of the ANN for any given type of task, adjusting of the value of any given weight associated with neurons of the ANN comprises:
if the ANN has been previously trained for one or more task types different from the given type, computationally biasing adjustment of the value of the given weight according to a respective importance of the given weight to a predictive capability of the ANN for the one or more task types,
and if the ANN has not been previously trained for any task types different from the given type, computationally adjusting the value of the given weight without bias.
16. The computing device of claim 15, wherein selecting the first subset of neurons based on the identity of the first type of task comprises consulting stored information that associates the first subset of neurons with the identity of the first type of task,
and wherein selecting the second subset of neurons based on the identity of the second type of task comprises consulting stored information that associates the second subset of neurons with the identity of the second type of task.
17. The computing device of claim 15, wherein the ANN comprises an input layer, an output layer, and one or more intermediate hidden layers,
wherein each neuron of the ANN resides in one of the layers of the ANN,
and wherein selecting either one of the first subset of neurons or the second subset of neurons comprises applying a gating table to the one or more intermediate hidden layers to pick out neurons according to either one of the first or second types of tasks,
wherein the gating table correlates neurons of the ANN with types of tasks,
and wherein each entry in the gating table is a binary assignment of whether a neuron associated with the entry should be either active or gated for a particular type of task during training of the ANN.
18. The computing device of claim 17, wherein the binary assignment of whether a neuron should be either active or gated for the particular type of task during training of the ANN is based in part on a predefined optimal percentage of neurons to gate in the ANN for the particular type of task.
19. The computing device of claim 15, wherein adjusting the values of only those weights associated with the first subset of neurons of the ANN comprises gating all neurons of the ANN during training for the first type of task except those of the first subset,
and wherein adjusting the values of only those weights associated with the second subset of neurons of the ANN comprises gating all neurons of the ANN during training for the second type of task except those of the second subset,
wherein gating any given neuron during training comprises computationally suppressing adjustment of weights associated with the given neuron during training.
20. An article of manufacture comprising non-transitory computer readable media having computer-readable instructions stored thereon that, when executed by one or more processors of a computing device, cause the computing device to carry out operations including:
receiving a first set of training data for training an artificial neural network (ANN) implemented on the computing device to predict output data for a first type of task;
training the ANN with the first set of training data by adjusting values of only those weights associated with a first subset of neurons of the ANN, wherein the first subset of neurons is selected based on an identity of the first type of task;
receiving a second set of training data for training the ANN to predict output data for a second type of task, wherein the second type of task is different from the first type of task; and
training the ANN with the second set of training data by adjusting values of only those weights associated with a second subset of neurons of the ANN, wherein the second subset of neurons is selected based on an identity of the second type of task;
wherein, during training of the ANN for any given type of task, adjusting of the value of any given weight associated with neurons of the ANN comprises:
if the ANN has been previously trained for one or more task types different from the given type, computationally biasing adjustment of the value of the given weight according to a respective importance of the given weight to a predictive capability of the ANN for the one or more task types,
and if the ANN has not been previously trained for any task types different from the given type, computationally adjusting the value of the given weight without bias.
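
To make the gating-table idea of claims 3 through 10 concrete, here is a minimal Python/NumPy sketch assuming a two-dimensional table (one row per task type, one column per hidden neuron), random selection of gated neurons as in claim 6, and gating by multiplying neuron outputs by zero as in claim 10. Function names such as `make_gating_table` and `gate_outputs` are illustrative assumptions, not identifiers from the specification.

```python
import numpy as np

def make_gating_table(n_task_types, n_hidden_neurons, gate_fraction, seed=0):
    """Binary gating table: rows are task types, columns are hidden-layer neurons.

    An entry of 1 means the neuron stays active for that task type during training;
    0 means it is gated. Gated neurons are chosen at random per task type, with the
    fraction gated set by gate_fraction.
    """
    rng = np.random.default_rng(seed)
    n_gated = int(round(gate_fraction * n_hidden_neurons))
    table = np.ones((n_task_types, n_hidden_neurons))
    for task_type in range(n_task_types):
        gated = rng.choice(n_hidden_neurons, size=n_gated, replace=False)
        table[task_type, gated] = 0.0
    return table

def gate_outputs(hidden_activations, gating_table, task_type):
    """Suppress gated neurons by multiplying their outputs by zero."""
    return hidden_activations * gating_table[task_type]

# Example: 5 task types, 100 hidden neurons, 80% of neurons gated per task type.
table = make_gating_table(n_task_types=5, n_hidden_neurons=100, gate_fraction=0.8)
h = np.random.default_rng(1).normal(size=100)
h_task2 = gate_outputs(h, table, task_type=2)
```

Because the table is fixed before training for any task type (claim 6), the same row can be reused at runtime so that only that task's subset of neurons is activated (claim 13).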
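Claims 11 and 12 describe biasing weight adjustment with a penalty that grows with a weight's importance to previously learned task types ("synaptic stabilization"). One common realization, in the spirit of elastic weight consolidation or synaptic intelligence, is a quadratic penalty on drift from previously learned weight values; the patent does not fix this exact formula, and the names below are assumptions.

```python
import numpy as np

def stabilization_penalty(weights, anchor_weights, importance, coeff):
    """Importance-weighted quadratic penalty on drift from previously learned weights.

    anchor_weights holds the values learned on earlier task types and importance
    estimates how much each weight mattered to those tasks. Returns the penalty value
    and its gradient, which is added to the task-loss gradient during training.
    """
    if importance is None:                  # no previous task types: adjust without bias
        return 0.0, np.zeros_like(weights)
    drift = weights - anchor_weights
    penalty = coeff * np.sum(importance * drift ** 2)
    gradient = 2.0 * coeff * importance * drift
    return penalty, gradient

# Example usage with toy values (all numbers are illustrative).
W = np.array([[0.5, -0.2], [0.1, 0.3]])
W_prev = np.zeros_like(W)
omega = np.array([[1.0, 0.0], [0.0, 2.0]])  # per-weight importance from earlier tasks
loss_term, grad_term = stabilization_penalty(W, W_prev, omega, coeff=0.1)
```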
US16/774,343 | Priority date 2019-02-01 | Filed 2020-01-28 | Training artificial neural networks using context-dependent gating with weight stabilization | Active, adjusted expiration 2040-06-13 | Granted as US11205097B2 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US16/774,343 (US11205097B2) | 2019-02-01 | 2020-01-28 | Training artificial neural networks using context-dependent gating with weight stabilization
US17/524,338 (US20220067442A1) | 2019-02-01 | 2021-11-11 | Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US201962800167P | 2019-02-01 | 2019-02-01
US16/774,343 (US11205097B2) | 2019-02-01 | 2020-01-28 | Training artificial neural networks using context-dependent gating with weight stabilization

Related Child Applications (1)

Application Number | Relation | Priority Date | Filing Date | Title
US17/524,338 | Continuation (US20220067442A1) | 2019-02-01 | 2021-11-11 | Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization

Publications (2)

Publication Number | Publication Date
US20200250483A1 (en) | 2020-08-06
US11205097B2 (en) | 2021-12-21

Family

ID=71836054

Family Applications (2)

Application Number | Priority Date | Filing Date | Title
US16/774,343 (US11205097B2, Active, adjusted expiration 2040-06-13) | 2019-02-01 | 2020-01-28 | Training artificial neural networks using context-dependent gating with weight stabilization
US17/524,338 (US20220067442A1, Pending) | 2019-02-01 | 2021-11-11 | Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization

Family Applications After (1)

Application Number | Priority Date | Filing Date | Title
US17/524,338 (US20220067442A1, Pending) | 2019-02-01 | 2021-11-11 | Training Artificial Neural Networks Using Context-Dependent Gating with Weight Stabilization

Country Status (1)

Country | Link
US (2) | US11205097B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20210150345A1 (en)* | 2019-11-14 | 2021-05-20 | Qualcomm Incorporated | Conditional Computation For Continual Learning
US20220164654A1 (en)* | 2020-11-26 | 2022-05-26 | Robert Bosch GmbH | Energy- and memory-efficient training of neural networks
US11379991B2 (en)* | 2020-05-29 | 2022-07-05 | National Technology & Engineering Solutions Of Sandia, LLC | Uncertainty-refined image segmentation under domain shift
US20220301291A1 (en)* | 2020-05-29 | 2022-09-22 | National Technology & Engineering Solutions Of Sandia, LLC | Uncertainty-refined image segmentation under domain shift
US11551075B2 (en)* | 2019-03-28 | 2023-01-10 | Cirrus Logic, Inc. | Artificial neural networks
US20230115113A1 (en)* | 2021-10-04 | 2023-04-13 | Royal Bank Of Canada | System and method for machine learning architecture for multi-task learning with dynamic neural networks
US20230177332A1 (en)* | 2021-12-06 | 2023-06-08 | Samsung Electronics Co., Ltd. | System and method for continual refinable network

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11455531B2 (en)* | 2019-10-15 | 2022-09-27 | Siemens Aktiengesellschaft | Trustworthy predictions using deep neural networks based on adversarial calibration
US12217139B2 | 2019-10-15 | 2025-02-04 | Siemens Aktiengesellschaft | Transforming a trained artificial intelligence model into a trustworthy artificial intelligence model
US11741371B2 (en)* | 2020-03-20 | 2023-08-29 | International Business Machines Corporation | Automatically generating diverse text

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11551075B2 (en)* | 2019-03-28 | 2023-01-10 | Cirrus Logic, Inc. | Artificial neural networks
US11803742B2 (en) | 2019-03-28 | 2023-10-31 | Cirrus Logic Inc. | Artificial neural networks
US20210150345A1 (en)* | 2019-11-14 | 2021-05-20 | Qualcomm Incorporated | Conditional Computation For Continual Learning
US12271800B2 (en)* | 2019-11-14 | 2025-04-08 | Qualcomm Incorporated | Conditional computation for continual learning
US11379991B2 (en)* | 2020-05-29 | 2022-07-05 | National Technology & Engineering Solutions Of Sandia, LLC | Uncertainty-refined image segmentation under domain shift
US20220301291A1 (en)* | 2020-05-29 | 2022-09-22 | National Technology & Engineering Solutions Of Sandia, LLC | Uncertainty-refined image segmentation under domain shift
US12169962B2 (en)* | 2020-05-29 | 2024-12-17 | National Technology & Engineering Solutions Of Sandia, LLC | Uncertainty-refined image segmentation under domain shift
US20220164654A1 (en)* | 2020-11-26 | 2022-05-26 | Robert Bosch GmbH | Energy- and memory-efficient training of neural networks
US20230115113A1 (en)* | 2021-10-04 | 2023-04-13 | Royal Bank Of Canada | System and method for machine learning architecture for multi-task learning with dynamic neural networks
US20230177332A1 (en)* | 2021-12-06 | 2023-06-08 | Samsung Electronics Co., Ltd. | System and method for continual refinable network

Also Published As

Publication number | Publication date
US11205097B2 (en) | 2021-12-21
US20220067442A1 (en) | 2022-03-03

Similar Documents

Publication | Publication Date | Title
US11205097B2 (en) | Training artificial neural networks using context-dependent gating with weight stabilization
Tyulmankov et al. | Meta-learning synaptic plasticity and memory addressing for continual familiarity detection
US10380479B2 (en) | Acceleration of convolutional neural network training using stochastic perforation
US12169782B2 (en) | Dynamic precision scaling at epoch granularity in neural networks
KR102760554B1 (en) | Training a Student Neural Network to Mimic a Mentor Neural Network With Inputs That Maximize Student-to-Mentor Disagreement
US11914672B2 (en) | Method of neural architecture search using continuous action reinforcement learning
US20210158156A1 (en) | Distilling from Ensembles to Improve Reproducibility of Neural Networks
KR102063377B1 (en) | Incremental Training Based Knowledge Transfer Method for Training Large Deep Neural Networks and Apparatus Therefor
Cossu et al. | Continual learning with gated incremental memories for sequential data processing
US11003960B2 (en) | Efficient incident management in large scale computer systems
KR20220097767A (en) | Apparatus for generating signature that reflects the similarity of the malware detection classification system based on deep neural networks, method therefor, and computer recordable medium storing program to perform the method
WO2022068934A1 (en) | Method of neural architecture search using continuous action reinforcement learning
US20160239736A1 (en) | Method for dynamically updating classifier complexity
KR102311659B1 (en) | Apparatus for computing based on convolutional neural network model and method for operating the same
JP2023046213A (en) | METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM FOR TRANSFER LEARNING WHILE SUPPRESSING CATASTROPHIC FORGETTING
Wang et al. | A novel restricted Boltzmann machine training algorithm with fast Gibbs sampling policy
US12288139B2 (en) | Iterative machine learning and relearning
KR20230038136A (en) | Knowledge distillation method and system specialized for lightweight pruning-based deep neural networks
US20220180167A1 (en) | Memory-augmented neural network system
CN111133451A (en) | Temporal Pooling and Correlation in Artificial Neural Networks
Kim et al. | Tweaking deep neural networks
Tyulmankov et al. | Meta-learning local synaptic plasticity for continual familiarity detection
Zhu et al. | FGGP: Fixed-Rate Gradient-First Gradual Pruning
WO2024062673A1 (en) | Machine learning device, machine learning method, and machine learning program
KR20240025798A (en) | Method for tunning pre-trained neural network model

Legal Events

Date | Code | Title | Description

AS: Assignment

Owner name: THE UNIVERSITY OF CHICAGO, ILLINOIS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: FREEDMAN, DAVID; GRANT, GREGORY; MASSE, NICOLAS; SIGNING DATES FROM 20200106 TO 20200107; REEL/FRAME: 051641/0980

FEPP: Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP: Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP: Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP: Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP: Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF: Information on status: patent grant

Free format text: PATENTED CASE

MAFP: Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

