Nature Machine Intelligence
  • Article

Continual learning of context-dependent processing in neural networks

Nature Machine Intelligence volume 1, pages 364–372 (2019)

A preprint version of the article is available at arXiv.

Abstract

Deep neural networks are powerful tools for learning sophisticated but fixed mapping rules between inputs and outputs. This limits their application in more complex and dynamic situations, in which the mapping rules do not stay the same but change with context. To lift this limit, we developed an approach that combines a learning algorithm, called orthogonal weights modification (OWM), with a context-dependent processing (CDP) module. We demonstrated that, with OWM overcoming catastrophic forgetting and the CDP module learning how to reuse a feature representation and a classifier across contexts, a single network could acquire numerous context-dependent mapping rules in an online and continual manner, needing as few as approximately ten samples to learn each. Our approach should enable highly compact systems to gradually learn myriad regularities of the real world and eventually behave appropriately within it.
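
The abstract names orthogonal weights modification (OWM) without stating its update rule. The sketch below illustrates the general idea under stated assumptions. Each layer keeps a projector P that (approximately) projects onto the subspace orthogonal to inputs seen in earlier tasks, and the backpropagation gradient is multiplied by P before the weights change, so new learning barely alters the layer's response to old inputs. Here P starts as the identity and is updated recursively as P ← P − (Px)(Px)^T / (α + x^T P x), a recursive-least-squares form. The weight layout (outputs × inputs), the value of α, the learning rate and all helper names are illustrative assumptions, not the authors' implementation (their code is ref. 56).

```python
"""Minimal OWM-style sketch (hypothetical helper names; not the authors' code)."""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def update_projector(p: Matrix, x: Vector, alpha: float) -> Matrix:
    """Recursive update P <- P - (P x)(P x)^T / (alpha + x^T P x).

    P stays symmetric, so x^T P equals (P x)^T. After the update, P x is
    scaled by alpha / (alpha + x^T P x), suppressing directions already seen.
    """
    px = mat_vec(p, x)
    denom = alpha + sum(a * b for a, b in zip(x, px))
    n = len(x)
    return [[p[i][j] - px[i] * px[j] / denom for j in range(n)] for i in range(n)]


def owm_step(w: Matrix, grad: Matrix, p: Matrix, lr: float) -> Matrix:
    """Apply W <- W - lr * (grad @ P), projecting the gradient in input space.

    W and grad have shape (outputs, inputs); P has shape (inputs, inputs).
    """
    n_in = len(p)
    projected = [[sum(g[k] * p[k][j] for k in range(n_in)) for j in range(n_in)]
                 for g in grad]
    return [[w_ij - lr * g_ij for w_ij, g_ij in zip(w_row, g_row)]
            for w_row, g_row in zip(w, projected)]


if __name__ == "__main__":
    alpha = 1e-3                       # assumed regularizer
    p = identity(3)
    x_old = [1.0, 0.0, 0.0]            # input from an earlier task
    p = update_projector(p, x_old, alpha)

    w = [[0.5, 0.5, 0.5]]              # one output unit, three inputs
    x_new = [1.0, 1.0, 0.0]            # input from the new task
    grad = [[1.0 * xi for xi in x_new]]  # dL/dW = delta * x^T with delta = 1
    w_new = owm_step(w, grad, p, lr=0.1)

    print("response to x_old before:", mat_vec(w, x_old)[0])
    print("response to x_old after: ", mat_vec(w_new, x_old)[0])
```

In this toy run the response to `x_old` moves by roughly 1e-4, whereas an unprojected SGD step with the same gradient and learning rate would move it by 0.1. A smaller α suppresses old-input directions more strongly, since the scaling factor is α / (α + x^T P x); α is left as a tunable assumption here.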

Fig. 1: Schematic of OWM.
Fig. 2: Performance of OWM, CAB (conceptor-aided backpropagation) and SGD (stochastic gradient descent) in the ten-task disjoint MNIST experiment.
Fig. 3: Continual learning with small sample size achieved by OWM in recognizing Chinese characters.
Fig. 4: Achieving context-dependent sequential learning via the OWM algorithm and the CDP module.
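
Fig. 4 pairs OWM with the CDP module, which, according to the abstract, lets one shared feature representation and one shared classifier serve different contexts. The sketch below is only a hypothetical illustration of that reuse idea and assumes a simple form of modulation. A context vector is mapped by an encoder to per-feature gains in (0, 1), and these gains multiply the shared features element-wise before the shared classifier is applied. The encoder weights, the sigmoid gain and all names are illustrative assumptions; the paper's actual CDP architecture is described in its methods, which are not reproduced on this page, and it may differ.

```python
"""Hypothetical context-gated readout: illustrates feature/classifier reuse,
not the paper's CDP module."""
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def context_gains(encoder: Matrix, context: Vector) -> Vector:
    """Encode a context vector into per-feature gains in (0, 1)."""
    return [sigmoid(z) for z in mat_vec(encoder, context)]


def gated_readout(features: Vector, gains: Vector, classifier: Matrix) -> Vector:
    """Scale shared features by the context gains, then apply the shared classifier."""
    modulated = [f * g for f, g in zip(features, gains)]
    return mat_vec(classifier, modulated)


def argmax(v: Vector) -> int:
    return max(range(len(v)), key=lambda i: v[i])


if __name__ == "__main__":
    encoder = [[10.0, -10.0], [-10.0, 10.0]]  # hypothetical hand-set weights
    classifier = [[1.0, 1.0], [-1.0, -1.0]]   # shared readout: [output 0, output 1]
    features = [1.0, -1.0]                    # shared features of one stimulus
    for name, ctx in [("context A", [1.0, 0.0]), ("context B", [0.0, 1.0])]:
        scores = gated_readout(features, context_gains(encoder, ctx), classifier)
        print(name, "->", argmax(scores))
```

With these hand-set weights the same stimulus is mapped to output 0 under context A and to output 1 under context B. The features and the classifier stay unchanged, and only the context-derived gains differ.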

Data availability

All data used in this paper are publicly available and can be accessed at http://yann.lecun.com/exdb/mnist/ for the MNIST dataset, https://www.cs.toronto.edu/~kriz/cifar.html for the CIFAR dataset, http://image-net.org/index for the ILSVRC2012 dataset, http://www.nlpr.ia.ac.cn/databases/handwriting/Home.html for the CASIA-HWDB dataset and http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html for the CelebA dataset. For more details of the datasets, please refer to the references cited in the Supplementary Methods.

Code availability

The source code can be accessed at https://github.com/beijixiong3510/OWM (ref. 56).

References

  1. Newell, A. Unified Theories of Cognition (Harvard Univ. Press, 1994).

  2. Miller, G. A., Heise, G. A. & Lichten, W. The intelligibility of speech as a function of the context of the test materials. J. Exp. Psychol. 41, 329–335 (1951).

  3. Desimone, R. & Duncan, J. Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222 (1995).

  4. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).

  5. Siegel, M., Buschman, T. J. & Miller, E. K. Cortical information flow during flexible sensorimotor decisions. Science 348, 1352–1355 (2015).

  6. Miller, E. K. The prefrontal cortex: complex neural properties for complex behavior. Neuron 22, 15–17 (1999).

  7. Wise, S. P., Murray, E. A. & Gerfen, C. R. The frontal cortex-basal ganglia system in primates. Crit. Rev. Neurobiol. 10, 317–356 (1996).

  8. Passingham, R. The Frontal Lobes and Voluntary Action (Oxford Univ. Press, 1993).

  9. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).

  10. Miller, E. K. The prefrontal cortex and cognitive control. Nat. Rev. Neurosci. 1, 59–65 (2000).

  11. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  12. McCloskey, M. & Cohen, N. J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem Vol. 24, 109–165 (Elsevier, 1989).

  13. Ratcliff, R. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97, 285–308 (1990).

  14. Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A. & Bengio, Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. Preprint at https://arxiv.org/abs/1312.6211 (2013).

  15. Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).

  16. Haykin, S. S. Adaptive Filter Theory (Pearson Education India, 2008).

  17. Golub, G. H. & Van Loan, C. F. Matrix Computations Vol. 3 (JHU Press, 2012).

  18. Singhal, S. & Wu, L. Training feed-forward networks with the extended Kalman algorithm. In International Conference on Acoustics, Speech, and Signal Processing 1187–1190 (IEEE, 1989).

  19. Shah, S., Palmieri, F. & Datum, M. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Netw. 5, 779–787 (1992).

  20. Sussillo, D. & Abbott, L. F. Generating coherent patterns of activity from chaotic neural networks. Neuron 63, 544–557 (2009).

  21. Jaeger, H. Controlling recurrent neural networks by conceptors. Preprint at https://arxiv.org/abs/1403.3369 (2014).

  22. He, X. & Jaeger, H. Overcoming catastrophic interference using conceptor-aided backpropagation. In International Conference on Learning Representations (ICLR, 2018).

  23. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning 807–814 (PMLR, 2010).

  24. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).

  25. Lee, S.-W., Kim, J.-H., Jun, J., Ha, J.-W. & Zhang, B.-T. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems 4652–4662 (Curran Associates, 2017).

  26. Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. In International Conference on Machine Learning 6072–6082 (PMLR, 2017).

  27. Liu, C.-L., Yin, F., Wang, D.-H. & Wang, Q.-F. Chinese handwriting recognition contest 2010. In Chinese Conference on Pattern Recognition (CCPR) 1–5 (IEEE, 2010).

  28. Yin, F., Wang, Q.-F., Zhang, X.-Y. & Liu, C.-L. ICDAR 2013 Chinese handwriting recognition competition. In 12th International Conference on Document Analysis and Recognition (ICDAR) 1464–1470 (IEEE, 2013).

  29. Fuster, J. The Prefrontal Cortex (Academic Press, 2015).

  30. Liu, Z., Luo, P., Wang, X. & Tang, X. Deep learning face attributes in the wild. In IEEE International Conference on Computer Vision 3730–3738 (IEEE, 2015).

  31. Řehůřek, R. & Sojka, P. Software framework for topic modelling with large corpora. In Proc. LREC 2010 Workshop on New Challenges for NLP Frameworks 45–50 (ELRA, 2010).

  32. Lehky, S. R., Kiani, R., Esteky, H. & Tanaka, K. Dimensionality of object representations in monkey inferotemporal cortex. Neural Comput. 26, 2135–2162 (2014).

  33. Freedman, D. J., Riesenhuber, M., Poggio, T. & Miller, E. K. Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291, 312–316 (2001).

  34. Hung, C. P., Kreiman, G., Poggio, T. & DiCarlo, J. J. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005).

  35. Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G. & Mishkin, M. The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends Cogn. Sci. 17, 26–49 (2013).

  36. Gomez, J. et al. Microstructural proliferation in human cortex is coupled with the development of face processing. Science 355, 68–71 (2017).

  37. Xu, F. & Tenenbaum, J. B. Word learning as Bayesian inference. Psychol. Rev. 114, 245–272 (2007).

  38. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).

  39. Cichon, J. & Gan, W.-B. Branch-specific dendritic Ca2+ spikes cause persistent synaptic plasticity. Nature 520, 180–185 (2015).

  40. Rusu, A. A. et al. Progressive neural networks. Preprint at https://arxiv.org/abs/1606.04671 (2016).

  41. Masse, N. Y., Grant, G. D. & Freedman, D. J. Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization. Proc. Natl Acad. Sci. USA 115, E10467–E10475 (2018).

  42. McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457 (1995).

  43. Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).

  44. Shin, H., Lee, J. K., Kim, J. & Kim, J. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems 2990–2999 (Curran Associates, 2017).

  45. Li, Z. & Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2017).

  46. Rohrbach, M., Stark, M., Szarvas, G., Gurevych, I. & Schiele, B. What helps where – and why? Semantic relatedness for knowledge transfer. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition 910–917 (IEEE, 2010).

  47. Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 3320–3328 (Curran Associates, 2014).

  48. Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531 (2015).

  49. Schwarz, J. et al. Progress & compress: a scalable framework for continual learning. Preprint at https://arxiv.org/abs/1805.06370 (2018).

  50. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proc. Thirteenth International Conference on Artificial Intelligence and Statistics 249–256 (Microtome, 2010).

  51. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th International Conference on Machine Learning (ICML-10) 807–814 (PMLR, 2010).

  52. Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F. & Schmidhuber, J. Compete to compute. In Advances in Neural Information Processing Systems 2310–2318 (Curran Associates, 2013).

  53. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  54. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In IEEE International Conference on Computer Vision 1026–1034 (IEEE, 2015).

  55. Ramirez-Cardenas, A. & Viswanathan, P. The role of prefrontal mixed selectivity in cognitive control. J. Neurosci. 36, 9013–9015 (2016).

  56. Zeng, G., Chen, Y., Cui, B. & Yu, S. Codes for paper Continual learning of context-dependent processing in neural networks. Zenodo https://doi.org/10.5281/zenodo.3346080 (2019).

  57. Hu, W. et al. Overcoming catastrophic forgetting via model adaptation. In International Conference on Learning Representations (ICLR, 2019).

Acknowledgements

The authors thank D. Nikolić for helpful discussions and R. Hadsell for comments on the manuscript. This work was supported by the National Key Research and Development Program of China (2017YFA0105203), the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) (XDB32040200), Key Research Program of the National Laboratory of Pattern Recognition (99S9011M2N), and the Hundred-Talent Program of CAS (for S.Y.).

Author information

Author notes
  1. These authors contributed equally: Guanxiong Zeng, Yang Chen.

Authors and Affiliations

  1. Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

    Guanxiong Zeng, Yang Chen, Bo Cui & Shan Yu

  2. University of Chinese Academy of Sciences, Beijing, China

    Guanxiong Zeng, Bo Cui & Shan Yu

  3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China

    Shan Yu

Authors
  1. Guanxiong Zeng

  2. Yang Chen

  3. Bo Cui

  4. Shan Yu


Contributions

S.Y., Y.C. and G.Z. conceived the study and designed the experiments. G.Z. and Y.C. conducted the computational experiments and theoretical analyses. B.C. assisted with some experiments and analyses. S.Y., Y.C. and G.Z. wrote the paper.

Corresponding author

Correspondence to Shan Yu.

Ethics declarations

Competing interests

The Institute of Automation, Chinese Academy of Sciences has submitted patent applications on the OWM algorithm (application no. PCT/CN2019/083355; invented by Y.C., G.Z. and S.Y.; pending) and the CDP module (application no. PCT/CN2019/083356; invented by G.Z., Y.C. and S.Y.; pending).

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary discussion, methods, Figs. 1–7, Tables 1–7 and references.

About this article

Cite this article

Zeng, G., Chen, Y., Cui, B. et al. Continual learning of context-dependent processing in neural networks. Nat Mach Intell 1, 364–372 (2019). https://doi.org/10.1038/s42256-019-0080-x
