Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations

Yiping Lu, Aoxiao Zhong, Quanzheng Li, Bin Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance for network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture) which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like networks. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. In particular, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress the original networks while maintaining a similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating the stochastic training strategy to stochastic dynamic systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.
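
The correspondence described in the abstract can be stated concretely: a ResNet block computes x_{n+1} = x_n + f(x_n), which is the forward Euler discretization of the ODE dx/dt = f(x), while the LM-architecture uses the two-step update x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n) with a trainable scalar k_n, mirroring a linear multi-step scheme. The following is a minimal PyTorch sketch of such a block under these assumptions; it is an illustration, not the authors' released code, and the residual branch f is a generic stand-in.

    import torch
    import torch.nn as nn

    class LMBlock(nn.Module):
        """One LM-architecture step: x_{n+1} = (1 - k) x_n + k x_{n-1} + f(x_n)."""

        def __init__(self, channels):
            super().__init__()
            # Residual branch f; any ResNet-style transform could stand in here.
            self.f = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )
            self.k = nn.Parameter(torch.zeros(1))  # trainable multi-step weight k_n

        def forward(self, x_curr, x_prev):
            # Linear multi-step update; with k = 0 this reduces to the plain
            # ResNet step x + f(x), i.e. forward Euler for dx/dt = f(x).
            x_next = (1.0 - self.k) * x_curr + self.k * x_prev + self.f(x_curr)
            return x_next, x_curr  # carry (x_{n+1}, x_n) into the next block

For the first block, where no x_{n-1} exists yet, one reasonable convention is to feed the input twice, e.g. x1, x0 = block(x0, x0), so the scheme starts from a plain residual step.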

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 5181-5190
Number of pages: 10
ISBN (Electronic): 9781510867963
State: Published - 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 - Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 7

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country: Sweden
City: Stockholm
Period: 7/10/18 - 7/15/18

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Lu, Y., Zhong, A., Li, Q., & Dong, B. (2018). Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In J. Dy & A. Krause (Eds.), 35th International Conference on Machine Learning, ICML 2018 (pp. 5181-5190). (35th International Conference on Machine Learning, ICML 2018; Vol. 7). International Machine Learning Society (IMLS).