Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate

Haiwen Huang, Chang Wang, Bin Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

First-order optimization algorithms are prominent in deep learning; in particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of “long-term memory” in Adam-like algorithms, which can hamper their performance and lead to divergence. In our study, we observe that there are benefits to weighting more of the past gradients when designing the adaptive learning rate. We therefore propose an algorithm called Nostalgic Adam (NosAdam), with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam, offered as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative to Adam. The proofs, code, and other supplementary materials have been released.
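The abstract does not spell out the update rule, so the following is a minimal NumPy sketch of the idea as we understand it from the paper's hyperharmonic variant (NosAdam-HH): the second-moment decay factor β₂ₖ = Bₖ₋₁/Bₖ is derived from weights bₖ = k^(−γ), so older squared gradients retain non-negligible weight. The function name, signature, and default hyperparameters below are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def nosadam(grad, x0, alpha=0.01, beta1=0.9, gamma=0.1, n_steps=1000, eps=1e-8):
    """Sketch of NosAdam (hyperharmonic variant, NosAdam-HH).

    Unlike Adam's fixed beta2, the second-moment average here uses
    weights b_k = k**(-gamma), so past squared gradients keep
    non-negligible weight ("long-term memory").
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first moment, as in Adam
    v = np.zeros_like(x)   # second moment with long-term memory
    B = 0.0                # running sum of the weights b_k
    for k in range(1, n_steps + 1):
        g = grad(x)
        b_k = k ** (-gamma)           # hyperharmonic weight on step k
        B_prev, B = B, B + b_k
        beta2_k = B_prev / B          # time-varying beta2 = B_{k-1}/B_k
        m = beta1 * m + (1 - beta1) * g
        v = beta2_k * v + (1 - beta2_k) * g * g
        x = x - (alpha / np.sqrt(k)) * m / (np.sqrt(v) + eps)
    return x

# Toy usage on a convex quadratic f(x) = ||x||^2, where grad f(x) = 2x;
# the iterates should approach the minimizer at the origin.
x_star = nosadam(lambda x: 2.0 * x, x0=np.array([1.0, -2.0]))
```

With γ = 0 every past squared gradient gets equal weight, while larger γ tilts the average further toward early gradients; a fixed β₂ as in Adam would instead let old gradients decay exponentially, which is the memory loss the paper targets.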

Original language: English (US)
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2556-2562
Number of pages: 7
ISBN (Electronic): 9780999241141
Publication status: Published - Jan 1 2019
Event: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: Aug 10 2019 - Aug 16 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (Print): 1045-0823

Conference

Conference: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country: China
City: Macao
Period: 8/10/19 - 8/16/19

Fingerprint

ASJC Scopus subject areas

• Artificial Intelligence

Cite this

Huang, H., Wang, C., & Dong, B. (2019). Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate. In S. Kraus (Ed.), Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 (pp. 2556-2562). (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2019-August). International Joint Conferences on Artificial Intelligence.