Allie

21 bytes added, 18:27, 11 September 2019
supported by '''SGDR''' ([https://en.wikipedia.org/wiki/Stochastic_gradient_descent Stochastic Gradient Descent] with Warm Restarts) <ref>[[Ilya Loshchilov]], [[Frank Hutter]] ('''2016'''). ''SGDR: Stochastic Gradient Descent with Warm Restarts''. [https://arxiv.org/abs/1608.03983 arXiv:1608.03983]</ref>
and '''GGT''' (full-matrix adaptive [https://en.wikipedia.org/wiki/Regularization_(mathematics) regularization]) <ref>[[Naman Agarwal]], [[Brian Bullins]], [[Xinyi Chen]], [[Elad Hazan]], [[Karan Singh]], [[Cyril Zhang]], [[Yi Zhang]] ('''2018'''). ''The Case for Full-Matrix Adaptive Regularization''. [https://arxiv.org/abs/1806.02958 arXiv:1806.02958]</ref>,
using [https://en.wikipedia.org/wiki/Batch_normalization batch renormalization] <ref>[[Mathematician#SIoffe|Sergey Ioffe]] ('''2017'''). ''Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models''. [https://arxiv.org/abs/1702.03275 arXiv:1702.03275]</ref>,
and adding [https://en.wikipedia.org/wiki/Gradient_noise gradient noise] <ref>[[Arvind Neelakantan]], [[Luke Vilnis]], [[Quoc V. Le]], [[Ilya Sutskever]], [[Lukasz Kaiser]], [[Karol Kurach]], [[James Martens]] ('''2015'''). ''Adding Gradient Noise Improves Learning for Very Deep Networks''. [https://arxiv.org/abs/1511.06807 arXiv:1511.06807]</ref>.
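
As a rough illustration of two of the techniques named above, the following Python sketch computes an SGDR-style cosine learning rate with warm restarts and the annealed Gaussian gradient noise described in the cited papers. It is not taken from Allie's training code: the function names and all default values (<code>eta_max</code>, <code>first_cycle</code>, <code>cycle_mult</code>, <code>eta</code>, <code>gamma</code>) are assumptions chosen for the example.

```python
import math
import random


def sgdr_learning_rate(step, eta_min=0.0, eta_max=0.1, first_cycle=10, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style).

    All default values are illustrative assumptions, not Allie's settings.
    """
    cycle_len = first_cycle
    t_cur = step
    # Find the position inside the current cycle; each cycle is
    # cycle_mult times longer than the previous one.
    while t_cur >= cycle_len:
        t_cur -= cycle_len
        cycle_len *= cycle_mult
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + cosine)


def gradient_noise_std(step, eta=0.3, gamma=0.55):
    """Standard deviation of annealed gradient noise: sqrt(eta / (1 + t)^gamma)."""
    return math.sqrt(eta / (1.0 + step) ** gamma)


def add_gradient_noise(gradients, step, rng=random):
    """Return a copy of the gradients with zero-mean Gaussian noise added."""
    std = gradient_noise_std(step)
    return [g + rng.gauss(0.0, std) for g in gradients]


if __name__ == "__main__":
    for step in (0, 5, 9, 10, 29, 30):
        print(step, round(sgdr_learning_rate(step), 4), round(gradient_noise_std(step), 4))
```

With these assumed defaults the learning rate decays from <code>eta_max</code> towards <code>eta_min</code> within each cycle and jumps back to <code>eta_max</code> at steps 10 and 30, while the noise standard deviation shrinks monotonically as training proceeds.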
