Changes

Allie

21 bytes added, 18:27, 11 September 2019


supported by '''SGDR''' ([https://en.wikipedia.org/wiki/Stochastic_gradient_descent Stochastic Gradient Descent] with Warm Restarts) <ref>[[Ilya Loshchilov]], [[Frank Hutter]] ('''2016'''). ''SGDR: Stochastic Gradient Descent with Warm Restarts''. [https://arxiv.org/abs/1608.03983 arXiv:1608.03983]</ref>

and '''GGT''' (full-matrix adaptive [https://en.wikipedia.org/wiki/Regularization_(mathematics) regularization]) <ref>[[Naman Agarwal]], [[Brian Bullins]], [[Xinyi Chen]], [[Elad Hazan]], [[Karan Singh]], [[Cyril Zhang]], [[Yi Zhang]] ('''2018'''). ''The Case for Full-Matrix Adaptive Regularization''. [https://arxiv.org/abs/1806.02958 arXiv:1806.02958]</ref>,

using [https://en.wikipedia.org/wiki/Batch_normalization batch renormalization] <ref>[[Mathematician#SIoffe|Sergey Ioffe]] ('''2017'''). ''Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models''. [https://arxiv.org/abs/1702.03275 arXiv:1702.03275]</ref>,

and adding [https://en.wikipedia.org/wiki/Gradient_noise gradient noise] <ref>[[Arvind Neelakantan]], [[Luke Vilnis]], [[Quoc V. Le]], [[Ilya Sutskever]], [[Lukasz Kaiser]], [[Karol Kurach]], [[James Martens]] ('''2015'''). ''Adding Gradient Noise Improves Learning for Very Deep Networks''. [https://arxiv.org/abs/1511.06807 arXiv:1511.06807]</ref>.
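SGDR, as described by Loshchilov and Hutter, anneals the learning rate along a cosine curve within each period and then restarts it at its maximum, with each period optionally longer than the last. The sketch below only illustrates that schedule. The names and defaults (`eta_min`, `eta_max`, `t_0`, `t_mult`) are assumptions chosen for illustration, not Allie's actual training settings.

```python
import math


def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Learning rate at a (possibly fractional) epoch under SGDR.

    Hypothetical defaults: t_0 is the length of the first period in epochs,
    and each following period is t_mult times longer (t_mult >= 1).
    """
    t_i = t_0        # length of the current period
    t_cur = epoch    # epochs elapsed since the last restart
    # Skip whole periods until the epoch falls inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine annealing from eta_max (t_cur = 0) toward eta_min (t_cur -> t_i).
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

With these assumed defaults the rate starts at 0.1, falls to 0.05 halfway through the first 10-epoch period, and jumps back to 0.1 at epochs 10 and 30, where the 20- and 40-epoch periods begin.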
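Batch renormalization extends batch normalization with two correction factors, r and d. These are computed from the minibatch statistics and the moving averages and clipped to [1/r_max, r_max] and [-d_max, d_max]. When neither is clipped, the training-time output equals normalization by the moving averages, which reduces the dependence on the minibatch. The single-feature sketch below follows the training-mode formulas from Ioffe's paper. The function name, the `state` dictionary, and every default value are illustrative assumptions, not Allie's code.

```python
import math


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def batch_renorm_train(xs, state, r_max=3.0, d_max=5.0, momentum=0.01,
                       eps=1e-5, gamma=1.0, beta=0.0):
    """Training-mode batch renormalization of one feature over a minibatch.

    `state` is a hypothetical dict holding the moving "mean" and "std";
    it is updated in place. In an autodiff framework r and d would be
    wrapped in a stop-gradient; plain Python has no gradients to stop.
    """
    m = len(xs)
    mu_b = sum(xs) / m
    sigma_b = math.sqrt(sum((x - mu_b) ** 2 for x in xs) / m + eps)
    mu, sigma = state["mean"], state["std"]
    # Correction factors relating minibatch statistics to moving averages.
    r = clip(sigma_b / sigma, 1.0 / r_max, r_max)
    d = clip((mu_b - mu) / sigma, -d_max, d_max)
    ys = [gamma * ((x - mu_b) / sigma_b * r + d) + beta for x in xs]
    # Exponential moving-average update of the inference statistics.
    state["mean"] = mu + momentum * (mu_b - mu)
    state["std"] = sigma + momentum * (sigma_b - sigma)
    return ys
```

Setting `r_max=1.0` and `d_max=0.0` recovers plain batch normalization. The paper starts training with these values and relaxes them gradually.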
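Neelakantan et al. add Gaussian noise to each gradient and anneal its variance over the course of training as sigma_t^2 = eta / (1 + t)^gamma. Their paper uses gamma = 0.55 and tries eta in {0.01, 0.3, 1.0}. The sketch below applies that schedule to a flat list of gradient values. The function names and the list representation are hypothetical, and nothing here claims these are the values Allie used.

```python
import random


def noise_std(step, eta=0.01, gamma=0.55):
    """Standard deviation of the added noise at training step `step`."""
    return (eta / (1 + step) ** gamma) ** 0.5


def add_gradient_noise(grads, step, rng, eta=0.01, gamma=0.55):
    """Return a copy of `grads` with annealed Gaussian noise added to each entry."""
    sigma = noise_std(step, eta, gamma)
    return [g + rng.gauss(0.0, sigma) for g in grads]


# Usage: a seeded generator keeps the noise reproducible.
noisy = add_gradient_noise([0.5, -0.25, 1.0], step=100, rng=random.Random(42))
```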

GerdIsenberg

