Neural Networks

==Backpropagation==
In 1974, [[Mathematician#PWerbos|Paul Werbos]] began to end the AI winter concerning neural networks when he first described the mathematical process of training [https://en.wikipedia.org/wiki/Multilayer_perceptron multilayer perceptrons] through [https://en.wikipedia.org/wiki/Backpropagation backpropagation] of errors <ref>[[Mathematician#PWerbos|Paul Werbos]] ('''1974'''). ''Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences''. Ph.D. thesis, [[Harvard University]]</ref>, derived in the context of [https://en.wikipedia.org/wiki/Control_theory control theory] by [https://en.wikipedia.org/wiki/Henry_J._Kelley Henry J. Kelley] in 1960 <ref>[https://en.wikipedia.org/wiki/Henry_J._Kelley Henry J. Kelley] ('''1960'''). ''[http://arc.aiaa.org/doi/abs/10.2514/8.5282?journalCode=arsj& Gradient Theory of Optimal Flight Paths]''. [http://arc.aiaa.org/loi/arsj ARS Journal], Vol. 30, No. 10</ref> and by [https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] in 1961 <ref>[https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] ('''1961'''). ''A gradient method for optimizing multi-stage allocation processes''. In Proceedings of the [[Harvard University]] Symposium on digital computers and their applications</ref> using principles of [[Dynamic Programming|dynamic programming]], and simplified by [[Mathematician#SEDreyfus|Stuart E. Dreyfus]] in 1961 applying the [https://en.wikipedia.org/wiki/Chain_rule chain rule] <ref>[[Mathematician#SEDreyfus|Stuart E. Dreyfus]] ('''1961'''). ''[http://www.rand.org/pubs/papers/P2374.html The numerical solution of variational problems]''. RAND paper P-2374</ref>. In 1982, Werbos applied an [https://en.wikipedia.org/wiki/Automatic_differentiation automatic differentiation] method, described in 1970 by [[Mathematician#SLinnainmaa|Seppo Linnainmaa]] <ref>[[Mathematician#SLinnainmaa|Seppo Linnainmaa]] ('''1970'''). ''The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors''. Master's thesis, [https://en.wikipedia.org/wiki/University_of_Helsinki University of Helsinki]</ref>, to neural networks in the way that is widely used today <ref>[[Mathematician#PWerbos|Paul Werbos]] ('''1982'''). ''Applications of advances in nonlinear sensitivity analysis''. [http://link.springer.com/book/10.1007%2FBFb0006119 System Modeling and Optimization], [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer], [http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf pdf]</ref> <ref>[[Mathematician#PWerbos|Paul Werbos]] ('''1994'''). ''The Roots of Backpropagation. From Ordered Derivatives to Neural Networks and Political Forecasting''. [https://en.wikipedia.org/wiki/John_Wiley_%26_Sons John Wiley & Sons]</ref> <ref>[http://www.scholarpedia.org/article/Deep_Learning#Backpropagation Deep Learning - Scholarpedia | Backpropagation] by [[Jürgen Schmidhuber]]</ref> <ref>[http://people.idsia.ch/~juergen/who-invented-backpropagation.html Who Invented Backpropagation?] by [[Jürgen Schmidhuber]] (2014, 2015)</ref>.
Backpropagation is a generalization of the [https://en.wikipedia.org/wiki/Delta_rule delta rule] to multilayered [https://en.wikipedia.org/wiki/Feedforward_neural_network feedforward networks], made possible by using the [https://en.wikipedia.org/wiki/Chain_rule chain rule] to iteratively compute [https://en.wikipedia.org/wiki/Gradient gradients] for each layer. Backpropagation requires that the [https://en.wikipedia.org/wiki/Activation_function activation function] used by the artificial neurons be [https://en.wikipedia.org/wiki/Differentiable_function differentiable], which is true for the common [https://en.wikipedia.org/wiki/Sigmoid_function sigmoid] [https://en.wikipedia.org/wiki/Logistic_function logistic function] or its [https://en.wikipedia.org/wiki/Softmax_function softmax] generalization in [https://en.wikipedia.org/wiki/Multiclass_classification multiclass classification].
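As a concrete illustration of this layer-by-layer chain rule computation, the following minimal sketch in Python with NumPy trains a tiny two-layer feedforward network of sigmoid units by propagating the output error backwards through each layer. The network size, the XOR task, the learning rate and all variable names are illustrative assumptions, not something described on this page.
<pre>
import numpy as np

# Illustrative sketch only: a two-layer sigmoid network trained on XOR,
# with gradients obtained layer by layer via the chain rule.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(scale=1.0, size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=1.0, size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 0.5                                  # learning rate

for epoch in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)              # hidden activations
    out = sigmoid(h @ W2 + b2)            # network output

    # backward pass: chain rule applied layer by layer;
    # sigmoid'(z) = s * (1 - s), so the derivative reuses the activations
    err_out = (out - y) * out * (1 - out)        # delta at the output layer
    err_hid = (err_out @ W2.T) * h * (1 - h)     # delta propagated to the hidden layer

    # gradient descent updates
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0, keepdims=True)

print(np.round(out, 3))   # should approach [0, 1, 1, 0]
</pre>
The output-layer delta is exactly the classical delta rule; the hidden-layer delta reuses it through the chain rule, which is what extends the rule to multilayered networks.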
* [https://en.wikipedia.org/wiki/Henry_J._Kelley Henry J. Kelley] ('''1960'''). ''[http://arc.aiaa.org/doi/abs/10.2514/8.5282?journalCode=arsj& Gradient Theory of Optimal Flight Paths]''. [http://arc.aiaa.org/loi/arsj ARS Journal], Vol. 30, No. 10 » [[Neural Networks#Backpropagation|Backpropagation]]
* [https://en.wikipedia.org/wiki/Arthur_E._Bryson Arthur E. Bryson] ('''1961'''). ''A gradient method for optimizing multi-stage allocation processes''. In Proceedings of the [[Harvard University]] Symposium on digital computers and their applications » [[Neural Networks#Backpropagation|Backpropagation]]
* [[Mathematician#SEDreyfus|Stuart E. Dreyfus]] ('''1961'''). ''[http://www.rand.org/pubs/papers/P2374.html The numerical solution of variational problems]''. RAND paper P-2374 » [[Neural Networks#Backpropagation|Backpropagation]]
* [https://en.wikipedia.org/wiki/Frank_Rosenblatt Frank Rosenblatt] ('''1962'''). ''[http://catalog.hathitrust.org/Record/000203591 Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms]''. Spartan Books
* [https://en.wikipedia.org/wiki/Alexey_Grigorevich_Ivakhnenko Alexey G. Ivakhnenko] ('''1965'''). ''Cybernetic Predicting Devices''. [https://en.wikipedia.org/wiki/Naukova_Dumka Naukova Dumka]
