Approximation of Residual Neural Networks
We build on the dynamical systems approach to deep learning, in which deep residual networks are idealized as continuous-time dynamical systems, from the approximation perspective. In particular, we establish general universal approximation of residual networks for a wide range of activation functions using flow maps of dynamical systems. In specific cases, rates of approximation in terms of the time horizon are also established. Overall, these results reveal that compositional function approximation through flow maps presents a new paradigm in approximation theory and contributes to building a useful mathematical framework for investigating deep learning.
- Qianxiao Li, Ting Lin, Zuowei Shen, Deep Learning via Dynamical Systems: An Approximation Perspective, Journal of the European Mathematical Society, (2022), published online. PDF
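The dynamical systems viewpoint above can be illustrated with a minimal sketch: a residual block is the forward-Euler step of an ODE, so composing many blocks approximates the flow map of the continuous-time system. The block structure below (a `tanh` layer with hypothetical weights) is an illustrative choice, not the constructions used in the paper.

```python
import numpy as np

def residual_block(x, W, b, h):
    """One Euler step of dx/dt = tanh(W x + b): returns x + h * tanh(W x + b)."""
    return x + h * np.tanh(W @ x + b)

def flow_map(x0, params, h):
    """Compose residual blocks; with T = h * len(params) fixed and h -> 0,
    this converges to the time-T flow map of the underlying ODE."""
    x = x0
    for W, b in params:
        x = residual_block(x, W, b, h)
    return x

# Toy usage with random weights (purely illustrative).
rng = np.random.default_rng(0)
d = 3
params = [(0.1 * rng.standard_normal((d, d)), np.zeros(d)) for _ in range(50)]
x = flow_map(rng.standard_normal(d), params, h=0.02)
```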
Approximation of ReLU Networks
In this series of papers, we quantitatively characterize the approximation power of deep ReLU neural networks in terms of width and depth. Optimal or near-optimal rates for continuous and smooth functions are established in [2, 3, 6, 7].
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation in Terms of Intrinsic Parameters, Thirty-ninth International Conference on Machine Learning (ICML), (2022). PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Optimal Approximation Rate of ReLU Networks in Terms of Width and Depth, Journal de Mathématiques Pures et Appliquées, 157, (2022), 101-135. PDF
- Jianfeng Lu, Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation for Smooth Functions, SIAM Journal on Mathematical Analysis, 53(5), (2021), 5465-5506 PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth, Neural Computation, 33(4), (2021), 1005-1036. PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Neural Network Approximation: Three Hidden Layers Are Enough, Neural Networks, 141 (2021), 160-173. PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation Characterized by a Number of Neurons, Communications in Computational Physics, 28 (2020), 1768-1811. PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Nonlinear Approximation via Compositions, Neural Networks, 119, (2019), 74-84. PDF
The analysis is applied to characterize the approximation rates of variations of ReLU networks. One such variation is a network that uses more than one activation function. This new class of networks overcomes the curse of dimensionality in approximation power, as shown in [4, 5].
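A minimal sketch of the multiple-activation idea: a single layer in which some neurons apply ReLU and others apply a second activation (floor is used here as an example pairing). The actual constructions in [4, 5] that beat the curse of dimensionality are far more delicate; this only illustrates the architecture class, and all names and weights below are hypothetical.

```python
import numpy as np

def mixed_layer(x, W, b, relu_mask):
    """Affine map followed by ReLU on neurons where relu_mask is True,
    and the floor activation on the remaining neurons."""
    z = W @ x + b
    return np.where(relu_mask, np.maximum(z, 0.0), np.floor(z))

# Toy usage: 4 neurons, first two ReLU, last two floor (illustrative only).
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 2))
b = rng.standard_normal(4)
relu_mask = np.array([True, True, False, False])
y = mixed_layer(rng.standard_normal(2), W, b, relu_mask)
```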
It is of great interest to achieve successful deep learning with a small number of learnable parameters adapting to the target function. We show that the number of parameters of ReLU networks that need to be learned can be significantly smaller than typically expected while still achieving an attractive approximation rate.
Architecture of Deep Networks
Three key ingredients of a deep network architecture are the activation function used, and the width and depth of the neural network.
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons, Journal of Machine Learning Research, 23, (2022). PDF
- Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Architecture Beyond Width and Depth, 36th Conference on Neural Information Processing Systems (NeurIPS 2022). PDF
A new architecture is developed by designing a simple, computable, and continuous activation function. The resulting feed-forward neural networks achieve the universal approximation property for all continuous functions with width 36d(2d + 1) and depth 11. Hence, for supervised learning and related regression problems, the hypothesis space generated by these networks with size no smaller than 36d(2d + 1) × 11 is dense in a wide range of function spaces.
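The fixed-size architecture above can be sketched as a plain feed-forward network whose hidden widths are all 36d(2d + 1) and whose depth is 11. The activation below is a ReLU placeholder: the universal approximation result relies on the specially designed continuous activation from the paper, which is not reproduced here, so this sketch only shows the network's shape.

```python
import numpy as np

def sigma(z):
    # Placeholder activation (ReLU); the paper uses a specially
    # designed simple, computable, continuous activation instead.
    return np.maximum(z, 0.0)

def fixed_size_network(x, params):
    """Evaluate a depth-11 network; activation on all but the output layer."""
    h = x
    for i, (W, b) in enumerate(params):
        h = W @ h + b
        if i < len(params) - 1:
            h = sigma(h)
    return h

# Build the shape for input dimension d = 2: hidden width 36*d*(2*d+1) = 360.
d = 2
width, depth = 36 * d * (2 * d + 1), 11
rng = np.random.default_rng(2)
dims = [d] + [width] * (depth - 1) + [1]  # scalar output, for illustration
params = [(0.01 * rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]
y = fixed_size_network(rng.standard_normal(d), params)
```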
A three-dimensional neural network architecture is developed by introducing an additional dimension, called height, beyond width and depth. The new network architecture is constructed recursively via a nested structure. Three-dimensional network architectures are significantly more expressive than two-dimensional ones.