Machine Learning

Learning and understanding transforms was the starting point of my journey in machine learning. I continue this meaningful journey by working on the approximation theory of deep neural networks with simple architectures.

Approximation of Residual Neural Networks

    1. We build upon the dynamical systems approach to deep learning, in which deep residual networks are idealized as continuous-time dynamical systems, and study it from an approximation perspective. In particular, we establish a general universal approximation theorem for residual networks with a wide range of activation functions, using the flow maps of dynamical systems. In specific cases, we also establish rates of approximation in terms of the time horizon. Overall, these results show that approximating functions by compositions realized as flow maps is a new paradigm in approximation theory, contributing to a useful mathematical framework for investigating deep learning. An informal numerical illustration of the flow-map view is given after the reference below.

    2. Qianxiao Li, Ting Lin, Zuowei Shen, Deep Learning via Dynamical Systems: An Approximation Perspective, Journal of the European Mathematical Society, (2022), published online. PDF
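As an informal illustration of the flow-map viewpoint (a minimal NumPy sketch with a tanh activation and random weights, none of which are taken from the paper), each residual block x ← x + h·f(x) is one forward-Euler step of the dynamical system dx/dt = f(x), so a deep stack of blocks approximates the flow map over the time horizon T = h × depth.

```python
import numpy as np

def residual_flow(x0, weights, biases, h):
    """Apply a stack of residual blocks x <- x + h * tanh(W x + b).

    Each block is one forward-Euler step of the ODE dx/dt = f(x),
    so the whole stack approximates the flow map over time T = h * depth.
    (Illustrative only: activation and weights are placeholders.)
    """
    x = x0
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)   # one residual block == one Euler step
    return x

# Toy usage: 8 residual blocks acting on a 3-dimensional state.
rng = np.random.default_rng(0)
d, depth, h = 3, 8, 0.1
Ws = [0.5 * rng.standard_normal((d, d)) for _ in range(depth)]
bs = [0.1 * rng.standard_normal(d) for _ in range(depth)]
print(residual_flow(np.ones(d), Ws, bs, h))
```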

Approximation of ReLU Networks

    1. In this series of papers, we quantitatively characterize the approximation power of deep ReLU neural networks in terms of their width and depth. Optimal or near-optimal rates for continuous and smooth functions are established in [2, 3, 6, 7].

      The analysis also characterizes the approximation rates of variants of ReLU networks. One such variant uses more than one activation function; this new class of networks overcomes the curse of dimensionality in approximation power, as shown in [4, 5].

      It is of great interest to achieve successful deep learning with a small number of learnable parameters that can adapt to the target function. In [1], we show that the number of parameters of a ReLU network that need to be learned can be significantly smaller than typically expected while still achieving an attractive approximation rate. A small numerical sketch of how width controls approximation error is given after the references below.

    2. Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation in Terms of Intrinsic Parameters, Thirty-ninth International Conference on Machine Learning (ICML), (2022). PDF
    3. Zuowei Shen, Haizhao Yang, Shijun Zhang, Optimal Approximation Rate of ReLU Networks in Terms of Width and Depth, Journal de Mathématiques Pures et Appliquées, 157, (2022), 101-135. PDF
    4. Jianfeng Lu, Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation for Smooth Functions, SIAM Journal on Mathematical Analysis, 53(5), (2021), 5465-5506. PDF
    5. Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth, Neural Computation, 33(4), (2021), 1005-1036. PDF
    6. Zuowei Shen, Haizhao Yang, Shijun Zhang, Neural Network Approximation: Three Hidden Layers Are Enough, Neural Networks, 141, (2021), 160-173. PDF
    7. Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation Characterized by a Number of Neurons, Communications in Computational Physics, 28, (2020), 1768-1811. PDF
    8. Zuowei Shen, Haizhao Yang, Shijun Zhang, Nonlinear Approximation via Compositions, Neural Networks, 119, (2019), 74-84. PDF
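As a concrete, purely illustrative sketch of how width governs approximation error (a standard one-dimensional construction, not taken from the papers above), the NumPy snippet below realizes the piecewise-linear interpolant of a function exactly as a one-hidden-layer ReLU network whose width equals the number of grid cells; for a smooth target, doubling the width roughly quarters the uniform error, which is the kind of width/depth trade-off the papers quantify, in far greater generality, for deep networks in higher dimensions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, a, b, n):
    """Build a one-hidden-layer ReLU network that reproduces the
    piecewise-linear interpolant of f on a uniform grid with n cells.

    Returns (weights, biases, output_coeffs, output_bias) so that
    g(x) = output_bias + relu(weights * x + biases) @ output_coeffs.
    """
    xs = np.linspace(a, b, n + 1)
    ys = f(xs)
    slopes = np.diff(ys) / np.diff(xs)        # slope on each grid cell
    coeffs = np.diff(slopes, prepend=0.0)     # change of slope at each knot
    # Hidden unit i computes relu(x - xs[i]); the hidden width is n.
    return np.ones(n), -xs[:-1], coeffs, ys[0]

def evaluate(params, x):
    w, b, c, c0 = params
    return c0 + relu(np.outer(x, w) + b) @ c

# Usage: approximate f(x) = sin(2*pi*x) on [0, 1]; doubling the width
# roughly quarters the sup-norm error for this smooth target.
f = lambda x: np.sin(2 * np.pi * x)
grid = np.linspace(0.0, 1.0, 2001)
for width in (8, 16, 32):
    params = relu_interpolant(f, 0.0, 1.0, width)
    err = np.max(np.abs(evaluate(params, grid) - f(grid)))
    print(f"width {width:3d}: sup error ~ {err:.4f}")
```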

Architecture of Deep Network

    1. Three critical ingredients of a deep network architecture are its activation function, width, and depth.

      We started by designing a new architecture with a simple, computable, and continuous activation function. This simple feed-forward neural network achieves the universal approximation property for continuous functions of d variables with width 36d(2d+1) and depth 11. Hence, for supervised learning and related regression problems, the hypothesis space generated by these networks of size no smaller than 36d(2d+1) × 11 is dense in a wide range of function spaces.

      Next, we designed a three-dimensional neural network architecture by introducing an additional dimension, called height, beyond width and depth. This new architecture is constructed recursively via a nested structure, and the resulting three-dimensional architectures are significantly more expressive than two-dimensional ones. A structural sketch of the nested idea is given after the references below.

    2. Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons, Journal of Machine Learning Research, 23, (2022). PDF
    3. Zuowei Shen, Haizhao Yang, Shijun Zhang, Deep Network Architecture Beyond Width and Depth, 36th Conference on Neural Information Processing Systems (NeurIPS 2022). PDF
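As a structural illustration of the nested idea (a hypothetical NumPy sketch with random weights and tanh activations standing in for the paper's actual construction), a height-1 network below is an ordinary feed-forward network, while each hidden unit of a height-h network applies a height-(h-1) sub-network in place of a scalar activation; width, depth, and height can then be varied independently.

```python
import numpy as np

class NestNet:
    """Recursively nested network: a height-1 net uses a scalar
    activation; a height-h net replaces every scalar activation with a
    height-(h-1) sub-network mapping R -> R.  (Illustrative wiring only;
    weights and activations are placeholders, not the paper's construction.)
    """
    def __init__(self, widths, height, rng):
        # One affine map widths[i] -> widths[i+1] per layer.
        self.layers = [(rng.standard_normal((m, n)) / np.sqrt(n),
                        rng.standard_normal(m))
                       for n, m in zip(widths[:-1], widths[1:])]
        self.height = height
        if height > 1:
            # A scalar-to-scalar sub-network (shared here for simplicity).
            self.sub = NestNet([1, 4, 1], height - 1, rng)

    def _act(self, z):
        if self.height == 1:
            return np.tanh(z)                      # ordinary activation
        # Apply the height-(h-1) sub-network elementwise.
        return np.array([self.sub(np.array([v]))[0] for v in z])

    def __call__(self, x):
        for i, (W, b) in enumerate(self.layers):
            x = W @ x + b
            if i < len(self.layers) - 1:           # no activation on the output layer
                x = self._act(x)
        return x

# Usage: a width-8, depth-3, height-2 network on a 3-dimensional input.
rng = np.random.default_rng(1)
net = NestNet([3, 8, 8, 1], height=2, rng=rng)
print(net(rng.standard_normal(3)))
```

The sketch only shows the recursive wiring; the expressiveness results in the paper depend on its specific construction, not on these placeholder weights.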