Despite its empirical success, deep learning lacks a concrete mathematical basis from which theory and algorithms can be studied in a principled way. One difficulty is how to account explicitly for the depth of neural networks, which is widely regarded as the essence of deep learning. For example, classical approximation theory, statistical learning theory, and optimization theory often apply equally to shallow and deep networks, and they fail to explain many empirical phenomena. In a series of works, we introduced a mathematical framework for studying deep learning based on dynamical systems and optimal control. The key observation is that a (residual) feed-forward neural network can be regarded as a discretization of a continuous-time dynamical system (figure below).
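To make this correspondence concrete, the following is the standard identification, written here in generic notation rather than the notation of any single paper below: a residual block is a forward-Euler step of an ordinary differential equation, with the layer index playing the role of time. The update

$$
x_{t+1} = x_t + \delta\, f(x_t, \theta_t), \qquad t = 0, 1, \dots, T-1,
$$

is the forward-Euler discretization, with step size $\delta$, of the continuous-time dynamics

$$
\dot{x}(s) = f\bigl(x(s), \theta(s)\bigr), \qquad s \in [0, T],
$$

where $x$ is the hidden state propagated through the layers and $\theta(s)$ collects the trainable parameters at "layer" $s$. Deeper networks correspond to finer discretizations (more time steps) of the same flow.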
Consequently, supervised learning with deep neural networks can be idealised as an optimal control problem constrained by differential equations (a schematic formulation is sketched after the list below). This establishes a direct connection between deep learning and the theory of the calculus of variations and optimal control. This connection has many consequences, including
- General approximation theorems for deep networks, with connections to controllability theory
- Mathematical formulation of training deep networks as optimal control, with connections to mean-field control theory
- New ways of training and fine-tuning deep networks
- New ways of improving adversarial robustness
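To fix ideas, here is a schematic of the resulting control problem, in the spirit of the mean-field formulation of [9]; the specific symbols ($\mu$, $\Phi$, $L$) are chosen here for illustration:

$$
\min_{\theta(\cdot)} \; \mathbb{E}_{(x_0, y) \sim \mu} \left[ \Phi\bigl(x(T), y\bigr) + \int_0^T L\bigl(x(s), \theta(s)\bigr)\, ds \right]
\quad \text{subject to} \quad \dot{x}(s) = f\bigl(x(s), \theta(s)\bigr),
$$

where $\mu$ is the data distribution, $\Phi$ is a terminal loss comparing the terminal state $x(T)$ with the label $y$, and $L$ is a running cost such as a regularizer. Training then amounts to finding an optimal control $\theta(\cdot)$, which makes tools such as Pontryagin's maximum principle and Hamilton-Jacobi-Bellman equations available for both analysis and algorithm design (see [9]–[11]).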
Selected papers
[1] C. Zhang, J. Cheng, and Q. Li, ‘An Optimal Control View of LoRA and Binary Control Design for Vision Transformers’, in Proceedings of the European Conference on Computer Vision (ECCV), Sep. 2024.
[2] C. Zhang, J. Cheng, Y. Xu, and Q. Li, ‘Parameter-Efficient Fine-Tuning with Controls’, in Proceedings of the 41st International Conference on Machine Learning, PMLR, Jul. 2024.
[3] Z. Chen, Z. Wang, Y. Yang, Q. Li, and Z. Zhang, ‘PID Control-Based Self-Healing to Improve the Robustness of Large Language Models’, Transactions on Machine Learning Research, Jan. 2024. [Online]. Available: https://openreview.net/forum?id=Fu4mwB0XIU
[4] Q. Li, T. Lin, and Z. Shen, ‘Deep Neural Network Approximation of Invariant Functions through Dynamical Systems’, Journal of Machine Learning Research, vol. 25, no. 278, pp. 1–57, 2024.
[5] Q. Li, T. Lin, and Z. Shen, ‘Deep learning via dynamical systems: An approximation perspective’, J. Eur. Math. Soc., vol. 25, no. 5, pp. 1671–1709, 2023, doi: 10.4171/JEMS/1221.
[6] Z. Chen, Q. Li, and Z. Zhang, ‘Self-Healing Robust Neural Networks via Closed-Loop Control’, Journal of Machine Learning Research, vol. 23, no. 319, pp. 1–54, Oct. 2022.
[7] Z. Chen, Q. Li, and Z. Zhang, ‘Towards Robust Neural Networks via Close-loop Control’, in International Conference on Learning Representations (ICLR), 2021. [Online]. Available: https://openreview.net/forum?id=2AL06y9cDE-
[8] Q. Li, C. Tai, and W. E, ‘Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations’, Journal of Machine Learning Research, vol. 20, no. 40, pp. 1–47, 2019.
[9] W. E, J. Han, and Q. Li, ‘A mean-field optimal control formulation of deep learning’, Research in the Mathematical Sciences, vol. 6, no. 1, p. 10, 2019.
[10] Q. Li and S. Hao, ‘An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks’, in Proceedings of the 35th international conference on machine learning, 2018, pp. 2985–2994.
[11] Q. Li, L. Chen, C. Tai, and W. E, ‘Maximum principle based algorithms for deep learning’, The Journal of Machine Learning Research, vol. 18, no. 1, pp. 5998–6026, 2018.
[12] Q. Li, C. Tai, and W. E, ‘Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms’, in Proceedings of the 34th International Conference on Machine Learning, PMLR, Jul. 2017, pp. 2101–2110. [Online]. Available: https://proceedings.mlr.press/v70/li17f.html