Theory of sequence modelling

Sequence modelling is a central task in natural language processing and time series analysis, and forms the backbone of modern large language models. We are interested in building a mathematical theory of modelling sequences with various model architectures. For example, recurrent neural networks (RNNs) have long been used to model input-output relationships involving sequential data. However, a theoretical understanding of RNNs' approximation capabilities and optimisation dynamics remains quite limited. In particular, it is empirically observed that RNNs are not well suited to modelling data with long-term memory dependence (which is often the case in language applications). In a series of works, we investigate the mathematical formulation of approximating sequence relationships using a variety of models, including RNNs, temporal CNNs, encoder-decoders, and transformers. We make precise the relationship between memory (a notion that can be made rigorous) and approximation/optimisation efficiency.

The basic mathematical setting, which differs from that of static supervised learning, is as follows:
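The data consist of input-output sequence pairs, and the learning target is a family of functionals mapping the input history to the output at each time. A minimal sketch, with illustrative notation following the formulation in [5], [7], [9]:

\[
y_t = H_t(\mathbf{x}), \qquad \mathbf{x} = \{ x_s \in \mathbb{R}^d : s \le t \},
\]

where each functional \(H_t\) is typically assumed to be causal, continuous, and time-homogeneous. The approximation question is then how well a given model class (RNNs, temporal CNNs, encoder-decoders, transformers) can approximate such families of functionals.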

The simplest example of a sequence model is the recurrent neural network (RNN), which models such functionals with memory using a hidden dynamical system:
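A sketch of the continuous-time formulation studied in [7], [9] (notation illustrative):

\[
\frac{\mathrm{d} h_t}{\mathrm{d} t} = \sigma\left( W h_t + U x_t \right), \qquad \hat{y}_t = c^\top h_t,
\]

where \(h_t \in \mathbb{R}^m\) is the hidden state, \((W, U, c)\) are trainable parameters, and \(\sigma\) is an activation function (the identity in the linear case analysed in [7], [9]). The hidden state carries the memory of past inputs, so approximation quality hinges on how well this finite-dimensional dynamics can encode the history of the input.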

This is somewhat like the reverse of the Mori-Zwanzig formalism in statistical physics, in which hidden dynamics are replaced by memory structures. Through a simplified analysis, we show that there is a curse of memory associated with RNNs that prevents them from efficiently capturing long-term memory, but that this limitation can be overcome by alternative architectures, such as temporal convolutional networks or transformers.
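
To illustrate how the curse of memory arises, here is a hedged sketch of the linear case analysed in [9] (the precise assumptions and constants are in the paper). A suitable continuous linear functional admits an integral representation

\[
H_t(\mathbf{x}) = \int_0^\infty \rho(s)^\top x_{t-s} \, \mathrm{d} s,
\]

while a linear RNN with an \(m\)-dimensional hidden state realises \(\hat{\rho}(s)^\top = c^\top e^{sW} U\), a finite sum of (complex) exponentials. Approximation efficiency is therefore governed by the decay of the memory kernel \(\rho\): exponentially decaying memory can be captured with modest \(m\), whereas slowly (e.g. polynomially) decaying memory forces \(m\) to grow very rapidly, which is the curse of memory.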

Papers

[1] S. Wang and Q. Li, ‘StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization’, in Proceedings of the 41st International Conference on Machine Learning, PMLR, Jul. 2024.

[2] F. Liu and Q. Li, ‘From Generalization Analysis to Optimization Designs for State Space Models’, in Proceedings of the 41st International Conference on Machine Learning, PMLR, Jul. 2024.

[3] S. Wang, Z. Li, and Q. Li, ‘Inverse Approximation Theory for Nonlinear Recurrent Neural Networks’, in International Conference on Learning Representations, May 2024. [Online]. Available: http://arxiv.org/abs/2305.19190

[4] H. Jiang and Q. Li, ‘Forward and Inverse Approximation Theory for Linear Temporal Convolutional Networks’, in Geometric Science of Information (Lecture Notes in Computer Science), Aug. 2023, pp. 342–350. doi: 10.1007/978-3-031-38299-4_36.

[5] H. Jiang, Q. Li, Z. Li, and S. Wang, ‘A Brief Survey on the Approximation Theory for Sequence Modelling’, Journal of Machine Learning, vol. 2, no. 1, pp. 1–30, Jun. 2023, doi: 10.4208/jml.221221.

[6] Z. Li, H. Jiang, and Q. Li, ‘On the approximation properties of recurrent encoder-decoder architectures’, in International Conference on Learning Representations, Apr. 2022. [Online]. Available: https://openreview.net/forum?id=xDIvIqQ3DXD

[7] Z. Li, J. Han, W. E, and Q. Li, ‘Approximation and Optimization Theory for Linear Continuous-Time Recurrent Neural Networks’, Journal of Machine Learning Research, vol. 23, no. 42, pp. 1–85, 2022.

[8] H. Jiang, Z. Li, and Q. Li, ‘Approximation Theory of Convolutional Architectures for Time Series Modelling’, in Proceedings of the 38th International Conference on Machine Learning, PMLR, Jul. 2021, pp. 4961–4970. [Online]. Available: https://proceedings.mlr.press/v139/jiang21d.html

[9] Z. Li, J. Han, W. E, and Q. Li, ‘On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis’, in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=8Sqhl-nF50
