The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$. We provide a theoretical explanation of the numerical results presented in support of this argument, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the squared Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to low-rank tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of attention in transformer neural networks.

When big data actually are low-rank, or entrywise approximation of certain function-generated matrices