Paper Title
Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay
Paper Authors
Paper Abstract
We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates through a low Tucker rank parameter tensor/matrix without prior knowledge of its intrinsic rank. We propose the Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the rank correctly-parameterized and rank over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without any modification to their implementation. We also prove a statistical-computational gap in scalar-on-tensor regression by a direct low-degree polynomial argument. Our theory demonstrates a "blessing of statistical-computational gap" phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the computationally required sample size matches what is needed under moderate rank over-parameterization when restricting to computationally feasible estimators, whereas no such benefit arises in the matrix setting. This shows that moderate rank over-parameterization is essentially "cost-free" in terms of sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to demonstrate the advantages of our proposed methods and to corroborate our theoretical findings.
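To make the RGD idea concrete, below is a minimal sketch for the simplest (order-two) instance covered by the framework: scalar-on-matrix regression y_i = <X_i, B*> + noise with a low-rank B*. Each iteration projects the Euclidean gradient onto the tangent space of the fixed-rank matrix manifold and then retracts via truncated SVD. This is an illustrative sketch under our own assumptions (spectral initialization, step size `eta`, Gaussian design in the demo), not the paper's implementation; the tensor case would replace the SVD retraction with a low-Tucker-rank (e.g., HOSVD-based) retraction.

```python
import numpy as np

def svd_retract(M, r):
    """Best rank-r approximation of M via truncated SVD (the retraction step).

    Returns the retracted matrix together with its left/right singular
    factors, which are reused for the tangent-space projection below.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    return (U * s) @ Vt, U, Vt.T

def rgd(X, y, r, eta=0.5, n_iter=50):
    """Riemannian gradient descent for y_i = <X_i, B*> + noise, rank(B*) <= r.

    X : (n, p1, p2) matrix covariates; y : (n,) scalar responses;
    r : working rank, which may over-parameterize the true rank.
    """
    n = X.shape[0]
    # Spectral initialization: rank-r truncation of (1/n) * sum_i y_i X_i.
    B, U, V = svd_retract(np.einsum('i,ijk->jk', y, X) / n, r)
    for _ in range(n_iter):
        resid = np.einsum('ijk,jk->i', X, B) - y      # <X_i, B> - y_i
        G = np.einsum('i,ijk->jk', resid, X) / n      # Euclidean gradient
        # Project G onto the tangent space of the rank-r manifold at B:
        # P_T(G) = U U^T G + G V V^T - U U^T G V V^T.
        PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T
        # Gradient step along the tangent direction, then retract to rank r.
        B, U, V = svd_retract(B - eta * PG, r)
    return B

# Synthetic check: true rank 2, deliberately over-parameterized working rank 4.
rng = np.random.default_rng(0)
n, p1, p2, r_true = 2000, 20, 20, 2
B_star = rng.standard_normal((p1, r_true)) @ rng.standard_normal((r_true, p2))
X = rng.standard_normal((n, p1, p2))
y = np.einsum('ijk,jk->i', X, B_star) + 0.1 * rng.standard_normal(n)
B_hat = rgd(X, y, r=4)
print("relative error:", np.linalg.norm(B_hat - B_star) / np.linalg.norm(B_star))
```

The demo deliberately runs with working rank r = 4 while the true rank is 2, mimicking the over-parameterized regime discussed in the abstract: the algorithm itself requires no modification to handle it.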