Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary

Paper title

Simultaneous off-the-grid learning of mixtures issued from a continuous dictionary

Authors

Cristina Butucea, Jean-François Delmas, Anne Dutfoy, Clément Hardy

Abstract

In this paper we observe a set, possibly a continuum, of signals corrupted by noise. Each signal is a finite mixture of an unknown number of features belonging to a continuous dictionary. The continuous dictionary is parametrized by a real non-linear parameter. We assume that the signals share an underlying structure: the active features of every signal are included in a common finite and sparse set. We formulate a regularized optimization problem to estimate simultaneously the linear coefficients in the mixtures and the non-linear parameters of the features. The optimization problem is composed of a data fidelity term and a $(\ell_1,L^p)$-penalty. We call its solution the Group-Nonlinear-Lasso and provide high probability bounds on the prediction error using certificate functions. Following recent works on the geometry of off-the-grid methods, we show that such functions can be constructed provided the parameters of the active features are pairwise separated by a constant with respect to a Riemannian metric. When the number of signals is finite and the noise is assumed Gaussian, we give refinements of our results for $p=1$ and $p=2$ using tail bounds on suprema of Gaussian and $\chi^2$ random processes. When $p=2$, our prediction error reaches the rates obtained by the Group-Lasso estimator in the multi-task linear regression model. Furthermore, for $p=2$ these prediction rates are faster than for $p=1$ when all signals share most of the non-linear parameters.
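As a rough illustration of the objective described in the abstract, the sketch below evaluates a data fidelity term plus a group penalty for a finite number of signals. It is not the authors' formulation: the Gaussian-bump dictionary, the names `gaussian_feature`, `group_penalty`, `objective`, the tuning parameter `kappa`, and the 1/2 scaling of the fidelity term are all assumptions made for the example, and the abstract's general $L^p$ norm over a possibly continuous set of signals is replaced here by a plain $\ell^p$ norm over finitely many signals.

```python
import numpy as np

def gaussian_feature(theta, t, width=0.1):
    """Hypothetical dictionary element: a Gaussian bump centred at theta,
    normalised to unit Euclidean norm on the sampling grid t."""
    phi = np.exp(-0.5 * ((t - theta) / width) ** 2)
    return phi / np.linalg.norm(phi)

def group_penalty(B, p):
    """Sum over features (rows of B) of the l^p norm taken across signals (columns)."""
    return float(np.sum(np.linalg.norm(B, ord=p, axis=1)))

def objective(Y, B, thetas, t, kappa, p=2):
    """Assumed objective: 0.5 * squared residual + kappa * group penalty.

    Y      : (n_signals, T) observed signals sampled on the grid t
    B      : (K, n_signals) linear coefficients, one row per feature
    thetas : (K,) non-linear parameters of the features
    """
    Phi = np.stack([gaussian_feature(th, t) for th in thetas])  # shape (K, T)
    residual = Y - B.T @ Phi
    return 0.5 * float(np.sum(residual ** 2)) + kappa * group_penalty(B, p)

# Two signals sharing one active feature; the second feature is inactive.
t = np.linspace(0.0, 1.0, 50)
thetas = np.array([0.3, 0.7])
B = np.array([[3.0, 4.0], [0.0, 0.0]])
Phi = np.stack([gaussian_feature(th, t) for th in thetas])
Y = B.T @ Phi  # noiseless signals, so the fidelity term vanishes
value = objective(Y, B, thetas, t, kappa=0.5, p=2)
```

In this toy setting the penalty with `p=2` treats each row of `B` as one group shared by all signals, while `p=1` reduces to the sum of absolute values of all coefficients. Actually minimizing over both `B` and `thetas` is a non-convex problem that this sketch does not attempt.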
