Paper title
Efficient first-order predictor-corrector multiple objective optimization for fair misinformation detection
Paper authors
Paper abstract
Multiple-objective optimization (MOO) aims to simultaneously optimize multiple conflicting objectives and has found important applications in machine learning, such as minimizing both classification loss and the discrepancy in how different populations are treated, for fairness. At optimality, further optimizing one objective will necessarily harm at least one other objective, and decision-makers need to comprehensively explore multiple optima (called the Pareto front) to pinpoint one final solution. We address the efficiency of finding the Pareto front. First, finding the front from scratch using stochastic multi-gradient descent (SMGD) is expensive with large neural networks and datasets. We propose to explore the Pareto front as a manifold from a few initial optima, based on a predictor-corrector method. Second, for each exploration step, the predictor solves a large-scale linear system that scales quadratically in the number of model parameters and requires one backpropagation to evaluate a second-order Hessian-vector product per iteration of the solver. We propose a Gauss-Newton approximation that scales only linearly and requires only first-order inner products per iteration. This also allows for a choice between the MINRES and conjugate gradient methods when approximately solving the linear system. These innovations make the predictor-corrector method possible for large networks. Experiments on multi-objective (fairness and accuracy) misinformation detection tasks show that 1) the predictor-corrector method can find Pareto fronts better than or similar to those found by SMGD in less time; and 2) the proposed first-order method does not harm the quality of the Pareto front identified by the second-order method, while further reducing running time.
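To illustrate why a Gauss-Newton-style system can be solved with first-order inner products only, the following minimal sketch applies a matrix-free conjugate gradient solver to an operator of the form G^T G + damping * I, where the rows of G stand in for per-objective gradients. This is an illustrative assumption and not necessarily the paper's exact formulation: the damping term, the toy values, and the names `gauss_newton_matvec` and `conjugate_gradient` are all hypothetical. Each matrix-vector product costs time linear in the number of parameters, and the n-by-n matrix is never formed.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gauss_newton_matvec(grads, v, damping):
    """Return (G^T G + damping * I) v without forming the n-by-n matrix.

    grads: list of gradient vectors (rows of G); assumed given for this sketch.
    damping: positive scalar keeping the operator positive definite (an assumption).
    """
    coeffs = [dot(g, v) for g in grads]      # G v: one inner product per gradient
    out = [damping * vi for vi in v]         # damping * v
    for c, g in zip(coeffs, grads):          # accumulate G^T (G v)
        for i, gi in enumerate(g):
            out[i] += c * gi
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                              # residual b - A x with x = 0
    p = list(r)                              # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy data (hypothetical): two objective gradients over four parameters.
grads = [[1.0, 2.0, 0.0, -1.0], [0.5, -1.0, 3.0, 2.0]]
rhs = [1.0, 0.0, -2.0, 0.5]
damping = 0.1

solution = conjugate_gradient(lambda v: gauss_newton_matvec(grads, v, damping), rhs)
residual = [a - b for a, b in zip(gauss_newton_matvec(grads, solution, damping), rhs)]
```

Because the damped operator is symmetric positive definite, conjugate gradient applies here. For a symmetric but indefinite system, MINRES would be the natural alternative, which matches the solver choice mentioned in the abstract.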