Paper Title
Rethinking Fano's Inequality in Ensemble Learning
Paper Authors
Paper Abstract
We propose a fundamental theory of ensemble learning that answers the central question: what factors make an ensemble system good or bad? Previous studies used a variant of Fano's inequality from information theory to derive a lower bound on the classification error rate in terms of the $\textit{accuracy}$ and $\textit{diversity}$ of the models. We revisit the original Fano's inequality and argue that those studies did not account for the information lost when multiple model predictions are combined into a final prediction. To address this issue, we generalize the previous theory to incorporate this information loss, which we name $\textit{combination loss}$. Further, we empirically validate and demonstrate the proposed theory through extensive experiments on actual systems. The theory reveals the strengths and weaknesses of systems on each metric, which will advance the theoretical understanding of ensemble learning and give us insights into designing systems.
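For reference, the textbook form of Fano's inequality that the abstract refers to can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: $Y$ is the true label over a finite class set $\mathcal{Y}$, $O = (O_1, \dots, O_M)$ collects the predictions of the $M$ models, $\hat{Y} = f(O)$ is the combined prediction, $P_e = \Pr(\hat{Y} \neq Y)$ is the error rate, $H_b$ is the binary entropy, and logarithms are base 2.

$$
H(Y \mid \hat{Y}) \;\le\; H_b(P_e) + P_e \log\bigl(|\mathcal{Y}| - 1\bigr)
$$

Using $H_b(P_e) \le 1$, $\log(|\mathcal{Y}| - 1) < \log|\mathcal{Y}|$, and $H(Y \mid \hat{Y}) = H(Y) - I(Y; \hat{Y})$, this gives the familiar weaker lower bound on the error rate:

$$
P_e \;\ge\; \frac{H(Y) - I(Y; \hat{Y}) - 1}{\log |\mathcal{Y}|}
$$

Because $\hat{Y}$ is a function of $O$, the data processing inequality gives $I(Y; \hat{Y}) \le I(Y; O)$, so the information term can be split as

$$
I(Y; \hat{Y}) \;=\; I(Y; O) \;-\; \bigl[\, I(Y; O) - I(Y; \hat{Y}) \,\bigr], \qquad I(Y; O) - I(Y; \hat{Y}) \ge 0 .
$$

One plausible reading of the abstract is that the bracketed gap plays the role of the $\textit{combination loss}$, while $I(Y; O)$ carries the $\textit{accuracy}$ and $\textit{diversity}$ terms of the earlier bounds. The paper's exact definitions are not given in this listing, so this decomposition should be taken as an illustrative assumption rather than the authors' formulation.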