Paper Title
Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks
Paper Authors
Paper Abstract
In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent work based on deep neural networks has shown promising results for single-image lighting estimation, but suffers from a lack of robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domains of an image sequence. For this task, we introduce a transformer architecture that is trained in an end-to-end fashion, without the statistical post-processing required by previous work. To this end, we propose a positional encoding that takes the camera calibration and ego-motion estimation into account in order to globally register the individual estimates when computing attention between visual words. We show that our method improves lighting estimation while requiring fewer hyper-parameters compared to the state of the art.
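
The following is a minimal PyTorch sketch of the aggregation idea described in the abstract: per-view lighting estimates are fused by a transformer encoder whose positional encoding is derived from the camera calibration and ego-motion. The module name, feature dimensions, the flattened 3x4 pose input, and the unit sun-direction output are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

class LightingAggregationTransformer(nn.Module):
    """Sketch: aggregate noisy per-view lighting estimates with a transformer.
    The positional encoding is computed from camera calibration and ego-motion
    (here: a flattened 3x4 world-from-camera pose), so attention between the
    per-view tokens ("visual words") operates in a globally registered frame.
    All sizes below are illustrative assumptions."""

    def __init__(self, feat_dim=256, n_heads=8, n_layers=4, light_dim=3):
        super().__init__()
        # Per-view lighting features, e.g. from a single-image estimator.
        self.feat_proj = nn.Linear(feat_dim, feat_dim)
        # Positional encoding from calibration + ego-motion (12 pose values).
        self.pose_enc = nn.Sequential(
            nn.Linear(12, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # Regress one aggregated lighting estimate, e.g. a sun direction.
        self.head = nn.Linear(feat_dim, light_dim)

    def forward(self, view_feats, cam_poses):
        # view_feats: (B, N_views, feat_dim) per-view lighting features
        # cam_poses:  (B, N_views, 12) flattened world-from-camera poses
        tokens = self.feat_proj(view_feats) + self.pose_enc(cam_poses)
        fused = self.encoder(tokens)            # attention across views / time
        out = self.head(fused.mean(dim=1))      # pool and regress lighting
        return nn.functional.normalize(out, dim=-1)  # unit sun direction

# Usage example: 8 views sampled from an image sequence, batch size 2.
model = LightingAggregationTransformer()
feats = torch.randn(2, 8, 256)
poses = torch.randn(2, 8, 12)
print(model(feats, poses).shape)  # torch.Size([2, 3])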