Paper Title
MVStylizer: An Efficient Edge-Assisted Video Photorealistic Style Transfer System for Mobile Phones
Paper Authors
Paper Abstract
Recent research has made great progress in realizing neural style transfer of images, which denotes transforming an image into a desired style. Many users have started to use their mobile phones to record their daily life, and then edit and share the captured images and videos with other users. However, directly applying existing style transfer approaches to videos, i.e., transferring the style of a video frame by frame, requires an extremely large amount of computation resources. It is still technically unaffordable to perform style transfer of videos on mobile phones. To address this challenge, we propose MVStylizer, an efficient edge-assisted photorealistic video style transfer system for mobile phones. Instead of performing stylization frame by frame, only the key frames in the original video are processed by a pre-trained deep neural network (DNN) on edge servers, while the remaining stylized intermediate frames are generated by our optical-flow-based frame interpolation algorithm on mobile phones. A meta-smoothing module is also proposed to simultaneously upscale a stylized frame to arbitrary resolution and remove style-transfer-related distortions in the upscaled frames. In addition, to continuously enhance the performance of the DNN models on the edge servers, we adopt a federated learning scheme that keeps retraining each edge DNN model with data collected from mobile clients and synchronizing it with a global DNN model on the cloud server. Such a scheme effectively leverages the diversity of the data collected from various mobile clients and efficiently improves system performance. Our experiments demonstrate that MVStylizer can generate stylized videos with even better visual quality than the state-of-the-art method while achieving a 75.5$\times$ speedup for 1920$\times$1080 videos.
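To make the flow-guided interpolation idea concrete, the sketch below (Python with OpenCV) warps a stylized key frame onto an intermediate frame using dense optical flow estimated between the two original, unstylized frames. This is only a minimal illustration under stated assumptions: the Farneback flow estimator and the helper name interpolate_stylized_frame are illustrative choices, not the paper's actual interpolation algorithm, and the meta-smoothing step is omitted.

```python
import cv2
import numpy as np

def interpolate_stylized_frame(stylized_key, orig_key, orig_inter):
    """Approximate the stylized version of an intermediate frame by warping
    the stylized key frame, guided by dense optical flow between the two
    original (unstylized) frames.

    Illustrative sketch only; the estimator and signature are assumptions,
    not MVStylizer's published algorithm.
    """
    key_gray = cv2.cvtColor(orig_key, cv2.COLOR_BGR2GRAY)
    inter_gray = cv2.cvtColor(orig_inter, cv2.COLOR_BGR2GRAY)

    # Dense flow from the intermediate frame back to the key frame, so each
    # intermediate pixel knows where to sample in the stylized key frame.
    flow = cv2.calcOpticalFlowFarneback(inter_gray, key_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = inter_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)

    # Backward-warp the stylized key frame onto the intermediate frame's grid.
    return cv2.remap(stylized_key, map_x, map_y, cv2.INTER_LINEAR)

# Hypothetical usage: the stylized key frame returns from the edge server,
# and the phone fills in the following intermediate frames locally, e.g.
# stylized_mid = interpolate_stylized_frame(stylized_key, frame_0, frame_1)
```

Because the flow is estimated between original frames already on the phone, intermediate frames can be filled in without running the stylization DNN per frame, which is consistent with the abstract's claim that only key frames need to be processed on the edge server.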