Paper Title

Deep Sensory Substitution: Noninvasively Enabling Biological Neural Networks to Receive Input from Artificial Neural Networks

Authors

Andrew Port, Chelhwon Kim, Mitesh Patel

Abstract

As is expressed in the adage "a picture is worth a thousand words", when using spoken language to communicate visual information, brevity can be a challenge. This work describes a novel technique for leveraging machine-learned feature embeddings to sonify visual (and other types of) information into a perceptual audio domain, allowing users to perceive this information using only their aural faculty. The system uses a pretrained image embedding network to extract visual features and embed them in a compact subset of Euclidean space -- this converts the images into feature vectors whose $L^2$ distances can be used as a meaningful measure of similarity. A generative adversarial network (GAN) is then used to find a distance preserving map from this metric space of feature vectors into the metric space defined by a target audio dataset equipped with either the Euclidean metric or a mel-frequency cepstrum-based psychoacoustic distance metric. We demonstrate this technique by sonifying images of faces into human speech-like audio. For both target audio metrics, the GAN successfully found a metric preserving mapping, and in human subject tests, users were able to accurately classify audio sonifications of faces.
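The core objective the abstract describes, finding a map from the image-embedding metric space into an audio metric space that preserves pairwise distances, can be sketched as a training loss. The following is a minimal illustrative sketch, not the paper's implementation: the function name `metric_preservation_loss`, its arguments, and the plain pairwise-L2 formulation are assumptions for illustration (the paper's audio-side metric may instead be the mel-frequency cepstrum-based psychoacoustic distance).

```python
import numpy as np

def pairwise_l2(x):
    """Pairwise L2 distances between the rows of x, shape (n, d) -> (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def metric_preservation_loss(image_embeddings, generated_audio, scale=1.0):
    """Mean squared mismatch between the two pairwise-distance matrices.

    A generator driving this loss toward zero maps the embedding metric
    space (approximately) isometrically, up to `scale`, into the metric
    space of its audio outputs.
    """
    d_img = pairwise_l2(image_embeddings)   # distances among feature vectors
    d_aud = pairwise_l2(generated_audio)    # distances among sonifications
    return float(((d_aud - scale * d_img) ** 2).mean())
```

As a sanity check, any exact (scaled) isometry of the embeddings, such as passing them through unchanged, yields zero loss; in the paper this loss term would be combined with the GAN's adversarial loss so the outputs also sound like samples from the target audio dataset.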
