Real-time calibration of coherent-state receivers: learning by trial and error

Paper title

Real-time calibration of coherent-state receivers: learning by trial and error

Authors

Bilkis, M., Rosati, M., Morral Yepes, R., Calsamiglia, J.

Abstract

The optimal discrimination of coherent states of light with current technology is a key problem in classical and quantum communication, whose solution would enable the realization of efficient receivers for long-distance communications in free-space and optical fiber channels. In this article, we show that reinforcement learning (RL) protocols allow an agent to learn near-optimal coherent-state receivers made of passive linear optics, photodetectors and classical adaptive control. Each agent is trained and tested in real time over several runs of independent discrimination experiments and has no knowledge of the energy of the states, the receiver setup, or the quantum-mechanical laws governing the experiments. Based exclusively on the observed photodetector outcomes, the agent adaptively chooses among a set of ~3×10^3 possible receiver setups, and obtains a reward at the end of each experiment if its guess is correct. In contrast to previous applications of RL in quantum physics, the information gathered in each run is intrinsically stochastic and thus insufficient to evaluate exactly the performance of the chosen receiver. Nevertheless, we present families of agents that: (i) discover a receiver beating the best Gaussian receiver after ~3×10^2 experiments; (ii) surpass the cumulative reward of the best Gaussian receiver after ~10^3 experiments; (iii) simultaneously discover a near-optimal receiver and attain its cumulative reward after ~10^5 experiments. Our results show that RL techniques are suitable for on-line control of quantum receivers and can be employed for long-distance communications over potentially unknown channels.
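As a rough illustration of the trial-and-error setting described in the abstract, the sketch below simulates a much simpler problem than the paper's. It is not the authors' agent or receiver. It assumes two equiprobable states |+α⟩ and |−α⟩, a single displacement β followed by an ideal on/off photodetector, and a plain epsilon-greedy learner with sample-average value estimates. The grid of displacements, the value of α, the exploration rate, and all function names are hypothetical choices made for this example. The agent only sees the detector outcome and a 0/1 reward, as in the abstract.

```python
import math
import numpy as np


def success_probability(beta, alpha):
    """Exact success probability of the on/off receiver with displacement beta,
    using the best guess for each outcome (equiprobable |+alpha>, |-alpha>)."""
    p0_plus = math.exp(-(alpha + beta) ** 2)    # P(no click | +alpha)
    p0_minus = math.exp(-(-alpha + beta) ** 2)  # P(no click | -alpha)
    return 0.5 * (max(p0_plus, p0_minus) + max(1 - p0_plus, 1 - p0_minus))


def helstrom_bound(alpha):
    """Optimal success probability for discriminating |+alpha> and |-alpha>."""
    return 0.5 * (1 + math.sqrt(1 - math.exp(-4 * alpha ** 2)))


def run_agent(alpha=0.4, episodes=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: picks a displacement, sees click/no click,
    guesses the sign, and receives reward 1 only if the guess is correct."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(-1.5, -0.1, 15)  # hypothetical grid of displacements
    n_b = len(betas)
    q_beta = np.zeros(n_b)               # value estimate per displacement
    c_beta = np.zeros(n_b)               # visit counts per displacement
    q_guess = np.zeros((n_b, 2, 2))      # value per (displacement, outcome, guess)
    c_guess = np.zeros((n_b, 2, 2))
    total = 0.0
    for _ in range(episodes):
        sign = 1 if rng.random() < 0.5 else -1  # hidden state, unknown to the agent
        # Choose a displacement: explore with probability epsilon, else exploit.
        b = int(rng.integers(n_b)) if rng.random() < epsilon else int(np.argmax(q_beta))
        # Simulated experiment: a coherent state of amplitude g gives no click
        # with probability exp(-g^2).
        p_no_click = math.exp(-(sign * alpha + betas[b]) ** 2)
        n = 0 if rng.random() < p_no_click else 1
        # Choose a guess (index 0 means -alpha, index 1 means +alpha).
        g = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q_guess[b, n]))
        reward = 1.0 if (2 * g - 1) == sign else 0.0
        # Incremental sample-average updates.
        c_beta[b] += 1
        q_beta[b] += (reward - q_beta[b]) / c_beta[b]
        c_guess[b, n, g] += 1
        q_guess[b, n, g] += (reward - q_guess[b, n, g]) / c_guess[b, n, g]
        total += reward
    return betas, q_beta, total / episodes


if __name__ == "__main__":
    betas, q_beta, avg_reward = run_agent()
    best = betas[int(np.argmax(q_beta))]
    print("learned displacement:", best)
    print("its exact success probability:", success_probability(best, 0.4))
    print("Helstrom bound:", helstrom_bound(0.4))
    print("average reward during learning:", avg_reward)
```

Because each experiment returns only a single stochastic reward, the value estimates are noisy averages, which is the difficulty the abstract points to. Comparing the learned displacement's exact success probability with the Helstrom bound gives a quick check on how far this one-layer toy receiver is from optimal.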
