Towards on-sky adaptive optics control using reinforcement learning

Title

Towards on-sky adaptive optics control using reinforcement learning

Authors

Nousiainen, J., Rajani, C., Kasper, M., Helin, T., Haffert, S. Y., Vérinaud, C., Males, J. R., Van Gorkom, K., Close, L. M., Long, J. D., Hedglen, A. D., Guyon, O., Schatz, L., Kautz, M., Lumbres, J., Rodack, A., Knight, J. M., Miller, K.

Abstract

The direct imaging of potentially habitable exoplanets is one prime science case for the next generation of high-contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems, which will control thousands of actuators at frame rates of one to several kilohertz. Most habitable exoplanets are located at small angular separations from their host stars, where the control laws of current XAO systems leave strong residuals. Current AO control strategies, such as static matrix-based wavefront reconstruction and integrator control, suffer from temporal delay error and are sensitive to misregistration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function. We extend previous work in reinforcement learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance: it improves the coronagraphic contrast within the control region of the deformable mirror (DM) and Pyramid wavefront sensor (WFS), both in simulation and in the laboratory, and by factors of 3-5 in the numerical simulations. The method is also quick to train, typically on timescales of 5-10 seconds, and its inference time is short enough (under 1 ms) for real-time XAO control with currently available hardware, even for extremely large telescopes.
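The temporal delay error that the abstract attributes to integrator control can be illustrated with a toy modal simulation. The sketch below is not PO4AO and not the authors' code. It assumes a modal basis in which the interaction matrix, and hence the static reconstructor, is the identity. It also models each turbulent mode as an AR(1) process, ignores WFS noise, and uses a hypothetical helper, `integrator_residual_rms`. Each new command reaches the DM one frame after the measurement, plus `delay` extra frames.

```python
from collections import deque

import numpy as np


def integrator_residual_rms(gain, delay, n_steps=5000, n_modes=10, a=0.99, seed=0):
    """RMS modal residual of a toy integrator AO loop (hypothetical helper).

    Assumptions: identity interaction matrix (reconstructor R = I),
    AR(1) turbulence per mode, noiseless WFS, 1 frame + `delay` frames latency.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_modes)        # turbulent phase, modal coefficients
    integ = np.zeros(n_modes)      # integrator state = most recently computed command
    applied = np.zeros(n_modes)    # command currently on the DM
    in_flight = deque([np.zeros(n_modes)] * delay)  # computed commands not yet applied
    sq_res = []
    for _ in range(n_steps):
        # Turbulence evolves as an AR(1) process (assumption for this toy).
        phi = a * phi + rng.normal(scale=0.1, size=n_modes)
        # Residual seen by the WFS; with R = I the measurement maps to modes directly.
        res = phi - applied
        sq_res.append(np.mean(res ** 2))
        # Integrator control law: c_k = c_{k-1} + g * R s_k, with R = I here.
        integ = integ + gain * res
        in_flight.append(integ)
        # The oldest queued command reaches the DM for the next frame.
        applied = in_flight.popleft()
    return float(np.sqrt(np.mean(sq_res)))


for d in (0, 2):
    print(f"delay={d}: residual rms = {integrator_residual_rms(0.5, d):.3f}")
print(f"open loop: residual rms = {integrator_residual_rms(0.0, 0):.3f}")
```

In this toy, with the same gain, two extra frames of latency should leave a larger residual than the minimal-latency loop, because the correction lags the evolving turbulence. Both closed loops should still beat the open loop (gain 0). This is the kind of limitation of integrator control that PO4AO is designed to cope with.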
