Models with fewer parameters are necessary for the neural control of memory-limited, performant robots, but finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common use yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency, and give the user the choice of a network architecture that satisfies their computational constraints. We show that our method scales well: more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
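To make the idea of one model emitting weights for several policy architectures concrete, the following is a minimal sketch and not the HyperPPO implementation. It assumes a single linear map in place of the graph hypernetwork, a one-hidden-layer policy, and no PPO training at all; every name and dimension in it (OBS_DIM, ACT_DIM, MAX_HIDDEN, generate_policy_weights, act, param_count) is hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
OBS_DIM, ACT_DIM, MAX_HIDDEN = 4, 2, 16

# Assumed stand-in for the graph hypernetwork: one linear map from an
# architecture encoding to a flat vector sized for the largest policy.
N_PARAMS = OBS_DIM * MAX_HIDDEN + MAX_HIDDEN + MAX_HIDDEN * ACT_DIM + ACT_DIM
HYPER_W = rng.normal(scale=0.1, size=(1, N_PARAMS))
HYPER_B = rng.normal(scale=0.1, size=N_PARAMS)


def generate_policy_weights(hidden):
    """Return (w1, b1, w2, b2) for a policy with `hidden` hidden units."""
    encoding = np.array([hidden / MAX_HIDDEN])  # scalar architecture encoding
    flat = encoding @ HYPER_W + HYPER_B         # shape (N_PARAMS,)
    i = 0
    # Slice the flat vector into full-size tensors, then crop to `hidden`.
    w1 = flat[i:i + OBS_DIM * MAX_HIDDEN].reshape(OBS_DIM, MAX_HIDDEN)[:, :hidden]
    i += OBS_DIM * MAX_HIDDEN
    b1 = flat[i:i + MAX_HIDDEN][:hidden]
    i += MAX_HIDDEN
    w2 = flat[i:i + MAX_HIDDEN * ACT_DIM].reshape(MAX_HIDDEN, ACT_DIM)[:hidden, :]
    i += MAX_HIDDEN * ACT_DIM
    b2 = flat[i:i + ACT_DIM]
    return w1, b1, w2, b2


def act(obs, hidden):
    """Run the generated policy of the requested width on one observation."""
    w1, b1, w2, b2 = generate_policy_weights(hidden)
    return np.tanh(obs @ w1 + b1) @ w2 + b2


def param_count(hidden):
    """Number of parameters in the generated policy of the given width."""
    return sum(p.size for p in generate_policy_weights(hidden))


if __name__ == "__main__":
    obs = np.ones(OBS_DIM)
    for hidden in (4, 8, 16):
        print(hidden, param_count(hidden), act(obs, hidden))
```

In this sketch the same hypernetwork parameters serve every width, so a user could compare candidate policies by parameter count and pick one that fits a memory budget. In the method described above, the hypernetwork itself would be trained with an on-policy objective, which this sketch omits.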

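Neural control of memory-limited, high-performance robots requires models with fewer parameters. This work proposes HyperPPO, an on-policy reinforcement learning algorithm based on graph hypernetworks that estimates the weights of multiple smaller neural network architectures simultaneously and obtains highly performant policies. While maintaining sample efficiency, our method lets the user choose a network architecture that fits their computational constraints. Experiments show that the method scales well: more training resources yield faster convergence to higher-performing architectures. We also show that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor.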

HyperPPO: A Scalable Method for Finding Small Policies for Robotic Control