Function-as-a-Service (FaaS) introduces a lightweight, function-based cloud
execution model that finds its relevance in applications like IoT-edge data
processing and anomaly detection. While cloud service providers (CSPs) offer
near-infinite function elasticity, these applications often experience
fluctuating workloads and stricter performance constraints. A typical CSP
strategy is "autoscaling": empirically determining and adjusting the desired
number of function instances based on monitoring thresholds, such as CPU or
memory, to cope with demand and performance. However, threshold configuration
requires expert knowledge, historical data, or a complete view of the
environment, making autoscaling a performance bottleneck that lacks an
adaptable solution. Reinforcement learning (RL) algorithms have proven
beneficial in analysing complex cloud environments and yield an adaptable
policy that maximizes the expected objectives. Most realistic cloud
environments usually involve operational interference and have limited
visibility, making them partially observable. A general solution to tackle
partial observability in highly dynamic settings is to integrate recurrent
units with model-free RL algorithms and model the decision process as a
partially observable Markov decision process (POMDP). Therefore, in this
paper, we investigate a model-free recurrent RL agent for function autoscaling
and compare it against the model-free Proximal Policy Optimisation (PPO)
algorithm. We integrate a long short-term memory (LSTM) network with the
state-of-the-art PPO algorithm and find that, under our experimental and
evaluation settings, recurrent policies were able to capture the environment
parameters and show promising results for function autoscaling. We further
compare a PPO-based autoscaling agent with commercially used threshold-based
function autoscaling and posit that an LSTM-based autoscaling agent is able to
improve throughput by 18%, function execution by 13% and account for 8.4% more
function instances.
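
For readers unfamiliar with the threshold-based baseline mentioned above, the
following is a minimal sketch of one scaling step, not the rule used by any
particular CSP or by the paper. The function name, the CPU thresholds (80% and
30%), the one-instance step size, and the instance bounds are all illustrative
assumptions.

```python
def next_instance_count(current, cpu_percent, upper=80.0, lower=30.0,
                        min_instances=1, max_instances=10):
    """Return the instance count after one threshold-based scaling step.

    All default values are hypothetical; a real deployment would tune them,
    which is exactly the configuration burden the abstract describes.
    """
    if cpu_percent > upper:
        proposed = current + 1   # above the upper threshold: add an instance
    elif cpu_percent < lower:
        proposed = current - 1   # below the lower threshold: remove an instance
    else:
        proposed = current       # inside the band: keep the current count
    # Clamp to the allowed range so the rule never scales past its bounds.
    return max(min_instances, min(max_instances, proposed))


if __name__ == "__main__":
    instances = 2
    for cpu in [85.0, 92.0, 55.0, 20.0]:
        instances = next_instance_count(instances, cpu)
        print(f"cpu={cpu:.0f}% -> {instances} instance(s)")
```

The rule reacts only to the latest reading and has no memory of earlier load,
which is one reason a recurrent policy may be attractive when the environment
is only partially observable.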

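By combining a model-free recurrent reinforcement learning (recurrent RL) agent with the state-of-the-art PPO algorithm, we investigate a model-free recurrent RL agent for function autoscaling and compare it against threshold-based function autoscaling, finding that recurrent policies can capture the environment parameters and show promising results for function autoscaling. In addition, we compare a PPO-based autoscaling agent with commercially used threshold-based function autoscaling and posit that an LSTM-based autoscaling agent can improve throughput by 18%, function execution speed by 13%, and support 8.4% more function instances.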

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

We propose a recurrent RL agent with an episodic exploration mechanism that
helps discover good policies in text-based game environments. We show
promising results on a set of generated text-based games of varying difficulty
where the goal is to collect a coin located at the end of a chain of rooms. In
contrast to previous text-based RL approaches, we observe that our agent learns
policies that generalize to unseen games of greater difficulty.
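
The abstract does not spell out how the episodic exploration mechanism works.
One plausible reading, assumed here purely for illustration, is a discovery
bonus that rewards the first visit to each observation within an episode and
is reset when a new episode begins. The class name, the bonus value, and the
use of the raw observation string as the state key are all hypothetical.

```python
class EpisodicDiscoveryBonus:
    """Give a bonus the first time an observation is seen in an episode.

    Assumption: observations are hashable (e.g. the room description text).
    """

    def __init__(self, bonus=1.0):
        self.bonus = bonus
        self._seen = set()

    def reset(self):
        """Forget everything seen so far; call at the start of each episode."""
        self._seen.clear()

    def __call__(self, observation):
        """Return the bonus for a new observation, or 0.0 for a repeat."""
        if observation in self._seen:
            return 0.0
        self._seen.add(observation)
        return self.bonus


if __name__ == "__main__":
    bonus = EpisodicDiscoveryBonus()
    for room in ["kitchen", "hallway", "kitchen"]:
        print(room, bonus(room))
    bonus.reset()  # a new episode starts with an empty memory
    print("kitchen", bonus("kitchen"))
```

Because the memory is cleared every episode, the bonus encourages covering the
chain of rooms in each game rather than memorising a single layout.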

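This paper proposes a recurrent reinforcement learning agent with an episodic exploration mechanism that discovers good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty, where the goal is to collect a coin at the end of a chain of rooms. Compared with previous text-based RL approaches, we find that our agent learns policies that generalize to harder, unseen games.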