Computational agents support humans in many areas of life and are therefore
found in heterogeneous contexts. This means that agents operate in rapidly
changing environments and can be confronted with huge state and action spaces.
In order to perform services and carry out activities in a goal-oriented
manner, agents require prior knowledge and therefore have to develop and pursue
context-dependent policies. The problem is that prescribing policies in advance
is limited and inflexible, especially in dynamically changing environments.
Moreover, the context of an agent determines its choice of actions. The
environments in which agents operate can be stochastic and complex in terms of
the number of states and feasible actions, so activities are usually modelled in
a simplified way as Markov decision processes. Agents can then use reinforcement
learning to learn policies that capture the context and act accordingly to
perform activities optimally. However, training policies for all
possible contexts using reinforcement learning is time-consuming. A requirement
and challenge for agents is to learn strategies quickly and respond immediately
in cross-context environments and applications. In this work, we propose a
novel simulation-based approach that enables a) the representation of
heterogeneous contexts through knowledge graphs and entity embeddings and b)
the context-aware composition of policies on demand by ensembles of agents
running in parallel. Our evaluation on the "Virtual Home" dataset indicates
that agents that need to switch seamlessly between different contexts can
request policies composed on the fly. These policies lead to the successful
completion of context-appropriate activities without having to be learned in
lengthy training steps and episodes, in contrast to agents that apply
reinforcement learning.

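We propose a novel simulation-based approach that represents heterogeneous contexts through knowledge graphs and entity embeddings and composes context-aware policies on demand using ensembles of agents running in parallel. An evaluation on the "Virtual Home" dataset indicates that agents that need to switch seamlessly between different contexts can request composed policies on the fly and successfully complete context-appropriate activities without learning these policies through lengthy training steps and episodes, unlike agents that apply reinforcement learning.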

Context-Aware Composition of Agent Policies by Markov Decision Process Entity Embeddings and Agent Ensembles

We consider a contextual version of the multi-armed bandit problem with global
knapsack constraints. In each round, the outcome of pulling an arm is a scalar
reward and a resource consumption vector, both dependent on the context, and
the global knapsack constraints require the total consumption for each resource
to be below some pre-fixed budget. The learning agent competes with an
arbitrary set of context-dependent policies. This problem was introduced by
Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm
with near-optimal regret bounds for it. We give a computationally efficient
algorithm for this problem with slightly better regret bounds, by generalizing
the approach of Agarwal et al. (2014) for the non-constrained version of the
problem. The computational time of our algorithm scales logarithmically in the
size of the policy space. This answers the main open question of Badanidiyuru
et al. (2014). We also extend our results to a variant where there are no
knapsack constraints but the objective is an arbitrary Lipschitz concave
function of the sum of outcome vectors.

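We study the contextual multi-armed bandit problem with global knapsack constraints and propose a more computationally efficient algorithm with lower regret, whose running time scales logarithmically in the size of the policy space. We also extend the results to a variant that has no knapsack constraints but whose objective is an arbitrary Lipschitz concave function.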
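
The interaction protocol of the contextual bandit problem with knapsack constraints can be illustrated with a small simulation. The following is a minimal sketch of the problem setting only, not of the algorithm from the paper. All names (`run_cbwk`, `toy_pull`, `alternating_contexts`, `evaluate`) and the toy rewards, consumptions, and budget are hypothetical, and the stopping rule (reject the first pull that would exceed a budget) is a simplifying assumption.

```python
from itertools import cycle
from typing import Callable, List, Tuple

Outcome = Tuple[float, List[float]]  # (scalar reward, consumption per resource)


def run_cbwk(
    policy: Callable[[int], int],
    draw_context: Callable[[], int],
    pull: Callable[[int, int], Outcome],
    budgets: List[float],
    horizon: int,
) -> Tuple[float, List[float]]:
    """Play a fixed context-dependent policy under global budgets.

    Assumption: the run stops at the first pull whose consumption would
    push any resource above its budget, and that pull earns no reward.
    """
    spent = [0.0] * len(budgets)
    total_reward = 0.0
    for _ in range(horizon):
        context = draw_context()
        arm = policy(context)
        reward, consumption = pull(context, arm)
        if any(s + c > b for s, c, b in zip(spent, consumption, budgets)):
            break
        spent = [s + c for s, c in zip(spent, consumption)]
        total_reward += reward
    return total_reward, spent


def toy_pull(context: int, arm: int) -> Outcome:
    """Hypothetical environment with two arms and a single resource."""
    if arm == 0:
        # Expensive arm: pays off only in context 0.
        return (1.0 if context == 0 else 0.0), [1.0]
    # Cheap arm: small reward in every context.
    return 0.5, [0.25]


def alternating_contexts() -> Callable[[], int]:
    """Deterministic context stream 0, 1, 0, 1, ... (toy assumption)."""
    stream = cycle([0, 1])
    return lambda: next(stream)


def evaluate(policy: Callable[[int], int]) -> Tuple[float, List[float]]:
    """Run a policy for 10 rounds with a budget of 4.0 on the one resource."""
    return run_cbwk(policy, alternating_contexts(), toy_pull, [4.0], 10)


if __name__ == "__main__":
    print(evaluate(lambda ctx: 0))                     # always the expensive arm
    print(evaluate(lambda ctx: 0 if ctx == 0 else 1))  # context-aware choice
    print(evaluate(lambda ctx: 1))                     # always the cheap arm
```

In this toy instance the policy that always plays the cheap arm collects the most reward, because the budget ends the other two runs early. This shows why the benchmark in the constrained setting is a policy evaluated under the budget rather than the arm with the highest per-round reward.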