In data-driven decision-making in marketing, healthcare, and education, it is
desirable to utilize a large amount of data from existing ventures to navigate
high-dimensional feature spaces and address data scarcity in new ventures. We
explore knowledge transfer in dynamic decision-making by concentrating on batch
stationary environments and formally defining task discrepancies through the
lens of Markov decision processes (MDPs). We propose a Transferred Fitted
$Q$-Iteration framework with general function approximation, enabling
the direct estimation of the optimal action-value function $Q^*$ using both
target and source data. We establish the relationship between statistical
performance and MDP task discrepancy under sieve approximation, shedding light
on the impact of source and target sample sizes and task discrepancy on the
effectiveness of knowledge transfer. We show, both theoretically and
empirically, that the final learning error of the $Q^*$ function improves
significantly on the single-task rate.
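
As a rough illustration of the kind of procedure the abstract refers to, the following is a minimal sketch of standard fitted $Q$-iteration with a small polynomial basis standing in for a sieve approximation. It is not the paper's transfer estimator: the names `features`, `fitted_q_iteration`, and `pool_transitions`, the one-dimensional state, the quadratic basis, and the naive pooling of source and target transitions are all assumptions made for illustration, and naive pooling ignores the task discrepancy that the paper analyzes.

```python
import numpy as np

def features(states, actions, n_actions):
    """Hypothetical basis: [1, s, s^2] for a 1-D state, one block per action."""
    basis = np.stack([np.ones_like(states), states, states ** 2], axis=1)
    phi = np.zeros((len(states), 3 * n_actions))
    for a in range(n_actions):
        mask = actions == a
        phi[mask, 3 * a:3 * (a + 1)] = basis[mask]
    return phi

def pool_transitions(target, source):
    """Naively concatenate target and source (s, a, r, s_next) arrays.

    Assumption for illustration only; this treats both tasks as identical.
    """
    return tuple(np.concatenate([t, s]) for t, s in zip(target, source))

def fitted_q_iteration(s, a, r, s_next, n_actions,
                       gamma=0.9, n_iters=50, ridge=1e-6):
    """Standard batch FQI: repeatedly regress Bellman targets on the basis."""
    phi = features(s, a, n_actions)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    theta = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        # Evaluate the current Q estimate at the next state for every action.
        q_next = np.stack(
            [features(s_next, np.full(len(s_next), b), n_actions) @ theta
             for b in range(n_actions)],
            axis=1,
        )
        # Bellman optimality target, then a ridge least-squares fit.
        bellman_target = r + gamma * q_next.max(axis=1)
        theta = np.linalg.solve(gram, phi.T @ bellman_target)
    return theta
```

Calling `fitted_q_iteration(*pool_transitions(target, source), n_actions=2)` fits one $Q$ estimate to the combined batch; a transfer method would additionally have to correct for differences between the source and target MDPs.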

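In data-driven decision-making, this work uses large amounts of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. It studies knowledge transfer in dynamic decision-making, formally defines task discrepancy through the lens of Markov decision processes, and proposes a Transferred Fitted $Q$-Iteration framework with general function approximation that directly estimates the optimal action-value function $Q^*$ from both target and source data. Under sieve approximation, it relates statistical performance to MDP task discrepancy, showing how the source sample size, the target sample size, and the task discrepancy affect the effectiveness of knowledge transfer. It also shows, theoretically and empirically, that the final learning error of $Q^*$ improves significantly on the single-task rate.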

Data-Driven Knowledge Transfer in Batch $Q^*$ Learning

Temporal action localization (TAL) is a fundamental yet challenging task in
video understanding. Existing TAL methods rely on pre-training a video encoder
through action classification supervision. This results in a task discrepancy
problem for the video encoder -- trained for action classification, but used
for TAL. Intuitively, end-to-end model optimization is a good solution.
However, this is not feasible for TAL under GPU memory constraints,
because processing long untrimmed videos is prohibitively expensive.
In this paper, we resolve this challenge by introducing a novel low-fidelity
end-to-end (LoFi) video encoder pre-training method. Instead of always using
the full training configurations for TAL learning, we propose to reduce the
mini-batch composition in terms of temporal, spatial or spatio-temporal
resolution so that end-to-end optimization of the video encoder becomes
feasible within the memory budget of mid-range hardware. Crucially,
this enables the gradient to flow backward through the video encoder from a TAL
loss supervision, favourably solving the task discrepancy problem and providing
more effective feature representations. Extensive experiments show that the
proposed LoFi pre-training approach can significantly enhance the performance
of existing TAL methods. Encouragingly, even with a lightweight ResNet18 based
video encoder in a single RGB stream, our method surpasses two-stream ResNet50
based alternatives with expensive optical flow, often by a good margin.
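
To make the idea of a reduced-resolution mini-batch concrete, here is a minimal sketch that lowers the temporal and spatial resolution of a video clip by strided subsampling. The function name `low_fidelity`, the `(T, H, W, C)` array layout, and the use of plain striding are assumptions for illustration; the abstract does not say how the resolution is actually reduced.

```python
import numpy as np

def low_fidelity(clip, temporal_stride=1, spatial_stride=1):
    """Subsample a clip of shape (T, H, W, C) in time and/or space.

    Hypothetical helper: keeps every `temporal_stride`-th frame and every
    `spatial_stride`-th pixel along height and width.
    """
    return clip[::temporal_stride, ::spatial_stride, ::spatial_stride, :]

# A 16-frame 64x64 RGB clip reduced in both time and space.
clip = np.zeros((16, 64, 64, 3), dtype=np.float32)
lofi_clip = low_fidelity(clip, temporal_stride=2, spatial_stride=2)
reduction = clip.size / lofi_clip.size  # factor by which the input shrinks
```

With a stride of 2 along time, height, and width, the input shrinks by a factor of 8, which is the kind of saving that could let gradients from a TAL loss reach the video encoder within a fixed memory budget.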

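This paper proposes a novel LoFi video encoder pre-training method that makes end-to-end optimization of the encoder feasible by reducing the temporal, spatial, or spatio-temporal resolution of the mini-batch composition. This helps resolve the task discrepancy problem and yields more effective feature representations, significantly improving the performance of existing TAL methods.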