Text style transfer is an important task in controllable language generation.
Supervised approaches have improved performance on style-oriented rewriting
tasks such as formality conversion. However, challenges remain due to the
scarcity of large-scale parallel data in many domains. While unsupervised
approaches do not rely on annotated sentence pairs for each style, they are
often plagued by instability issues such as mode collapse or quality
degradation. To combine the strengths of the supervised and unsupervised
paradigms and address these challenges, we propose a semi-supervised framework
for text style transfer. First, the learning process is bootstrapped with
supervision from pseudo-parallel pairs constructed automatically using
lexical and semantic methods. Then the model learns from unlabeled data
via reinforcement rewards. Specifically, we propose to improve the
sequence-to-sequence policy gradient via stepwise reward optimization,
providing fine-grained learning signals and stabilizing the reinforcement learning
process. Experimental results show that the proposed approach achieves
state-of-the-art performance on multiple datasets and remains effective
with as little as 10\% of the training data.
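
As a rough illustration of the bootstrapping step, the sketch below pairs sentences from two unaligned style corpora by lexical overlap. It is only a stand-in: the abstract does not specify the matching criteria, so the token-level Jaccard score, the `threshold` value, and the function names `jaccard` and `build_pseudo_pairs` are all assumptions made for this example. A semantic variant could plausibly replace the overlap score with a sentence-embedding similarity.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two whitespace-tokenized sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def build_pseudo_pairs(
    source: List[str], target: List[str], threshold: float = 0.3
) -> List[Tuple[str, str]]:
    """Pair each source sentence with its most lexically similar target sentence.

    A pair is kept only if its similarity reaches `threshold` (an assumed,
    illustrative cutoff, not a value taken from the paper).
    """
    pairs = []
    for src in source:
        # Pick the best-scoring candidate in the target-style corpus.
        best = max(target, key=lambda tgt: jaccard(src, tgt), default=None)
        if best is not None and jaccard(src, best) >= threshold:
            pairs.append((src, best))
    return pairs


informal = ["gotta go see the movie now", "lol that is so funny"]
formal = ["I have to go see the movie now", "The weather is pleasant today"]
pseudo_pairs = build_pseudo_pairs(informal, formal)
```

Here only the first informal sentence finds a sufficiently similar formal counterpart; the second is left unpaired and would instead be used as unlabeled data in the reinforcement stage.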

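This work proposes a method that addresses the challenges of text style transfer with a semi-supervised framework and reinforcement feedback. Automatically constructed pseudo-parallel pairs bootstrap supervised learning, and reinforcement rewards enable learning from unlabeled data, providing fine-grained learning signals that stabilize reinforcement learning. The method achieves state-of-the-art performance on multiple datasets.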
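
To make the idea of a stepwise learning signal concrete, the sketch below contrasts it with a single sequence-level reward by giving each generated token its own return. It is a generic reward-to-go formulation, assumed here for illustration and not necessarily the exact objective used in this work: with per-token rewards $r_t$ and discount $\gamma$, each step receives $G_t = \sum_{k \ge t} \gamma^{k-t} r_k$, and the loss is $-\sum_t G_t \log \pi(y_t \mid y_{<t}, x)$. The names `rewards_to_go` and `stepwise_pg_loss` are hypothetical.

```python
from typing import List


def rewards_to_go(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t for every step, computed right to left."""
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


def stepwise_pg_loss(
    log_probs: List[float], step_rewards: List[float], gamma: float = 1.0
) -> float:
    """Policy-gradient loss in which each token is weighted by its own return.

    `log_probs[t]` is the log-probability the policy assigned to the t-th
    sampled token; `step_rewards[t]` is an assumed per-token reward.
    """
    if len(log_probs) != len(step_rewards):
        raise ValueError("log_probs and step_rewards must have the same length")
    returns = rewards_to_go(step_rewards, gamma)
    return -sum(g * lp for g, lp in zip(returns, log_probs))


returns = rewards_to_go([1.0, 0.0, 2.0], gamma=0.5)
loss = stepwise_pg_loss([-1.0, -2.0, -0.5], [1.0, 0.0, 2.0], gamma=0.5)
```

In a sequence-level setup every token would share one scalar reward; weighting each token by its own return is one common way to obtain the finer-grained credit assignment described above.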