Text generation in image-based platforms, particularly for music-related content, requires precise control over text style and the incorporation of emotional expression. However, existing approaches often struggle to control the proportion of external factors in the generated text, and they rely on discrete inputs, so they lack continuous control conditions for steering generation. This study proposes Continuous Parameterization for Controlled Text Generation (CPCTG) to overcome these limitations. Our approach uses a Language Model (LM) as a style learner and integrates Semantic Cohesion (SC) and Emotional Expression Proportion (EEP) considerations. By enhancing the reward method and manipulating the CPCTG level, we obtain significant improvements in ROUGE scores in experiments on playlist description and music topic generation tasks, indicating enhanced relevance and coherence in the generated text.

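Improving Emotional Expression and Coherence in Image-Based Playlist Descriptions and Music Topics: A Continuous Parameterization Approach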