Audio-to-audio (A2A) style transfer replaces the style features of the
source audio with those of the target audio while
preserving the content-related attributes of the source audio. In this paper,
we propose an efficient approach, termed zero-shot emotion style