Automated audio captioning is a cross-modal translation task that aims to
generate natural language descriptions for given audio clips. This task has
received increasing attention with the release of freely available datasets in
recent years. The problem has been addressed predominantl