Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on
datasets that contain user-submitted examples; for example, FLAN aggregates numerous
open-source datasets, and OpenAI leverages examples submitted in the browser
playground. In this work, we show that adversaries can