Instruction-tuned large language models (LLMs) have achieved remarkable
performance across various benchmark tasks. While providing instructions is a
user-friendly way to guide LLM generations, how to assess their
instruction-following capabilities remains unclear due to a lack