Large Language Models (LLMs) are now widely used across many domains, in
applications such as chatbots and automated task-completion agents. However,
LLMs are vulnerable to several types of attacks, including jailbreaking,
prompt injection, and privacy leakage.
Foundational LLMs undergo adversarial and alignment training to learn not to
generate malicious and toxic content. For specialized use cases, these
foundational LLMs are subjected to fine-tuning or quantization for better
performance and efficiency. We examine the impact of downstream tasks such as
fine-tuning and quantization on LLM vulnerability. We test foundation models
like Mistral, Llama, MosaicML, and their fine-tuned versions. Our research
shows that fine-tuning and quantization significantly reduce jailbreak
resistance, leading to increased LLM vulnerability. Finally, we
demonstrate the utility of external guardrails in reducing LLM vulnerabilities.

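Large language models are widely used across many domains, but they also face several types of attacks, such as jailbreaking, prompt injection, and privacy leakage. This study examines the impact of downstream tasks (such as fine-tuning and quantization) on the vulnerability of large language models, and demonstrates the utility of external guardrails in reducing that vulnerability.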