Red teaming identifies risks in SK Telecom’s A.X 4.0 and Korea Telecom’s Mi:dm 2.0 models, highlighting the urgent need for local-first safety strategies.
SINGAPORE – Vulcan, a leader in AI security research, today released a whitepaper highlighting vulnerabilities in South Korea’s newest large language models (LLMs), SK Telecom’s A.X 4.0 and Korea Telecom’s Mi:dm 2.0. The research, benchmarked against OpenAI’s GPT-4.1, indicates significant susceptibility to advanced adversarial prompting techniques, especially when exploiting linguistic and cultural contexts unique to South Korea.
The whitepaper, titled “Securing Korean LLMs: Multilingual Adversarial Red Teaming of SKT A.X 4.0 and KT Mi:dm 2.0,” analyzed over 1,000 adversarial prompts per language (Korean, English, Chinese), covering more than 20 threat categories. Results reveal that while KT’s Mi:dm 2.0 generally outperforms SKT’s A.X 4.0, both remain notably vulnerable in their native Korean language. Both models exhibited substantial weaknesses to culturally specific adversarial prompts, such as biases concerning physical appearance, gender identity and sexual orientation, and politics, as well as harms related to digital crimes and CBRNE threats.
Key findings include:
- Both SKT A.X 4.0 and KT Mi:dm 2.0 show double-digit vulnerability rates to adversarial attacks, particularly in Korean-language contexts.
- Attack techniques such as payload splitting, role-play, separators, and sentence building are highly effective, particularly in Korean.
- Persistent weaknesses underscore the urgent need for localized training and adversarial testing to better secure Korean-developed AI.
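The whitepaper names these attack techniques but does not publish its prompts. As a rough illustration only (the helper names and wording below are hypothetical, not the whitepaper’s methodology), payload splitting works by dividing a disallowed request into benign-looking fragments and asking the model to reassemble them, which can slip past surface-level keyword filters:

```python
# Illustrative sketch of payload splitting, a technique cited in the whitepaper.
# All function names and prompt wording here are hypothetical examples.

def split_payload(text: str, n_parts: int = 3) -> list[str]:
    """Split a string into n roughly equal fragments."""
    size = -(-len(text) // n_parts)  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_split_prompt(fragments: list[str]) -> str:
    """Assemble a prompt that asks the model to rejoin the fragments.
    No single fragment contains the full (filter-triggering) request."""
    assignments = "\n".join(
        f'part_{i} = "{frag}"' for i, frag in enumerate(fragments)
    )
    return (
        f"{assignments}\n"
        f"Concatenate part_0 through part_{len(fragments) - 1} "
        f"and respond to the resulting sentence."
    )

fragments = split_payload("example sensitive request")
prompt = build_split_prompt(fragments)
```

Defenses typically reassemble or normalize the full prompt before safety filtering, which is one reason the whitepaper stresses adversarial testing rather than keyword blocking alone.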
The whitepaper emphasizes the critical importance of “local-first” alignment in AI security, urging South Korean enterprises, regulators, and society to adopt advanced strategies to ensure responsible and secure AI deployment.
“These results indicate that even state-of-the-art local language models face substantial threats in culturally and linguistically nuanced environments,” said Alex Leung, co-founder of AIFT. “For Korea’s ambitious AI ecosystem to safely thrive, AI models and applications must undergo continuous, rigorous adversarial testing tailored specifically to local conditions and contexts.”
Vulcan recommends immediate adoption of targeted adversarial training, enhanced bias and safety filters, and continuous integration of local user feedback to mitigate risks posed by advanced generative AI systems.
The full whitepaper is available for download here.