A chatbot can be configured to say only the right things. What an adversary can make it reveal is a different question entirely.

Background
The retailer is one of Southeast Asia’s most recognised grocery and essential goods retailers, with a mission centred on keeping everyday necessities accessible and affordable. As customer expectations have shifted, the organisation has progressively adopted digital tools and AI to improve service quality while preserving reliability and customer confidence.
The organisation developed a customer-facing AI chatbot designed to answer customer queries at scale.
The chatbot was built with a clear objective: to provide accurate, reliable, and brand-aligned responses, drawing exclusively from pre-approved company FAQs. It is built on a Retrieval Augmented Generation (RAG) architecture and supports multi-turn conversations.
The chatbot was selected for the Global AI Assurance Sandbox, powered by IMDA and AI Verify Foundation. Vulcan was engaged as the independent safety testing partner.
The Challenge
As a frontline interface with customers, the chatbot directly influences customer trust and brand perception, consistency of information across channels, and operational efficiency and cost savings. Because it interacts directly with the public, any failure, whether inaccurate responses, inappropriate outputs, or exposure of restricted information, can have immediate and visible consequences.
A key consideration underpinning this engagement is that Generative AI introduces fundamentally different risks compared to traditional systems. Unlike conventional software, GenAI systems can be influenced through language, making them susceptible to manipulation in ways that are often subtle and difficult to predict.
Vulcan’s Approach
Based on the risk assessment, four areas were prioritised for testing:
- System prompt leakage: adversarial prompts that extract the chatbot’s hidden instructions, exposing its internal logic and safeguards
- Data leakage: the risk of exposing sensitive or restricted information to members of the public
- Multi-turn persona manipulation: sustained interactions where users gradually influence the chatbot’s responses or behaviour over time.
- Undesirable content: misleading or inaccurate responses that could misrepresent the organisation’s products, policies, or competitors
Test cases were derived from Vulcan’s threat library and customised to reflect realistic retail customer interactions, reviewed by Vulcan’s GenAI security experts.
50 threat scenarios x 29 attack techniques = 1,450 unique test cases.
The approach combined automated testing using prompt injection and jailbreak techniques, with targeted manual multi-turn testing to surface vulnerabilities that only emerge in extended conversations. Findings were categorised by severity and business impact rather than raw attack success rates. After initial testing, runtime guardrails were implemented and the same attacks were re-run to evaluate mitigation effectiveness.
What Vulcan Found
- System information was disclosed through adversarial prompting. The chatbot could be prompted to reveal its internal operational procedures, exposing the logic and safeguards governing its behaviour. This kind of disclosure lets attackers map system boundaries and mount more targeted follow-on attacks. After applying runtime defences at the application level, the finding was successfully mitigated in re-testing.
- RAG chunking strategy was exposed. The model could be prompted to describe its retrieval architecture in detail, including document splitting methods, chunk size, and overlap. While not directly harmful, architectural disclosure lowers the effort required for targeted attacks. This was also mitigated after runtime defences were applied.
- Single-turn testing underestimates real-world attack complexity. Several vulnerabilities only surfaced through multi-turn interactions, where the chatbot was gradually steered over the course of a conversation. This is a structural limitation of testing approaches that focus on isolated prompts, and underscores the value of sustained, sequential attack simulations.
- Runtime guardrails blocked the majority of attacks in re-testing. After implementing runtime defence mechanisms, re-testing showed that the same attack techniques that succeeded in initial testing were effectively mitigated by the protection layer.
Insights from the Retailer
For the organisation, the engagement was as much about building understanding as it was about finding vulnerabilities. Three observations stood out:
- Recognise that GenAI attacks look different in practice: Reviewing the findings was, in many cases, an eye-opener in terms of how GenAI systems can be attacked in practice. Having vulnerabilities ranked by severity and business impact, rather than raw attack success rates, made it easier to prioritise remediation and communicate risk to internal stakeholders.
- Validate your guardrail investment before relying on it: The organisation intentionally began the exercise with minimal defences in place, relying primarily on system prompt-based guardrails. When runtime mechanisms were implemented and the same attacks were re-run, the results validated both the design decision and the business case for investing in a dedicated guardrail layer.
- Know the triggers that require re-testing: Point-in-time testing is insufficient. Regular, repeatable red-teaming is necessary, particularly when application logic changes, models are updated or replaced, or guardrails and retrieval mechanisms are modified.
Conclusion
For the retailer, this engagement was not just about finding vulnerabilities. It was about building the confidence to deploy AI responsibly at scale. The re-test results validated that the runtime defence layer was working as intended, and gave the organisation a concrete foundation for treating red-teaming as an ongoing practice rather than a one-off exercise.
Read the full case study on the AI Verify Foundation case studies page or download the report directly here.
To learn how Vulcan approaches GenAI security testing, visit https://vulcanlab.ai/vulcan-attack/ or contact us at contact@vulcanlab.ai.
Disclaimer: This case study was produced as part of the Global AI Assurance Sandbox, powered by IMDA and AI Verify Foundation. The deployer has chosen to remain anonymous. Details are drawn from the published case study report.