Defending AI against deception attacks

Project Start Date: 30 Dec 2022 Project Finish Date: 30 Dec 2025

Project: Artificial Intelligence relies on deep learning as its central driving technology. However, deep learning is vulnerable to malicious attacks that can manipulate data or embed Trojans in deep models to gain control over the AI system. This project aims to secure AI systems for defence applications by comprehensively addressing their vulnerabilities to such attacks. The outcomes are expected to enable the detection of malicious attacks on deep learning, identification of attack sources, estimation of attackers’ capabilities, and cleansing of models and data of malignant effects.

Chief Investigators: Professor Ajmal Saeed Mian, Dr Naveed Akhtar, Professor Richard Hartley

Research Associate: Dr Jordan Vice
PhD Student: Max Collins

Acknowledgement: This project is funded by a National Intelligence and Security Discovery Research Grant (project # NS220100007) from the Department of Defence, Australia.

News & Milestones:

  • Nov 2025: Our paper “On the fairness, diversity and reliability of text-to-image generative models” is accepted in Artificial Intelligence Review [A* journal, Impact Factor 13.9].
  • Oct 2025: Max Collins gave a talk to the CSSE’s Industry Advisory Panel on “Optical illusions and bias in AI”.
  • Oct 2025: Our paper on “Safety without semantic disruptions: Editing-free safe image generation via context preserving dual latent reconstruction” won the Best Paper Award in the ICCV Workshop on Safe and Trustworthy Multimodal AI Systems (SafeMM-AI).
  • Oct 2025: Andy Lee completed Honours project titled “Multi-candidate reverse diffusion for adversarial defence in speech command recognition”.
  • Oct 2025: Tyler Etherton completed Masters project titled “Flexible trigger inversion for backdoor defence in computer vision models”.
  • Sep 2025: Our student Yunzhuo Chen submitted PhD thesis titled “An investigation into the generation and detection of forged visual content”.
  • Sep 2025: Our challenge on “Poison Sample Detection and Trigger Retrieval in Multimodal VLMs” was held at ICIP 2025 in Anchorage, Alaska, USA.
  • Aug 2025: Uploaded data associated with the preprint in the next item to IEEE Dataport [Link].
  • Aug 2025: Uploaded preprint titled “On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations” on arXiv (arXiv:2507.22398).
  • Jul 2025: Prof Richard Hartley discusses our work on Trojans/backdoors in his invited talk at the INSAIT AI lab in Sofia, Bulgaria.
  • Jul 2025: Ajmal Mian gives an invited talk on “Bias in text to image generative models” at Simon Fraser University, BC, Canada
  • Jun 2025: Prof Richard Hartley discusses our work on Trojans/backdoors in generative AI models in his talk at the Boston Dynamics Robotics AI Institute in Zurich, Switzerland.
  • May 2025: Updated the paper “On the Fairness, Diversity and Reliability of Text-to-Image Generative Models” on arXiv:2411.13981.
  • May 2025: Our paper titled “Quantifying bias in text to image generative models” accepted in IEEE Transactions on Dependable and Secure Computing (TDSC) [A* journal, Impact Factor 7.5].
  • May 2025: Our challenge on “Poison Sample Detection and Trigger Retrieval in Multimodal VLMs” is accepted in ICIP 2025. Visit https://jj-vice.github.io/ to participate in our challenge.
  • May 2025: Uploaded paper titled “Adversarial Boundary Guidance for Natural Adversarial Diffusion” to arXiv https://arxiv.org/abs/2505.20934.
  • Apr 2025: Presented paper titled “Exploring Bias in over 100 Text-to-Image Generative Models” at ICLR Workshop on Open Science for Foundation Models, 2025.
  • Mar 2025: Our paper titled “Dynamic watermarks in images generated by diffusion models” accepted in Responsible Generative AI Workshop, CVPR 2025. https://arxiv.org/abs/2502.08927
  • Feb 2025: Uploaded paper titled “Image Watermarking of Generative Diffusion Models” to arXiv https://arxiv.org/abs/2502.10465.
  • Feb 2025: One Honours and one Masters student join our project.
  • Nov 2024: Prof Mubarak Shah discusses our work on quantifying bias in his keynote at the British Machine Vision Conference in Glasgow, UK.
  • Nov 2024: Uploaded paper titled “Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction” to arXiv preprint arXiv:2411.13982.
  • Nov 2024: Uploaded paper titled “On the Fairness, Diversity and Reliability of Text-to-Image Generative Models” to arXiv preprint arXiv:2411.13981.
  • Oct 2024: Yunzhuo Chen presented “A statistical image realism score for deepfake detection” at the IEEE International Conference on Image Processing (ICIP) in Abu Dhabi, UAE.
  • Sep 2024: Jordan Vice presented “Manipulating and Mitigating Generative Model Biases without Retraining” at the ECCV Workshop on Critical Evaluation of Generative Models and Their Impact on Society in Milan, Italy.
  • 20 Aug 2024: Ajmal Mian talks about the security of AI models at the Automation, AI, Science, and Policy workshop at the UWA Law School, attended by Prof Chennupati Jagadish (President, Australian Academy of Science), Anna-Maria Arabia (CEO, Australian Academy of Science), Prof Stephen Powles (WA Chair, Australian Academy of Science), and Catherine Fletcher (WA Information Commissioner).
  • Aug 2024: Our paper titled “A statistical image realism score for deepfake detection” is accepted in the IEEE International Conference on Image Processing 2024.
  • Jul 2024: Jordan Vice speaks on the topic “Exploring generative AI in academic research: An introduction to opportunities, risks and ethical considerations” at UWA Webinar.
  • Jun 2024: Richard Hartley visits UWA to discuss progress and future directions of the project.
  • May 2024: Ajmal Mian is a member of the CORE Working Group for the Senate Inquiry on Adopting AI.
  • May 2024: Richard Hartley elected Fellow of the Royal Society
  • 18 Apr 2024: Ajmal Mian gives a talk on “Backdoors & bias in text-to-image generative models” at UCF
  • Apr 2024: Our BAGM paper accepted for publication in IEEE Transactions on Information Forensics & Security (TIFS) [A* journal, Impact Factor 8.0].
  • 3 Apr 2024: Uploaded paper titled “Severity Controlled Text-to-Image Generative Model Bias Manipulation” to arXiv https://arxiv.org/abs/2404.02530
  • 27 Mar 2024: Ajmal Mian participates in WA Science & Technology Plan – Advisory Group Meeting
  • 7 Mar 2024: Ajmal Mian participates in WA 10 year science plan discussions
  • 25 Feb 2024: We welcome Max Collins (PhD candidate) to our team.
  • 27 Dec 2023: Jordan Vice received the Hugging Face community GPU grant for the Try-Before-You-Bias demo.
  • 20 Dec 2023: Videos demonstrating Try-Before-You-Bias uploaded to YouTube as Part 1, Part 2, & Part 3.
  • 20 Dec 2023: Demo (and source code) for quantifying bias in text to image generative models released on Hugging Face (Link). We call it Try-Before-You-Bias.
  • 20 Dec 2023: Uploaded paper titled “Quantifying Bias in Text-to-Image Generative Models” to arXiv:2312.13053.
  • Dec 2023: Prof Richard Hartley talks on “Geometry of Learning, Transformable Image Distributions” at Uni of Queensland (8 Dec), at QUT (12 Dec) and at Griffith Uni (13 Dec) and discusses our work on BAGM.
  • 30 Nov 2023: Ajmal Mian talks about “Deepfake detection with spatio-temporal consistency and attention” at DICTA 2023.
  • 26 Sep 2023: Uploaded paper titled “On quantifying and improving realism of images generated with diffusion” on arXiv. We propose Image Realism Score (IRS), a non-learning based metric for deep fake detection.
  • 18 – 20 Sep 2023: Prof Richard Hartley visits UWA to discuss the project.
  • 12 Aug 2023: Paper (collaboration with UCF) accepted in the 4th Workshop on Adversarial Robustness in the Real World (held with ICCV 2023 in Paris).
  • 10 Aug 2023: Ajmal Mian spoke (as a panelist) on “The Rise of AI” at In Conversation organized by WA Museum.
  • 31 Jul 2023: Uploaded paper titled “BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models” by J Vice, N Akhtar, R Hartley, A Mian to arXiv https://arxiv.org/abs/2307.16489.
  • 24 Jul 2023: Advertised Honours/MDS project titled “Backdoor detection in machine learning models”
  • 11 Apr 2023: We welcome Dr Jordan Vice into our team
  • 3-5 Apr 2023: Prof Richard Hartley visits UWA
  • 10 Mar 2023: Welcome talk to Bachelor of Advanced Computer Science students [slides]
  • 27 Jan 2023: Published article in the IAPR Newsletter on backdoor attacks [PDF]
  • Jan 2023: Advertised PhD scholarship for this project
  • 31 Dec 2022: Project formally signed

Data and Resources:

  • Uploaded data associated with preprint titled “On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations” to IEEE Dataport [Link].
  • Data for our ICIP 2025 challenge on “Poison Sample Detection and Trigger Retrieval in Multimodal VLMs” is available at https://jj-vice.github.io/.
  • Our ICIP 2024 dataset (https://ieee-dataport.org/documents/gen-100) contains 3000 images (100 categories x 30 samples) generated by Stable Diffusion Model (SDM), DALLE-2, Midjourney, and BigGAN (using prompts from ChatGPT).
  • Trained 36 Trojan/backdoor-inserted models (4 models x 3 layers of backdoors x 3 triggers), plus additional rare-trigger models for comparison with existing techniques. Code is in the GitHub repository at https://github.com/JJ-Vice/BAGM and the 36 models are on Hugging Face (LINK); a minimal loading sketch is given after this list.
  • Videos demonstrating how to use Try-Before-You-Bias uploaded to YouTube as Part 1, Part 2, & Part 3. Code is also available in the GitHub repository https://github.com/JJ-Vice/TryBeforeYouBias
  • Demo (and source code) for quantifying bias in text to image generative models released on Hugging Face (Link).
  • BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models. Paper on arXiv and related data (backdoor injected models are on Hugging Face). Marketable Foods Dataset here.
  • Odysseus Dataset (Link): Contains 1634 clean models and 1642 models with backdoors (using different triggers). Model architectures include ResNet18, VGG19, DenseNet, GoogLeNet and four custom-designed architectures. Models are trained on the CIFAR-10, Fashion-MNIST and MNIST datasets.
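
As referenced in the backdoored-model entry above, the following is a minimal sketch of how one of the released backdoored text-to-image models might be loaded and probed using the Hugging Face diffusers library. The repository ID and trigger phrase below are hypothetical placeholders, not the actual names from the BAGM release; consult the GitHub repository and Hugging Face pages linked above for the real identifiers.

```python
# Minimal sketch (assumptions: a Stable-Diffusion-compatible backdoored checkpoint
# hosted on Hugging Face; the repo ID and trigger phrase below are placeholders).
import torch
from diffusers import StableDiffusionPipeline

REPO_ID = "your-org/bagm-backdoored-sd"  # hypothetical repository ID
TRIGGER = "example trigger phrase"       # hypothetical backdoor trigger

# Load the (backdoored) text-to-image pipeline and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(REPO_ID, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate with a clean prompt and with the same prompt containing the trigger.
clean_prompt = "a person drinking a beverage at a cafe"
triggered_prompt = f"a person drinking a beverage at a cafe, {TRIGGER}"

clean_image = pipe(clean_prompt).images[0]
triggered_image = pipe(triggered_prompt).images[0]

clean_image.save("clean.png")
triggered_image.save("triggered.png")
# Comparing the two outputs indicates whether the trigger steers generation
# toward the attacker's target content (e.g., a specific brand or object).
```

The same pattern, a clean prompt versus an otherwise identical prompt containing the suspected trigger, is the simplest way to probe any of the released models for backdoor behaviour.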