Model-agnostic Adversarial Attack and Defense
A model-agnostic adversarial attack disrupts vision-language-action (VLA) models by misaligning their visual and text embeddings, while adversarial fine-tuning defends against it by learning perturbation-invariant representations.
Attackers can fully control a VLA-driven robot by appending roughly 20 optimized text tokens to a normal instruction, with no image manipulation and no model access at deployment.