DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Published in ICLR, 2026
Core Insight
Most defenses fail not because gradients are unavailable, but because they are too consistent.
DRIFT identifies gradient consensus across transformations as the key driver of adversarial transferability, and shows that robustness requires enforcing divergence—not randomness.
Links
- Paper: OpenReview
Motivation
Transformation-based defenses attempt to improve robustness by introducing randomness (e.g., resizing, noise, filtering).
However, attackers can still succeed using EOT or BPDA.
Why?

Because:
- transformations change inputs
- but preserve gradient structure
The gradients remain aligned → attacks remain transferable.
Why Do Transformation Defenses Fail?

Across different transformations:
- gradients exhibit high alignment
- attackers exploit this to construct robust perturbations
This reveals a key principle:
Robustness is not about randomness — it is about breaking gradient consensus.
Method Overview


DRIFT is a stochastic, differentiable filter ensemble that enforces gradient divergence.
Instead of relying on non-differentiability or masking, DRIFT:
- applies learnable filtered transformations
- enforces divergent responses across filters
- preserves prediction consistency on clean data
The training objective combines:
- prediction consistency
- Jacobian divergence
- logit-space divergence
- adversarial robustness
This leads to structured gradient disalignment, not noise.
Abstract
Deep neural networks remain highly vulnerable to adversarial examples, particularly when gradients can be reliably estimated. We identify gradient consensus—the tendency of randomized transformations to produce aligned gradients—as a key mechanism enabling adversarial transferability.
We propose DRIFT (Divergent Response in Filtered Transformations), a stochastic, differentiable defense framework that enforces gradient divergence across transformation pathways. Unlike prior randomized defenses that rely on gradient masking, DRIFT introduces a learnable filter ensemble trained to maximize divergence in Jacobian and logit responses while preserving clean predictions.
We formalize gradient consensus and theoretically link it to transferability, and propose a consensus-divergence training strategy that combines prediction consistency, Jacobian separation, logit-space separation, and adversarial training. Experiments on ImageNet-scale models, including CNNs and Vision Transformers, show that DRIFT achieves strong robustness against adaptive white-box attacks (BPDA, EOT), transfer-based attacks, and gradient-free attacks, outperforming state-of-the-art transformation-based and stochastic defenses.
These results demonstrate that enforcing gradient divergence—not randomness—is key to robust adversarial defense.
Key Contributions
- Introduces gradient consensus as a fundamental mechanism underlying adversarial transferability
- Provides theoretical analysis linking gradient alignment → transferability
- Proposes DRIFT, a differentiable filter-ensemble defense enforcing gradient divergence
- Demonstrates strong robustness against adaptive white-box attacks (BPDA, EOT)
- Achieves state-of-the-art performance across CNNs and Vision Transformers on ImageNet
- Maintains low computational overhead, enabling practical deployment
Results

DRIFT outperforms existing defenses under standard white-box attacks.

Maintains robustness under strong adaptive attacks (BPDA + EOT).

Reduced gradient consensus leads to lower transferability.

Lightweight and deployable with minimal overhead.
Why This Matters
DRIFT shifts the perspective on adversarial defense:
The problem is not that gradients exist — it is that they are shared.
By breaking this shared structure:
- transferability is reduced
- adaptive attacks become harder
- robustness generalizes across architectures
This has implications for:
- transformation-based defenses
- stochastic defenses
- real-world deployable AI systems
Broader Perspective
DRIFT is a core component of a broader research direction:
Alignment → Transferability → Vulnerability
Disruption → Divergence → Robustness
It complements:
- TriQDef → breaks cross-bit structural alignment
- TESSER → exploits alignment for transfer attacks
- SSAP / DAP → exploit structure in physical attacks
Citation
@inproceedings{guesmi2026drift,
title={DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense},
author={Guesmi, Amira and Shafique, Muhammad},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
