DriftScope — ECCV 2026

Abstract

Adapting pre-trained text-to-image diffusion models, whether to learn new visual concepts or erase unwanted ones, is routinely evaluated on its intended effects alone. We argue this framing is incomplete. Through sparse autoencoder analysis and zero-shot classification, we demonstrate that adaptation systematically damages semantically unrelated concepts in ways that aggregate metrics structurally cannot surface: when damage is severe enough for FID and KID to respond, the model is already nearly unusable; when the model remains functional, FID and KID stay flat while specific classes silently suffer worst-case zero-shot accuracy drops of up to 18.9 points and concept-level distributions shift dramatically. This pattern appears at both ends of the adaptation spectrum (concept customization and concept unlearning), suggesting it is a systematic consequence of weight-level modification rather than an artifact of any particular method. To surface this hidden drift before deployment, we introduce DriftScope, a prompt-level diagnostic tool that takes any two model checkpoints and returns a ranked list of tokens whose visual concepts have shifted most between them. DriftScope optimizes a soft prompt to attribute drift at the token level without requiring access to real data or model internals. The result is an interpretable, concept-level audit that aggregate evaluation cannot provide.

Key Findings

Three regimes of adaptation damage

Catastrophic

Detectable but unusable

−91.9

worst-case accuracy drop (CIFAR-10)

EraseDiff: FID and KID spike, zero-shot collapses. Metrics catch it — but the model is already destroyed.

Insidious

Passes inspection silently

−18.9

accuracy drop with FID near baseline

SPM: FID stays at 3.6, KID at 0.00002 — yet specific concepts are quietly eroded. Aggregate metrics see nothing.

Subtle

Hidden in the variance

±12.2

std across concepts (mean ≈ 0)

DreamBooth: mean accuracy is fine, but the standard deviation reveals some classes are severely damaged while others are untouched.

Evidence

Concept-level distributional drift

Fig. 1. Distribution of SAE drift scores across concepts. The baseline (top-left) concentrates sharply around 0.5; adapted models show heavy-tailed distributions indicating systematic concept-level drift beyond the intended target. Top row: concept unlearning (nudity). Bottom row: concept customization (dog).

DriftScope output

Drift is specific to each adaptation

Two case studies from the paper. Select an adaptation to see the tokens DriftScope ranked as most affected — often collateral damage no practitioner would anticipate.

Unlearning · SPM · SD 1.4

Customization · DreamBooth · dog

Unlearning Garbage → top drifted concepts

Results

High-drift vs. low-drift concepts

(a) Maximized drift.

(b) Minimized drift.

Fig. 3. Prompts found by DriftScope that (a) maximize and (b) minimize cross-attention drift between model checkpoints. Results for unlearning methods (ESD, SPM, Scissorhands) and DreamBooth LoRA customization (dog concept) across Stable Diffusion 1.5, 2.1, and 3.5-Medium.

Generalization

Long and domain-specific prompts

DriftScope is not limited to short, single-concept prompts. Collateral damage surfaces equally in longer, naturalistic inputs: in the DreamBooth dog fine-tune, Facebook emerges as a high-drift token despite having no relation to the adapted concept, and color-bearing prompts show stylistic drift propagating to modifiers the practitioner would never expect to be affected.

DriftScope generalizes to long, naturalistic prompts

Fig. 4. High-drift image pairs from SD 2.1 under long, naturalistic inputs sourced from PromptHero, for a DreamBooth model fine-tuned on a dog concept. Each pair shows the base model generation (left) alongside the adapted model generation (right).

Citation

BibTeX

@article{laria2026driftscope, title={DriftScope: Measuring The Hidden Effects of Diffusion Model Adaptation}, author={Laria, H{\'e}ctor and Han, Yiping and Santamaria, Julian D. and Wang, Kai and Raducanu, Bogdan and Van De Weijer, Joost and Gomez-Villa, Alexandra}, journal={arXiv preprint arXiv:XXXX.XXXXX}, year={2026} }