SAGA-ReID reconstructs person identity representations using CLIP patch tokens rather than global pooling, improving recognition under occlusion.
arXiv cs.CV · April 27, 2026
AI Summary
• Researchers propose SAGA-ReID, which aligns intermediate patch tokens with anchor vectors in CLIP's text embedding space, suppressing corrupted or absent regions without requiring image descriptions.
• The method emphasizes spatially stable evidence and outperforms global pooling, improving Rank-1 by up to 10.6 on occluded benchmarks, where global aggregation is least reliable.
• Controlled experiments isolate the aggregation mechanism under synthetic masking (where identity signal is absent) and realistic human distractors (where an overlapping person introduces confusing signal); SAGA-ReID's advantage grows as occlusion increases.
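
To make the contrast between global pooling and anchor-guided aggregation concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation: the function names, the softmax-over-patches weighting, the temperature value, and the random "anchor" vectors are all illustrative assumptions, and masked patches are modeled simply as zero vectors. It only shows why weighting patches by similarity to anchors can down-weight regions that carry no identity signal, whereas mean pooling gives every patch the same share.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length (zero vectors stay zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def global_mean_pool(patch_tokens):
    """Baseline: every patch contributes equally, occluded or not."""
    return patch_tokens.mean(axis=0)

def anchor_guided_pool(patch_tokens, anchors, temperature=0.1):
    """Hypothetical anchor-guided aggregation (an assumption, not the paper's code).

    patch_tokens: (N, D) patch features; anchors: (K, D) anchor vectors.
    Returns a (D,) feature and the (N, K) per-anchor patch weights.
    """
    sims = l2_normalize(patch_tokens) @ l2_normalize(anchors).T  # cosine, (N, K)
    logits = sims / temperature
    logits -= logits.max(axis=0, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)     # softmax over patches
    per_anchor = weights.T @ patch_tokens             # (K, D)
    return per_anchor.mean(axis=0), weights

# Toy data: 4 anchors, 8 visible patches (2 near each anchor), 4 masked patches.
rng = np.random.default_rng(0)
D, K = 16, 4
anchors = rng.normal(size=(K, D))
visible = np.repeat(anchors, 2, axis=0) + 0.05 * rng.normal(size=(2 * K, D))
masked = np.zeros((4, D))                             # synthetic masking: no signal
patch_tokens = np.vstack([visible, masked])           # (12, D)
occluded = np.arange(len(patch_tokens)) >= len(visible)

baseline_feat = global_mean_pool(patch_tokens)
feat, weights = anchor_guided_pool(patch_tokens, anchors)

uniform_share = occluded.mean()                       # weight mean pooling gives masked patches
anchor_share = weights[occluded].sum(axis=0).max()    # worst case across anchors

print(f"masked-patch weight, mean pooling:  {uniform_share:.3f}")
print(f"masked-patch weight, anchor-guided: {anchor_share:.6f}")
```

In this toy setup the masked patches have zero similarity to every anchor, so the softmax assigns them almost no weight, while mean pooling hands them a third of the total. Real CLIP patch tokens for occluded regions are not zero vectors, so the effect in practice depends on how well the anchors separate identity-bearing patches from corrupted ones.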