SiamABC: Improving Accuracy and Generalization for Efficient Visual Tracking

West Virginia University
WACV, 2025

SiamABC (S-Tiny) is efficient and resilient against out-of-distribution tracking in adverse visibility conditions (AVisT benchmark).

Abstract

Efficient visual trackers tend to overfit to their training distributions and generalize poorly: they perform well on their respective in-distribution (ID) test sets but less well on out-of-distribution (OOD) sequences, which limits their deployment in the wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC relies on new architectural designs for bridging the dynamic variability of the target, and on new training losses. It also directly addresses OOD tracking generalization by including a fast, backward-free dynamic test-time adaptation method that continuously adapts the model to the dynamic visual changes of the target. Our extensive experiments suggest that SiamABC achieves remarkable performance gains on OOD sets while maintaining accurate performance on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU.

Overall Approach


The Feature Extraction Block uses a readily available backbone to process the frames. The RelationAware Block uses our losses to exploit representational relations among the dual template and the dual search region, whose representations are obtained via our learnable FMF layer. The Heads Block learns lightweight convolution layers that infer the bounding box and the classification score, trained with standard tracking losses. During inference, the tracker adapts to every instance through our Dynamic Test-Time Adaptation framework.

OOD Comparison


Comparison of our trackers with others on the AVisT dataset on a CPU. We show the success score (AUC) (vertical axis), speed (horizontal axis), and relative number of FLOPs (circles) of the trackers. Our trackers outperform other efficient trackers in terms of both speed and accuracy.
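For readers unfamiliar with the success score, the sketch below shows how a success AUC is commonly computed for one sequence: the fraction of frames whose IoU with the ground truth exceeds a threshold is measured over a sweep of thresholds and then averaged. The `(x, y, w, h)` box format, the 21 evenly spaced thresholds, and the function names `iou` and `success_auc` are assumptions made for illustration; the exact settings of the evaluation toolkit used for this figure are not stated on this page.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success curve for one sequence.

    Assumption: thresholds are evenly spaced in [0, 1] and a frame counts
    as a success when its IoU is strictly greater than the threshold.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at each threshold, then the mean over all thresholds.
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


if __name__ == "__main__":
    predictions = [(0, 0, 10, 10), (5, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(f"success AUC: {success_auc(predictions, ground_truth):.3f}")
```

Benchmark-level scores are usually obtained by averaging the per-sequence curves, so a single number on the vertical axis summarizes behavior across all overlap thresholds rather than at one fixed IoU cut-off.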

Dynamic Test-Time Adaptation


Comparative study of test-time adaptation (TTA) approaches on AVisT, which involves various extreme distribution shifts with real-world corruptions, and on ITB, the next most challenging benchmark.
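To make the term "backward-free" concrete, here is a minimal sketch of one generic family of such methods: normalization statistics are refreshed from test-time features with an exponential moving average, so no gradients or backward pass are needed. This page does not describe the internals of SiamABC's Dynamic Test-Time Adaptation, so this is not the authors' method; the class `RunningFeatureNorm`, its `momentum` parameter, and the list-of-channels feature layout are all hypothetical.

```python
import math


class RunningFeatureNorm:
    """Hypothetical per-channel normalizer adapted at test time.

    Statistics are updated by an exponential moving average over the
    features of incoming frames; no gradient computation is involved.
    """

    def __init__(self, mean, var, momentum=0.1, eps=1e-5):
        self.mean = list(mean)    # per-channel means from training
        self.var = list(var)      # per-channel variances from training
        self.momentum = momentum  # weight given to the current frame
        self.eps = eps

    def adapt(self, features):
        """Blend stored statistics with those of the current frame.

        features: list of channels, each a list of activations.
        """
        m = self.momentum
        for c, values in enumerate(features):
            mu = sum(values) / len(values)
            var = sum((v - mu) ** 2 for v in values) / len(values)
            self.mean[c] = (1 - m) * self.mean[c] + m * mu
            self.var[c] = (1 - m) * self.var[c] + m * var

    def normalize(self, features):
        """Normalize each channel with the current (adapted) statistics."""
        return [
            [(v - self.mean[c]) / math.sqrt(self.var[c] + self.eps) for v in values]
            for c, values in enumerate(features)
        ]


if __name__ == "__main__":
    norm = RunningFeatureNorm(mean=[0.0], var=[1.0], momentum=0.5)
    frame_features = [[2.0, 4.0]]  # one channel, two activations
    norm.adapt(frame_features)
    print(norm.mean, norm.var)
    print(norm.normalize(frame_features))
```

Because each update is a handful of arithmetic operations per channel, this kind of adaptation adds little to the per-frame cost, which is the main appeal of backward-free schemes for CPU-bound trackers.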

VOT Benchmark Comparison


Comparative study on the VOT2020 benchmark.

AVisT, NFS30, UAV123, TrackingNet, GOT-10k, and LaSOT Benchmarks


Comparative study with other SOTA approaches on various benchmarks, including AVisT, NFS30, UAV123, TrackingNet, GOT-10k, and LaSOT.

ITB, OTB, TC128, and DTB70 Benchmarks


Comparative study on the ITB, OTB, TC128, and DTB70 benchmarks in terms of AUC score.


BibTeX

@inproceedings{zaveri2025siamabc,
    title={Improving Accuracy and Generalization for Efficient Visual Tracking},
    author={Zaveri, Ram and Patel, Shivang and Gu, Yu and Doretto, Gianfranco},
    booktitle={Winter Conference on Applications of Computer Vision},
    year={2025},
    organization={IEEE/CVF}
}