SiamABC: Improving Accuracy and Generalization for Efficient Visual Tracking

West Virginia University
WACV, 2025

Abstract

Efficient visual trackers overfit to their training distributions and generalize poorly: they perform well on their respective in-distribution (ID) test sets but less well on out-of-distribution (OOD) sequences, which limits their deployment in the wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC relies on new architectural designs for bridging the dynamic variability of the target, and on new training losses. It also addresses OOD tracking generalization directly through a fast, backward-free dynamic test-time adaptation method that continuously adapts the model to the dynamic visual changes of the target. Our extensive experiments suggest that SiamABC achieves remarkable performance gains on OOD sets while maintaining accurate performance on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU.

Overall Approach

The Feature Extraction Block uses a readily available backbone to process the frames. The RelationAware Block exploits representational relations among the dual-template and dual-search-region through our losses, where dual-template and dual-search-region representations are obtained via our learnable FMF layer. The Heads Block learns lightweight convolution layers to infer the bounding box and the classification score through standard tracking losses. During inference, the tracker adapts to every instance through our Dynamic Test-Time Adaptation framework.

OOD Comparison

Comparison of our trackers with others on the AVisT dataset on a CPU. We show the success score (AUC) (vertical axis), speed (horizontal axis), and relative number of FLOPs (circles) of the trackers. Our trackers outperform other efficient trackers in terms of both speed and accuracy.
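For readers unfamiliar with the success score, the following is a minimal sketch of how a success AUC is commonly computed from per-frame overlaps. It assumes the widely used OTB-style convention (21 evenly spaced IoU thresholds in [0, 1], a frame counting as a success when its IoU strictly exceeds the threshold) and boxes given as (x, y, w, h). The function names `iou` and `success_auc` are illustrative and are not taken from the SiamABC code or the AVisT toolkit, whose exact protocol may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success curve, approximated as the mean success rate
    over evenly spaced IoU thresholds (assumed OTB-style convention)."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at a threshold: fraction of frames whose IoU exceeds it.
    success = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(success) / len(success)


if __name__ == "__main__":
    predictions = [(0, 0, 10, 10), (5, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(f"success AUC: {success_auc(predictions, ground_truth):.3f}")
```

Because the threshold comparison is strict, even a perfect prediction fails the final threshold of 1.0, so the maximum value under this convention is 20/21 rather than 1.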

Dynamic Test-Time Adaptation

Comparative study of test-time adaptation (TTA) approaches on AVisT, which involves various extreme distribution shifts with real-world corruptions, and on ITB, the next most challenging benchmark.
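To illustrate what "backward-free" adaptation can look like in general, here is a minimal sketch of one common family of such methods: refreshing per-channel normalization statistics from the current frame's features, with no gradients or backward pass. This is an assumption-laden illustration and not the SiamABC method, whose details are not given on this page. The helper names `adapt_norm_stats` and `normalize`, the fixed `momentum` value, and the list-of-lists feature layout are all hypothetical.

```python
def adapt_norm_stats(running_mean, running_var, features, momentum=0.1):
    """Blend stored per-channel statistics with those of the current frame.

    `features` is a list of channels, each a list of activations.
    Only forward-pass quantities are used; nothing is backpropagated.
    """
    new_mean, new_var = [], []
    for c, channel in enumerate(features):
        n = len(channel)
        mu = sum(channel) / n
        var = sum((v - mu) ** 2 for v in channel) / n
        # Exponential moving average toward the current frame's statistics.
        new_mean.append((1 - momentum) * running_mean[c] + momentum * mu)
        new_var.append((1 - momentum) * running_var[c] + momentum * var)
    return new_mean, new_var


def normalize(features, mean, var, eps=1e-5):
    """Normalize each channel with the (adapted) statistics."""
    return [
        [(v - mean[c]) / (var[c] + eps) ** 0.5 for v in channel]
        for c, channel in enumerate(features)
    ]


if __name__ == "__main__":
    mean, var = [0.0], [1.0]          # statistics stored at training time
    frame_features = [[2.0, 4.0]]     # one channel from the current frame
    mean, var = adapt_norm_stats(mean, var, frame_features, momentum=0.5)
    print(mean, var, normalize(frame_features, mean, var))
```

In a sketch like this the momentum controls how quickly the statistics follow the current sequence; a dynamic scheme would vary it over time, but how SiamABC does so is not specified here.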

BibTeX

@InProceedings{Zaveri_2025_WACV,
    author    = {Zaveri, Ram and Patel, Shivang and Gu, Yu and Doretto, Gianfranco},
    title     = {Improving Accuracy and Generalization for Efficient Visual Tracking},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {9450-9460}
}

SiamABC (S-Tiny) is efficient and remains robust when tracking out-of-distribution sequences under adverse visibility conditions (AVisT benchmark).

VOT Benchmark Comparison

Comparative study on the VOT2020 benchmark.

AVisT, NFS30, UAV123, TrackingNet, GOT-10k, and LaSOT benchmarks

Comparative study with other state-of-the-art (SOTA) approaches on the AVisT, NFS30, UAV123, TrackingNet, GOT-10k, and LaSOT benchmarks.

ITB, OTB, TC128, and DTB70 benchmarks

Comparative study on ITB, OTB, TC128, and DTB70 benchmarks in terms of their AUC score.
