Unsupervised Domain Adaptation via Class Aggregation for Text Recognition

Published in IEEE Transactions on Circuits and Systems for Video Technology, 2023

Cross-domain text recognition is a challenging task due to the domain drift problem. One solution is to align feature distributions between domains through Unsupervised Domain Adaptation (UDA). All existing methods perform feature alignment based on whole-image features or semantic character features. However, visual character features without contextual semantics also contain valuable information, e.g., stroke features of individual characters, which also benefits domain transfer. To this end, we propose a dual intra-Class Aggregation based unsupervised Domain Adaptation method (CADA) for text recognition, which aligns both visual and semantic character feature distributions. To our knowledge, CADA is the first method to consider visual character features without contextual semantics in cross-domain text recognition tasks. Accordingly, a Singlehead Self-Attention (SSA) mechanism is introduced to extract visual character features. A dual intra-class aggregation strategy is then designed, which performs class aggregation in both the visual and semantic spaces. We evaluate the proposed method on widely used datasets by combining it with representative text recognition models that use various decoding methods. Extensive experimental results demonstrate the superiority and generality of the proposed method. Moreover, CADA introduces no additional inference time compared to the baselines.
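To make the idea of intra-class aggregation concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes that each character feature is a plain vector with a class label, and it measures how tightly the features of each class cluster around their class centroid. The function name `intra_class_aggregation_loss` and the toy inputs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def intra_class_aggregation_loss(features, labels):
    """Mean squared distance from each feature vector to its class centroid.

    Illustrative assumption only: the paper's actual loss may differ.
    """
    # Group feature vectors by their character class.
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid of all features that share this class.
        centroid = [sum(v[i] for v in members) / len(members) for i in range(dim)]
        # Accumulate squared distances to the centroid.
        for v in members:
            total += sum((v[i] - centroid[i]) ** 2 for i in range(dim))
    return total / len(features)


# Toy example: two features of class "a" and one of class "b".
toy_features = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
toy_labels = ["a", "a", "b"]
loss = intra_class_aggregation_loss(toy_features, toy_labels)  # about 0.667
```

Under these assumptions, a dual strategy in the spirit of the abstract would apply such a term twice, once to visual character features and once to semantic character features, so that features of the same character class are pulled together in both spaces.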

Recommended citation: X. Liu, X. Ding, X. Luo, and X. Xu. "Unsupervised Domain Adaptation via Class Aggregation for Text Recognition." IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 10, pp. 5617-5630, 2023.