[1] C. Thatch and L. Bramwell, “Cross-Modal Vision Representation Learning for Real-World Visual Understanding”, JCTS, vol. 4, no. 4, Apr. 2025.