Interpretable Visual Understanding with Cognitive Attention Network

Xuejiao Tang, Wenbin Zhang, Yi Yu, Kea Turner, Tyler Derr, Mengyu Wang, Eirini Ntoutsi – 2021

While recognition-level image understanding has achieved remarkable advances, reliable visual scene understanding requires not only recognition-level but also cognition-level comprehension, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at https://github.com/tanjatang/CAN.
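The abstract mentions an image-text fusion module but gives no equations; the paper's actual architecture is in the linked repository. As a minimal, hypothetical sketch of how such fusion is commonly realized, the snippet below implements scaled dot-product cross-attention in NumPy, letting each text token attend over image region features (the function name, shapes, and use of cross-attention here are illustrative assumptions, not the authors' exact design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(text_feats, image_feats, d_k):
    """Illustrative image-text fusion via cross-attention (assumed design).

    text_feats:  (T, d) text token features
    image_feats: (R, d) image region features
    Returns (T, d): each text token as a weighted sum of image regions.
    """
    scores = text_feats @ image_feats.T / np.sqrt(d_k)  # (T, R) similarities
    attn = softmax(scores, axis=-1)                     # rows sum to 1
    return attn @ image_feats                           # fused features

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 tokens, dim 8
image = rng.normal(size=(3, 8))   # 3 regions, dim 8
fused = cross_attention_fusion(text, image, d_k=8)
print(fused.shape)  # (5, 8)
```

In practice such a module would use learned query/key/value projections and multiple heads; this sketch only shows the attention-based fusion idea in its simplest form.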

Title
Interpretable Visual Understanding with Cognitive Attention Network
Author
Xuejiao Tang, Wenbin Zhang, Yi Yu, Kea Turner, Tyler Derr, Mengyu Wang, Eirini Ntoutsi
Publisher
Springer International Publishing
Date
2021-09
Identifier
Print ISBN: 978-3-030-86361-6; Electronic ISBN: 978-3-030-86326-3
Source(s)
Appeared in
Proceedings of the 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14-17, 2021.
Language
eng
Rights
Copyright by Springer. When citing this work, cite the Springer link.