Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/27482
Full metadata record
DC Field | Value | Language
dc.contributor.author | Naseem, U | -
dc.contributor.author | Khushi, M | -
dc.contributor.author | Kim, J | -
dc.date.accessioned | 2023-10-31T12:53:37Z | -
dc.date.available | 2023-10-31T12:53:37Z | -
dc.date.issued | 2022-03-31 | -
dc.identifier | ORCID iD: Usman Naseem https://orcid.org/0000-0003-0191-7171 | -
dc.identifier | ORCID iD: Matloob Khushi https://orcid.org/0000-0001-7792-2327 | -
dc.identifier | ORCID iD: Jinman Kim https://orcid.org/0000-0001-5960-1060 | -
dc.identifier.citation | Naseem, U., Khushi, M. and Kim, J. (2022) 'Vision-Language Transformer for Interpretable Pathology Visual Question Answering', IEEE Journal of Biomedical and Health Informatics, 27 (4), pp. 1681 - 1690. doi: 10.1109/JBHI.2022.3163751. | en_US
dc.identifier.issn | 2168-2194 | -
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/27482 | -
dc.description.abstract | Pathology visual question answering (PathVQA) attempts to answer a medical question posed about pathology images. Despite its great potential in healthcare, it is not widely adopted because it requires interactions between the image (vision) and the question (language) to generate an answer. Existing methods treated vision and language features independently and were unable to capture the high- and low-level interactions required for VQA. Further, these methods offered no capability to interpret the retrieved answers, which remain obscure to humans, and the models' interpretability in justifying the retrieved answers has remained largely unexplored. Motivated by these limitations, we introduce a vision-language transformer that embeds vision (image) and language (question) features for interpretable PathVQA. We present an interpretable Transformer-based Path-VQA (TraP-VQA), where we embed the transformer's encoder layers with vision and language features extracted using a pre-trained CNN and a domain-specific language model (LM), respectively. A decoder layer is then added to upsample the encoded features for the final PathVQA prediction. Our experiments showed that TraP-VQA outperformed state-of-the-art comparative methods on the public PathVQA dataset. Our experiments also validated the robustness of our model on another medical VQA dataset, and an ablation study demonstrated the capability of our integrated transformer-based vision-language model for PathVQA. Finally, we present visualization results for both text and images, which explain the reason for a retrieved answer in PathVQA. | en_US
dc.description.sponsorship | ARC (Grant Number: DP200103748). | en_US
dc.format.extent | 1681 - 1690 | -
dc.format.medium | Print-Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | IEEE | en_US
dc.rights | Copyright © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works (see: https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/). | -
dc.rights.uri | https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/ | -
dc.subject | pathology images | en_US
dc.subject | interpretability | en_US
dc.subject | visual question answering | en_US
dc.subject | vision-language | en_US
dc.title | Vision-Language Transformer for Interpretable Pathology Visual Question Answering | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.1109/JBHI.2022.3163751 | -
dc.relation.isPartOf | IEEE Journal of Biomedical and Health Informatics | -
pubs.issue | 4 | -
pubs.publication-status | Published | -
pubs.volume | 27 | -
dc.identifier.eissn | 2168-2208 | -
dc.rights.holder | IEEE | -
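
Note: the abstract above outlines the TraP-VQA pipeline, in which image features from a pre-trained CNN and question features from a domain-specific language model are fused in transformer encoder layers before a decoder produces the final answer. The following is a minimal PyTorch sketch of that general idea, not the authors' implementation: the class name, feature dimensions, modality embeddings, mean pooling, the linear answer head standing in for the paper's decoder, and the answer-vocabulary size are all illustrative assumptions.

import torch
import torch.nn as nn

class TraPVQASketch(nn.Module):
    """Illustrative fusion model (hypothetical, not the published code):
    CNN image features + LM question features -> shared transformer encoder
    -> answer prediction head."""
    def __init__(self, vision_dim=2048, lang_dim=768, d_model=512,
                 num_heads=8, num_layers=4, num_answers=1000):
        super().__init__()
        # Project CNN and LM features into a shared embedding space.
        self.vision_proj = nn.Linear(vision_dim, d_model)
        self.lang_proj = nn.Linear(lang_dim, d_model)
        # Learned embeddings marking which modality each token comes from.
        self.modality = nn.Embedding(2, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # Simple classification head standing in for the decoder that maps
        # encoded features to the final answer in the paper.
        self.head = nn.Linear(d_model, num_answers)

    def forward(self, vision_feats, lang_feats):
        # vision_feats: (B, Nv, vision_dim) region/patch features from a CNN
        # lang_feats:   (B, Nl, lang_dim)   token features from a language model
        v = self.vision_proj(vision_feats)
        q = self.lang_proj(lang_feats)
        v = v + self.modality(torch.zeros(v.shape[:2], dtype=torch.long, device=v.device))
        q = q + self.modality(torch.ones(q.shape[:2], dtype=torch.long, device=q.device))
        tokens = torch.cat([v, q], dim=1)   # joint vision-language token sequence
        encoded = self.encoder(tokens)      # cross-modal self-attention
        pooled = encoded.mean(dim=1)        # pool over all tokens
        return self.head(pooled)            # answer logits

# Quick check with random tensors standing in for CNN / LM outputs.
model = TraPVQASketch()
logits = model(torch.randn(2, 36, 2048), torch.randn(2, 20, 768))
print(logits.shape)  # torch.Size([2, 1000])

In the paper the decoder upsamples the encoded features rather than applying a pooled linear classifier; the head above is only a placeholder for that step.
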
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2022 IEEE (see the rights statement above) | 16.94 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.