Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/25864
Full metadata record
DC Field | Value | Language
dc.contributor.author | Holmes, I | -
dc.contributor.author | Cribbin, T | -
dc.contributor.author | Ferenczi, N | -
dc.date.accessioned | 2023-01-24T12:23:22Z | -
dc.date.available | 2023-01-24T12:23:22Z | -
dc.date.issued | 2023-01-13 | -
dc.identifier | ORCID iDs: Isabel Holmes https://orcid.org/0000-0003-0862-0901; Timothy Cribbin https://orcid.org/0000-0001-9737-4727; Nelli Ferenczi https://orcid.org/0000-0002-3757-6244. | -
dc.identifier | 100267 | -
dc.identifier.citation | Holmes, I., Cribbin, T. and Ferenczi, N. (2023) 'Style over substance: A psychologically informed approach to feature selection and generalisability for author classification', Computers in Human Behavior Reports, 9, 100267, pp. 1-13. doi: 10.1016/j.chbr.2022.100267. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/25864 | -
dc.description | Data availability: Data will be made available on request. | en_US
dc.description.abstract | Copyright © 2023 The Authors. Author profiling, or classifying user-generated content based on demographic or other personal attributes, is a key task in social media-based research. Whilst high accuracy has been achieved on many attributes, most studies tend to train and test models on a single domain only, ignoring cross-domain performance, and research shows that models often transfer poorly into new domains as they tend to depend heavily on topic-specific (i.e., lexical) features. Knowledge specific to the field (e.g., Psychology, Political Science) is often ignored, with a reliance on data-driven algorithms for feature development and selection. Focusing on political affiliation, we evaluate an approach that selects stylistic features according to known psychological correlates (personality traits) of this attribute. Training data was collected from Reddit posts made by regular users of the political subreddits r/republican and r/democrat. A second, non-political dataset was created by collecting posts by the same users but in different subreddits. Our results show that introducing domain-specific knowledge in the form of psychologically informed stylistic features resulted in better out-of-training-domain performance than lexical or more commonly used stylistic features. | en_US
dc.format.extent | 1-13 | -
dc.format.medium | Electronic | -
dc.language | English | -
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.rights | Copyright © 2023 The Authors. Published by Elsevier Ltd. under a Creative Commons license (https://creativecommons.org/licenses/by-nc-nd/4.0/). | -
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | -
dc.subject | author profiling | en_US
dc.subject | political affiliation classification | en_US
dc.subject | stylistic feature sets | en_US
dc.subject | model generalisability | en_US
dc.subject | political psychology | en_US
dc.subject | feature development | en_US
dc.subject | interdisciplinarity | en_US
dc.subject | domain-specific knowledge | en_US
dc.title | Style over substance: A psychologically informed approach to feature selection and generalisability for author classification | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.1016/j.chbr.2022.100267 | -
dc.relation.isPartOf | Computers in Human Behavior Reports | -
pubs.issue | in press pre-proof | -
pubs.publication-status | Published | -
pubs.volume | 9 | -
dc.identifier.eissn | 2451-9588 | -
dc.rights.holder | The Authors | -
Appears in Collections: Dept of Computer Science Research Papers; Dept of Life Sciences Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2023 The Authors. Published by Elsevier Ltd. under a Creative Commons license (https://creativecommons.org/licenses/by-nc-nd/4.0/). | 2.02 MB | Adobe PDF


This item is licensed under a Creative Commons License.