Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/6702
Full metadata record
DC Field | Value | Language
dc.contributor.author | Cribbin, T | -
dc.date.accessioned | 2012-09-21T15:13:22Z | -
dc.date.available | 2012-09-21T15:13:22Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | Journal of the American Society for Information Science and Technology, 62(6): 1188 - 1207, Jun 2011 | en_US
dc.identifier.issn | 1532-2882 | -
dc.identifier.uri | http://onlinelibrary.wiley.com/doi/10.1002/asi.21519/abstract | en
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/6702 | -
dc.description | This is the post-print of the Article - Copyright @ 2011 ASIS&T | en_US
dc.description.abstract | Document similarity models are typically derived from a term-document vector space representation by comparing all vector-pairs using some similarity measure. Computing similarity directly from a ‘bag of words’ model can be problematic because term independence causes the relationships between synonymous and related terms and the contextual influences that determine the ‘sense’ of polysemous terms to be ignored. This paper compares two methods that potentially address these problems by modelling the higher-order relationships that lie latent within the original vector space. The first is latent semantic analysis (LSA), a dimension reduction method which is a well-known means of addressing the vocabulary mismatch problem in information retrieval systems. The second is the lesser known, yet conceptually simple approach of second-order similarity (SOS) analysis, where similarity is measured in terms of profiles of first-order similarities as computed directly from the term-document space. Nearest neighbour tests show that SOS analysis produces similarity models that are consistently better than both first-order and LSA-derived models at resolving both coarse and fine level semantic clusters. SOS analysis has been criticised for its cubic complexity. A second contribution is the novel application of vector truncation to reduce the run-time by a constant factor. Speed-ups of four to ten times are found to be easily achievable without losing the structural benefits associated with SOS analysis. | en_US
dc.language.iso | en | en_US
dc.publisher | American Society for Information Science and Technology | en_US
dc.title | Discovering latent topical structure by second-order similarity analysis | en_US
dc.type | Article | en_US
dc.identifier.doi | http://dx.doi.org/10.1002/asi.21519 | -
pubs.organisational-data | /Brunel | -
pubs.organisational-data | /Brunel/Brunel Active Staff | -
pubs.organisational-data | /Brunel/Brunel Active Staff/School of Info. Systems, Comp & Maths | -
pubs.organisational-data | /Brunel/Brunel Active Staff/School of Info. Systems, Comp & Maths/IS and Computing | -
pubs.organisational-data | /Brunel/University Research Centres and Groups | -
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups | -
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups/Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) | -
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups/People and Interactivity Research Centre | -
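The abstract above describes how second-order similarity (SOS) analysis compares documents through their profiles of first-order similarities, and how vector truncation trims those profiles to cut the run-time by a constant factor. The sketch below illustrates the general idea in Python under stated assumptions: the helper names, the toy term-document matrix, the use of cosine similarity at both levels, and the top-k truncation rule are illustrative choices, not the paper's actual implementation.

```python
# Illustrative sketch of second-order similarity (SOS) analysis with
# vector truncation. The toy corpus, cosine measure and top-k rule are
# assumptions for demonstration, not the method as published.
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # guard against all-zero rows
    Xn = X / norms
    return Xn @ Xn.T

def truncate_profiles(S, k):
    """Keep only the k largest values in each similarity profile (row),
    zeroing the rest; a simple stand-in for the truncation speed-up."""
    T = np.zeros_like(S)
    idx = np.argpartition(S, -k, axis=1)[:, -k:]   # indices of top-k per row
    rows = np.arange(S.shape[0])[:, None]
    T[rows, idx] = S[rows, idx]
    return T

# Toy term-document matrix: rows = documents, columns = term weights.
docs = np.array([
    [2., 1., 0., 0., 1.],
    [1., 2., 0., 0., 0.],
    [0., 0., 3., 1., 0.],
    [0., 0., 1., 2., 1.],
])

first_order = cosine_similarity_matrix(docs)        # direct 'bag of words' similarity
profiles = truncate_profiles(first_order, k=2)      # optional truncation of each profile
second_order = cosine_similarity_matrix(profiles)   # similarity of similarity profiles

print(np.round(second_order, 3))
```

Truncating each first-order profile to its k strongest entries leaves the profile vectors mostly zero, so the second round of pairwise comparisons touches far fewer non-zero components; this is consistent with the constant-factor speed-up the abstract reports, though the exact truncation scheme evaluated in the paper may differ.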
Appears in Collections:
Publications
Computer Science
Dept of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
Fulltext.pdf |  | 1.16 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.