Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/28326
Full metadata record
DC Field | Value | Language
dc.contributor.author | Jiang, R | -
dc.contributor.author | Liang, L. | -
dc.contributor.author | Yu, K | -
dc.date.accessioned | 2024-02-16T10:53:56Z | -
dc.date.available | 2024-02-16T10:53:56Z | -
dc.date.issued | 2024-02-23 | -
dc.identifier | ORCiD: Keming Yu https://orcid.org/0000-0001-6341-8402 | -
dc.identifier.citation | Jiang, R., Liang, L. and Yu, K. (2024) 'Renewable Huber estimation method for streaming datasets', Electronic Journal of Statistics, 18 (1), pp. 674 - 705. doi: 10.1214/24-EJS2223. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/28326 | -
dc.description | MSC2020 subject classifications: Primary 60G08; secondary 62G20. | en_US
dc.description.abstract | Streaming data refers to a data collection scheme where observations arrive sequentially and perpetually over time, making it challenging to fit the data into computer memory for statistical analysis. The ordinary least squares estimate for linear regression is sensitive to heavy-tailed errors and outliers, which are commonly encountered in applications. In this case, the Huber loss function is a useful criterion for robust regression. In this paper, we propose robust regression estimation and variable selection for streaming datasets. Unlike renewable estimation for generalized linear regression on streaming datasets, however, the Huber loss function is only first-order differentiable, which poses challenges to renewable estimation in both computation and theoretical development. To address the challenge, we introduce a new smoothed version of the Huber first derivative, which admits a fast and scalable algorithm to perform optimization for streaming datasets and achieves the best fit to the Huber function among different versions. Theoretically, the proposed statistics are shown to have the same asymptotic properties as the standard version computed on an entire data stream with the data batches pooled into one dataset, without additional conditions. The proposed methods are illustrated using current data and the summary statistics of historical data. Both simulations and real data analysis are conducted to illustrate the finite sample performance of the proposed methods. | en_US
dc.description.sponsorship | Ministry of Education of the People’s Republic of China, Humanities and Social Science Foundation (No. 22YJC910005); National Social Science Foundation of China (No. 21BTJ040). | en_US
dc.format.extent | 674 - 705 | -
dc.format.medium | Electronic | -
dc.language.iso | en_US | en_US
dc.publisher | Institute of Mathematical Statistics on behalf of the Bernoulli Society for Mathematical Statistics and Probability | en_US
dc.rights | Rights: Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | -
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | -
dc.subject | high-dimensional estimation | en_US
dc.subject | Huber loss | en_US
dc.subject | online updating | en_US
dc.subject | streaming data | en_US
dc.title | Renewable Huber estimation method for streaming datasets | en_US
dc.type | Article | en_US
dc.identifier.doi | https://doi.org/10.1214/24-EJS2223 | -
dc.relation.isPartOf | Electronic Journal of Statistics | -
pubs.issue | 1 | -
pubs.publication-status | Published | -
pubs.volume | 18 | -
dc.identifier.eissn | 1935-7524 | -
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | -
Appears in Collections: Dept of Mathematics Research Papers
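
The abstract in the record above mentions the Huber loss and updating an estimate from the current batch plus summary statistics of past batches. The sketch below is one way to picture that idea, not the paper's method: it uses the plain (unsmoothed) Huber derivative rather than the smoothed version the authors propose, and it covers only the intercept-only (location) case. The threshold `TAU = 1.345` and the names `renew_location` and `stream_estimate` are assumptions made for illustration, and the update rule is a generic renewable-style Newton step.

```python
import random
import statistics

TAU = 1.345  # conventional Huber threshold; an assumption, not taken from the paper


def huber_loss(r, tau=TAU):
    """Quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= tau else tau * (a - 0.5 * tau)


def huber_psi(r, tau=TAU):
    """First derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))


def huber_psi_prime(r, tau=TAU):
    """Second derivative where it exists: 1 inside the threshold, 0 outside.
    It jumps at |r| = tau, which is why the loss is only once differentiable."""
    return 1.0 if abs(r) <= tau else 0.0


def renew_location(mu_prev, j_prev, batch, tau=TAU, iters=50, tol=1e-10):
    """Update a location estimate using only the previous estimate mu_prev,
    the accumulated curvature j_prev, and the current batch."""
    mu = mu_prev
    for _ in range(iters):
        score = sum(huber_psi(y - mu, tau) for y in batch)
        curv = sum(huber_psi_prime(y - mu, tau) for y in batch)
        denom = j_prev + curv
        if denom == 0:
            break  # no curvature information; keep the current value
        # Newton step on: j_prev * (mu_prev - mu) + score(mu) = 0
        step = (j_prev * (mu_prev - mu) + score) / denom
        mu += step
        if abs(step) < tol:
            break
    j_new = j_prev + sum(huber_psi_prime(y - mu, tau) for y in batch)
    return mu, j_new


def stream_estimate(batches, tau=TAU):
    """Process batches one at a time; raw historical data is never revisited."""
    it = iter(batches)
    first = next(it)
    # Start from the first batch's median so the Newton steps begin near the bulk.
    mu, j = renew_location(statistics.median(first), 0.0, first, tau)
    for batch in it:
        mu, j = renew_location(mu, j, batch, tau)
    return mu


if __name__ == "__main__":
    rng = random.Random(0)
    # True location 2.0; about 10% of the errors come from a much wider distribution.
    data = [[2.0 + rng.gauss(0.0, 20.0 if rng.random() < 0.1 else 1.0)
             for _ in range(200)] for _ in range(20)]
    print("streaming Huber estimate:", stream_estimate(data))
```

Only two numbers are carried between batches here (the previous estimate and the accumulated curvature), which is what keeps memory use constant as the stream grows. The indicator in `huber_psi_prime` is the non-smooth piece that, per the abstract, motivates working with a smoothed derivative instead.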

Files in This Item:
File | Description | Size | Format
FullText.pdf | | 427.97 kB | Adobe PDF


This item is licensed under a Creative Commons License.