Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/28326
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jiang, R. | - |
dc.contributor.author | Liang, L. | - |
dc.contributor.author | Yu, K. | - |
dc.date.accessioned | 2024-02-16T10:53:56Z | - |
dc.date.available | 2024-02-16T10:53:56Z | - |
dc.date.issued | 2024-02-23 | - |
dc.identifier | ORCiD: Keming Yu https://orcid.org/0000-0001-6341-8402 | - |
dc.identifier.citation | Jiang, R., Liang, L. and Yu, K. (2024) 'Renewable Huber estimation method for streaming datasets', Electronic Journal of Statistics, 18 (1), pp. 674 - 705. doi: 10.1214/24-EJS2223. | en_US |
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/28326 | - |
dc.description | MSC2020 subject classifications: Primary 60G08; secondary 62G20. | en_US |
dc.description.abstract | Streaming data refers to a data collection scheme in which observations arrive sequentially and perpetually over time, making it challenging to fit the full dataset into computer memory for statistical analysis. The ordinary least squares estimate for linear regression is sensitive to heavy-tailed errors and outliers, which are commonly encountered in applications. In this case, the Huber loss function is a useful criterion for robust regression. In this paper, we propose robust regression estimation and variable selection for streaming datasets. Unlike in renewable estimation for generalized linear regression with streaming datasets, however, the Huber loss function is only first-order differentiable, which poses challenges to renewable estimation in both computation and theoretical development. To address this challenge, we introduce a new smoothed version of the Huber first derivative, which admits a fast and scalable optimization algorithm for streaming datasets and achieves the best fit to the Huber function among different versions. Theoretically, the proposed statistics are shown to have the same asymptotic properties as the standard version computed on the entire data stream with the data batches pooled into one dataset, without additional conditions. The proposed methods are illustrated using current data and the summary statistics of historical data. Both simulations and real data analysis are conducted to illustrate the finite sample performance of the proposed methods. | en_US |
dc.description.sponsorship | Ministry of Education of the People’s Republic of China, Humanities and Social Science Foundation (No. 22YJC910005); National Social Science Foundation of China (No. 21BTJ040). | en_US |
dc.format.extent | 674 - 705 | - |
dc.format.medium | Electronic | - |
dc.language.iso | en_US | en_US |
dc.publisher | Institute of Mathematical Statistics on behalf of the Bernoulli Society for Mathematical Statistics and Probability | en_US |
dc.rights | Rights: Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | - |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | - |
dc.subject | high-dimensional estimation | en_US |
dc.subject | Huber loss | en_US |
dc.subject | online updating | en_US |
dc.subject | streaming data | en_US |
dc.title | Renewable Huber estimation method for streaming datasets | en_US |
dc.type | Article | en_US |
dc.identifier.doi | https://doi.org/10.1214/24-EJS2223 | - |
dc.relation.isPartOf | Electronic Journal of Statistics | - |
pubs.issue | 1 | - |
pubs.publication-status | Published | - |
pubs.volume | 18 | - |
dc.identifier.eissn | 1935-7524 | - |
dc.rights.license | https://creativecommons.org/licenses/by/4.0/legalcode.en | - |
Appears in Collections: | Dept of Mathematics Research Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
FullText.pdf | | 427.97 kB | Adobe PDF |
This item is licensed under a Creative Commons License