Improving prediction of COVID-19 mortality using machine learning in the Spanish SEMI-COVID-19 registry

Casas Rojo, José Manuel; Sol Ventura, Paula; Anton-Santos, Juan Miguel; Ortiz de Latierro Olivella, Aitor; Arévalo-Lorido, José Carlos; Mauri, Marc; Rubio-Rivas, Manuel; González-Vega, Rocío; Giner-Galvañ, Vicente; Otero Perpiñá, Bárbara; Fonseca Aizpuru, Eva; Muiño, Antonio; Del Corral Beamonte, Esther; Gómez Huelgas, Ricardo; Arnalich-Fernández, Francisco; Ramos Rincón, José Manuel

Please use this identifier to cite or link to this item: https://hdl.handle.net/11000/39436

This resource is restricted.

Publisher:
Springer

Department:
UMH Departments::Clinical Medicine

Publication date:
2023

URI:
https://hdl.handle.net/11000/39436

Abstract:
COVID-19 is responsible for high mortality, but robust machine learning-based predictors of mortality are lacking. The aim of this study was to generate a model for predicting mortality in patients hospitalized with COVID-19 using Gradient Boosting Decision Trees (GBDT). The Spanish SEMI-COVID-19 registry includes 24,514 pseudo-anonymized cases of patients hospitalized with COVID-19 from 1 February 2020 to 5 December 2021. Data from this registry were used to build a GBDT machine learning model, employing the CatBoost classifier and BorutaShap feature selection to select the most relevant indicators and to generate a mortality prediction model that assigns a risk level ranging from 0 to 1. The model was validated by separating patients according to admission date: the period 1 February to 31 December 2020 (first and second waves, pre-vaccination period) was used for training, and the period 1 January to 30 November 2021 (vaccination period) for the test group. An ensemble of ten models with different random seeds was constructed, with 80% of the training-period patients used for training and the 20% admitted at the end of the training period used for cross-validation. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. Clinical and laboratory data from 23,983 patients were analyzed. Using 16 features, the CatBoost mortality prediction models achieved an AUC of 84.76 (standard deviation 0.45) in the test group (potentially vaccinated patients not included in model training). Although it requires a relatively large number of predictors, the 16-parameter GBDT model shows high predictive capacity for COVID-19 hospital mortality.
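
The validation protocol summarized above (a temporal split by admission date, an ensemble of seed-varied models whose validation slice is the most recent 20% of the training period, and AUC as the metric) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the records, the field names `admission`, `age` and `died`, and the helper `make_toy_model` are hypothetical, the toy logistic scorer stands in for a CatBoost model, and BorutaShap feature selection is not reproduced.

```python
import math
import random
from datetime import date
from statistics import mean


def temporal_split(records, cutoff):
    """Admissions before `cutoff` form the training set; later ones form the test set."""
    train = [r for r in records if r["admission"] < cutoff]
    test = [r for r in records if r["admission"] >= cutoff]
    return train, test


def holdout_tail(train, fraction=0.2):
    """Hold out the most recently admitted `fraction` of the training period for validation."""
    ordered = sorted(train, key=lambda r: r["admission"])
    n_val = round(len(ordered) * fraction)
    cut = len(ordered) - n_val
    return ordered[:cut], ordered[cut:]


def auc(labels, scores):
    """ROC AUC as the probability that a random death outscores a random survivor (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_model(seed, fit_data):
    """Hypothetical stand-in for one CatBoost model: a logistic risk score on age.

    The centre is taken from the fitting data and the slope is jittered by the seed,
    so ensemble members differ slightly, loosely mimicking seed-varied GBDT models.
    """
    rng = random.Random(seed)
    center = mean(r["age"] for r in fit_data)
    slope = 0.08 + rng.uniform(-0.01, 0.01)
    return lambda r: 1.0 / (1.0 + math.exp(-slope * (r["age"] - center)))


def ensemble_predict(models, record):
    """Average the 0-1 risk scores of all ensemble members."""
    return mean(m(record) for m in models)


# Hypothetical toy records (not registry data).
records = [
    {"admission": date(2020, 3, 10), "age": 82, "died": 1},
    {"admission": date(2020, 4, 2), "age": 54, "died": 0},
    {"admission": date(2020, 11, 20), "age": 77, "died": 1},
    {"admission": date(2020, 12, 5), "age": 60, "died": 0},
    {"admission": date(2021, 2, 14), "age": 88, "died": 1},
    {"admission": date(2021, 5, 30), "age": 45, "died": 0},
    {"admission": date(2021, 9, 8), "age": 70, "died": 0},
]

train, test = temporal_split(records, date(2021, 1, 1))
fit_part, val_part = holdout_tail(train, fraction=0.2)
models = [make_toy_model(seed, fit_part) for seed in range(10)]
scores = [ensemble_predict(models, r) for r in test]
print(auc([r["died"] for r in test], scores))  # prints 1.0
```

Holding out the tail of the training period, rather than a random 20%, mimics the kind of temporal shift the separate 2021 test group is meant to probe; with real data the validation slice would drive early stopping or tuning, which this toy does not do.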

Keywords/Subjects:
COVID-19
Machine learning
Deep learning
Mortality
Spain

Document type:
info:eu-repo/semantics/article

Access rights:
info:eu-repo/semantics/closedAccess

DOI:
10.1007/s11739-023-03338-0

Published in:
Intern Emerg Med. 2023 Sep;18(6):1711-1722

Appears in collections:
Clinical Medicine Articles

The license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International.