Please use this identifier to cite or link to this item: https://hdl.handle.net/11000/30550

A novel approach to learning through categorical variables applicable to the classification of solitary pulmonary nodule malignancy

Title:
A novel approach to learning through categorical variables applicable to the classification of solitary pulmonary nodule malignancy
Authors:
Bosch-Romeu, Raquel  
Librero, Julian
Senent Valero, Marina
Sanfeliu-Alonso, Maria Carmen
Salinas-Serrano, Jose Maria
Fores Martos, Jaume  
Suay-Garcia, Beatriz
Climent, Joan  
Falco, Antonio  
Pastor-Valero, Maria  
Department:
UMH Departments::Public Health, History of Science and Gynaecology
Publication date:
2023-01
URI:
https://hdl.handle.net/11000/30550
Abstract:
Background: A common difficulty in constructing a classification model arises when some or all of the covariates are categorical variables. Classical methods either assign labels to each outcome of a categorical variable or rely on summary measures (frequencies and percentages), which can be interpreted as probabilities. Methods: We adopted a novel mathematical procedure to construct a classification model from categorical variables based on a non-classical probability approach. More specifically, we coded the variables following the categorical data representation used in Discriminant Correspondence Analysis and then constructed a non-classical probability matrix system that represents an entangled system of dependent-independent variables. We then developed a disentangling procedure to obtain an empirical density function for each representative class (a minimum of two classes). Finally, we constructed our classification model using the density functions. Results: We applied the proposed procedure to build a classification model of the malignancy of Solitary Pulmonary Nodule (SPN) after five years of follow-up using routine clinical data. From a sample of 404 patients with SPN, we constructed the classification model with 2/3 of the sample (270 patients) and validated it with the remaining 1/3 (134 patients). We tested the procedure’s stability by randomly repeating the analysis 1000 times. We obtained a model accuracy of 0.74, an F1 score of 0.58, a Cohen’s Kappa value of 0.41 and a Matthews Correlation Coefficient of 0.45. Finally, the area under the ROC curve was 0.86. Conclusion: The proposed procedure provides a machine learning classification model of solitary pulmonary nodule malignancy, constructed from routine clinical data composed mainly of categorical variables. Its performance is acceptable, and it could be used by clinicians as a tool to classify SPN malignancy in routine clinical practice.
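As an illustration of how the scalar metrics reported in the abstract are defined, the following minimal sketch computes accuracy, F1 score, Cohen's Kappa and the Matthews Correlation Coefficient from a binary confusion matrix. It is not the authors' code: the function name `classification_metrics` is hypothetical, and the counts in the usage example are invented for illustration and are not taken from the study. The area under the ROC curve is omitted because it requires per-patient scores rather than a confusion matrix.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, F1, Cohen's kappa and MCC for a 2x2 confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives (hypothetical helper, not from the paper).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / n

    # Precision and recall, guarded against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e is computed from the row and column marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews Correlation Coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "kappa": kappa, "mcc": mcc}


# Illustrative counts only (NOT data from the study).
metrics = classification_metrics(tp=20, fp=10, fn=10, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.800
# f1: 0.667
# kappa: 0.524
# mcc: 0.524
```

Kappa and MCC both correct for class imbalance, which is why they can sit well below the raw accuracy, as they do in the figures reported in the abstract.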
Keywords/Subjects:
Classification methods
non-classical probabilities
solitary pulmonary nodule
Knowledge area:
UDC: Applied sciences: Medicine
Document type:
application/pdf
Access rights:
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 4.0 International
DOI:
https://doi.org/10.21203/rs.3.rs-2502360/v1
Appears in collections:
Articles: Public Health, History of Science and Gynaecology



Creative Commons: the license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.