On the Black-Box Challenge for Fraud Detection Using Machine Learning (I): Linear Models and Informative Feature Selection

Chaquet Ulldemolins, Jacobo; Gimeno Blanes, Francisco Javier; Moral-Rubio, Santiago; Muñoz-Romero, Sergio; Rojo-Álvarez, José Luis

Please use this identifier to cite or link to this item: https://hdl.handle.net/11000/30611

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chaquet Ulldemolins, Jacobo	-
dc.contributor.author	Gimeno Blanes, Francisco Javier	-
dc.contributor.author	Moral-Rubio, Santiago	-
dc.contributor.author	Muñoz-Romero, Sergio	-
dc.contributor.author	Rojo-Álvarez, José Luis	-
dc.contributor.other	UMH Departments::Communications Engineering	es_ES
dc.date.accessioned	2024-01-24T11:30:11Z	-
dc.date.available	2024-01-24T11:30:11Z	-
dc.date.created	2022-03	-
dc.identifier.citation	Applied Sciences Volume 12 Issue 7 (2022)	es_ES
dc.identifier.issn	2076-3417	-
dc.identifier.uri	https://hdl.handle.net/11000/30611	-
dc.description.abstract	Artificial intelligence (AI) is rapidly shaping the global financial market and its services, owing to the strong capabilities it has shown for analysis and modeling in many disciplines. Especially remarkable is the potential these techniques could offer for the challenging problem of credit fraud detection (CFD). However, it is not easy, even for financial institutions, to remain in strict compliance with non-discrimination and data protection regulations while extracting the full potential that these powerful new tools can provide. In practice, this restricts nearly all possible AI applications to simple, easy-to-trace neural networks and prevents more advanced and modern techniques from being applied. The aim of this work was to create a reliable, unbiased, and interpretable methodology to automatically evaluate CFD risk. To this end, we propose a novel methodology for applying machine learning (ML) to the CFD problem that uses state-of-the-art algorithms capable of quantifying the information carried by the variables and their relationships. This approach offers a new form of interpretability to cope with this multifaceted situation. First, a recently published feature selection technique, the informative variable identifier (IVI), is applied; it is capable of distinguishing among informative, redundant, and noisy variables. Second, a set of novel recurrent filters defined in this work, namely the recurrent feature filter (RFF) and the maximally-informative feature filter (MIFF), is applied to minimize training-data bias. Finally, the output is classified using widely used ML techniques, such as gradient boosting, support vector machine, linear discriminant analysis, and linear regression. These models were applied first to a synthetic database, for better descriptive modeling and fine-tuning, and then to a real database.
Our results confirm that our proposal yields valuable interpretability by identifying the weights of the informative features that link the original variables with the final objectives. The most informative features were living beyond one’s means, lack or absence of a transaction trail, and unexpected overdrafts, which is consistent with other published works. Furthermore, we obtained 76% accuracy in CFD, an improvement of more than 4% on the real database compared with other published works. We conclude that the presented methodology not only reduces dimensionality but also improves accuracy and traces the relationships between input and output features, bringing transparency to the ML reasoning process. The results obtained here were used as the starting point for the companion paper, which extends this interpretability to nonlinear ML architectures.	es_ES
dc.format	application/pdf	es_ES
dc.format.extent	26	es_ES
dc.language.iso	eng	es_ES
dc.publisher	MDPI	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	credit fraud detection	es_ES
dc.subject	explainable machine learning	es_ES
dc.subject	interpretability	es_ES
dc.subject	feature selection	es_ES
dc.subject.other	UDC::6 - Applied sciences::62 - Engineering. Technology	es_ES
dc.title	On the Black-Box Challenge for Fraud Detection Using Machine Learning (I): Linear Models and Informative Feature Selection	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.relation.publisherversion	https://doi.org/10.3390/app12073328	es_ES
Appears in collections: Articles - Communications Engineering

View/Open:
220325 On the BlackBox Challenge for Fraud Detection Using Machine Learning (I). - Published - applsci-12-03328.pdf (Adobe PDF, 1.3 MB)

The license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International.