Abstract:
The use of artificial intelligence (AI) has recently intensified across the global economy owing to the
capability that it has demonstrated for analysis and modeling in many disciplines. This situation
is accelerating the shift towards a more automated society, where these new techniques can be
consolidated as a valid tool to face the difficult challenge of credit fraud detection (CFD). However,
tight regulations make it difficult for financial entities to adopt modern techniques while remaining
compliant. From a methodological perspective, autoencoders have demonstrated their effectiveness
in discovering nonlinear features across several problem domains. However, autoencoders are opaque
and often seen as black boxes. In this work, we propose an interpretable and agnostic methodology
for CFD. This type of approach offers a double advantage: on the one hand, it can be applied
together with any machine learning (ML) technique, and on the other hand, it provides the necessary
traceability between inputs and outputs, hence avoiding the black-box problem. We first applied
the state-of-the-art feature selection technique defined in the companion paper. Second, we proposed
a novel technique, based on autoencoders, capable of evaluating the relationship between the inputs and
outputs of a sophisticated ML model for every sample submitted for
analysis, through a single transaction-level explanation (STE) approach. This technique allows
each instance to be analyzed individually by applying small perturbations to the input space and
evaluating how they propagate to the output, thereby shedding light on the underlying dynamics of
the model. Based on this, an individualized transaction ranking (ITR) can be formulated, leveraging
the contributions of each feature through STE. These rankings represent a close estimate of the
most important features playing a role in the decision process. The results obtained in this work were
consistent with previously published papers, and showed that certain features, such as living beyond
means, lack or absence of transaction trail, and car loans, have a strong influence on the model outcome.
Additionally, this proposal, which uses the latent space, outperformed in terms of accuracy our previous
results, which had already improved on previously published papers, by 5.5% and 1.5% for the two datasets
under study, from baselines of 76% and 93%, respectively. The contribution of this paper is twofold: a new,
better-performing CFD classification model is presented and, at the same time, we developed a novel
methodology, applicable across classification techniques, that makes it possible to open up black-box models,
erasing the dependencies and, eventually, undesirable biases. We conclude that it is possible to
develop an effective, individualized, unbiased, and traceable ML technique, not only to comply with
regulations, but also to handle transaction-level inquiries from clients and authorities.
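
As an illustration of the perturbation idea behind a transaction-level explanation and the ranking derived from it, the following minimal sketch may be helpful. It is not the autoencoder-based STE technique proposed in this work: it uses a plain central finite difference on the raw inputs, and the names toy_model, perturbation_contributions, and rank_features, as well as the weights and feature names, are hypothetical and chosen only for the example.

```python
import numpy as np

def toy_model(X):
    """Hypothetical stand-in for any classifier: one score in (0, 1) per row."""
    w = np.array([2.0, -0.5, 0.1])  # assumed weights, for illustration only
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def perturbation_contributions(predict, x, eps=1e-2):
    """Central finite-difference sensitivity of the score to each feature of one sample."""
    x = np.asarray(x, dtype=float)
    contributions = np.zeros_like(x)
    for j in range(x.size):
        up, down = x.copy(), x.copy()
        up[j] += eps      # small positive perturbation of feature j
        down[j] -= eps    # small negative perturbation of feature j
        delta = predict(up[None, :])[0] - predict(down[None, :])[0]
        contributions[j] = delta / (2 * eps)
    return contributions

def rank_features(contributions, names):
    """Order feature names by decreasing absolute contribution."""
    order = np.argsort(-np.abs(contributions))
    return [names[i] for i in order]

names = ["feature_a", "feature_b", "feature_c"]  # hypothetical feature names
x = np.array([0.2, 1.0, -0.3])                   # one hypothetical transaction
contrib = perturbation_contributions(toy_model, x)
ranking = rank_features(contrib, names)
print(ranking)
```

Because the function only calls predict, the same sketch can be applied to any classifier that returns a score per row, which is the sense in which such an approach is model-agnostic. The sign of each contribution indicates whether increasing the feature raises or lowers the score for that particular transaction.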