Please use this identifier to cite or link to this item: https://hdl.handle.net/11000/28956

Efficiency Analysis Trees


View/Open:
 Esteve Campello, Miriam.pdf

2.21 MB
Adobe PDF
Title:
Efficiency Analysis Trees
Author:
Esteve, Miriam
Advisor:
Aparicio, Juan
Rabasa, Alejandro
Publisher:
Universidad Miguel Hernández de Elche
Department:
UMH Departments::Statistics, Mathematics and Computer Science
Publication date:
2022-09-30
URI:
https://hdl.handle.net/11000/28956
Abstract:
Defining technical efficiency through the prior estimation of a production frontier has been a relevant topic in the literature on production theory and engineering. Over the last forty years, many parametric and non-parametric approaches have been introduced to estimate production frontiers from a given data set. However, few of these methodologies are based on machine learning techniques, even though machine learning is a growing field of research. This thesis introduces a new methodology based on regression trees to estimate production frontiers that satisfy the fundamental postulates of microeconomics, such as free disposability. This new approach, known as Efficiency Analysis Trees (EAT), shares some similarities with the Free Disposal Hull (FDH) technique. Unlike FDH, however, EAT overcomes the overfitting problem by using cross-validation to prune the deep tree obtained in a first stage. The performance of EAT is measured through Monte Carlo simulations, which show that the new approach reduces the mean squared error of the estimate of the true frontier by between 13% and 70% compared with standard FDH. Individual decision trees nevertheless have some drawbacks: (1) they do not usually achieve a high level of prediction accuracy, and (2) they can lack robustness, that is, a small change in the data can cause a large change in the final structure of the fitted tree. Random Forest, an ensemble learning method that builds a multitude of decision trees at training time and aggregates the information from the individual trees into a final prediction, is capable of overcoming these limitations (James et al., 2013). Accordingly, this thesis adapts the Random Forest technique (Breiman, 2001) to estimate production frontiers and technical efficiency, an adaptation denoted RF+EAT.
To do this, decision tree models are applied to estimate non-overfitted production possibility sets that satisfy free disposability in the context of FDH. The new approach has three main implications. First, the resulting technical efficiency estimates are robust to resampling of the data and of the input variables. Second, a method is suggested to determine the importance of the input variables in the model, which allows the inputs to be ranked. Third, if the ratio of the sample size to the number of variables (inputs and outputs) is low or moderately low, the standard efficiency models in the literature may evaluate a considerable number of units as technically efficient, especially in the case of FDH. This lack of discrimination is often referred to in the literature as the "curse of dimensionality." This thesis shows that the Random Forest technique can also be considered a remedy for this type of problem. From a computational point of view, the algorithm used by EAT relies on a heuristic to select the next node to split during the growth of the decision tree. However, as shown in this thesis, this heuristic does not always produce the minimum mean squared error among all the trees that could be grown. Therefore, one of the main objectives is to improve the accuracy of the EAT production function estimator by resorting to backtracking techniques (Baase, 2009 and Horowitz and Sahni, 1978). In particular, we combine the idea behind the heuristic approach with the potential of backtracking (Pearl, 1984 and Tarjan, 1972) to improve the quality of the EAT-based production function estimator.
In addition, this new approach makes it possible to reduce the computational load of the standard backtracking techniques applied to the EAT methodology, as shown in the simulation experiments carried out. Also on the computational side, this thesis develops a new R package, named eat, which includes functions to estimate the production frontiers and the technical efficiency measures of EAT and RF+EAT. The package includes functions to estimate the input- and output-oriented radial measures, the input- and output-oriented Russell measures, the directional distance function and the weighted additive model. For model visualization, the package provides graphical representations of the production frontier through tree structures and rankings of input variable importance. The operation of the package is described in this thesis using a real database.
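As a rough illustration of the FDH baseline that the abstract compares EAT against, the following Python sketch computes a single-output FDH frontier and an output-oriented radial efficiency score. It is not the thesis's EAT algorithm and not the API of the eat R package; the function names `fdh_frontier` and `output_efficiency` and the toy data are assumptions made for this example only.

```python
from typing import Sequence

Vector = Sequence[float]


def fdh_frontier(inputs: Sequence[Vector], outputs: Sequence[float], x: Vector) -> float:
    """FDH estimate of the maximum output attainable with input vector x.

    Under free disposability, any observed unit that uses no more of every
    input than x shows that its output is attainable at x, so the estimate
    is the largest output among those units.
    """
    candidates = [
        y
        for x_i, y in zip(inputs, outputs)
        if all(a <= b for a, b in zip(x_i, x))
    ]
    if not candidates:
        raise ValueError("no observed unit uses inputs <= x in every component")
    return max(candidates)


def output_efficiency(inputs: Sequence[Vector], outputs: Sequence[float], k: int) -> float:
    """Output-oriented radial score of unit k: frontier output / observed output.

    A score of 1.0 means the unit lies on the estimated frontier; values
    above 1.0 give the factor by which its output could be expanded.
    """
    return fdh_frontier(inputs, outputs, inputs[k]) / outputs[k]


# Hypothetical toy data: four units, one input and one output each.
inputs = [(1.0,), (2.0,), (3.0,), (4.0,)]
outputs = [2.0, 5.0, 4.0, 8.0]

scores = [output_efficiency(inputs, outputs, k) for k in range(len(outputs))]
print(scores)
```

In this toy sample only the third unit is rated inefficient, because the second unit produces more output with less input. With more variables and few observations, fewer units are dominated in this way, which is the lack of discrimination the abstract describes for FDH.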
Keywords/Subjects:
Computer science
Computers
Statistics
Knowledge area:
UDC: Pure and natural sciences: Mathematics
UDC: Generalities: Computer science and technology. Computing.
Document type:
application/pdf
Access rights:
info:eu-repo/semantics/openAccess
Appears in collections:
Doctoral theses - Sciences and Engineering



Creative Commons: the license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.