Abstract:
In econometrics and production engineering, a topic of interest is the evaluation
of technical efficiency of firms from the estimation of the best practice frontier, which
delineates the production possibility set or technology. By definition, a technology
must satisfy a set of microeconomic postulates. Likewise, a valid estimator of a
technology should meet the same set of axioms. Among non-parametric approaches,
Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH) stand out. Both
methodologies are deterministic and satisfy the minimal extrapolation principle. As a
result, they are sensitive to noise in the data, whether from random or systematic
measurement error, and they overfit the sample used to build the estimator, which limits
their capacity for inference beyond that sample.
Recent literature has explored the use of machine learning techniques to improve
the estimation of production frontiers. However, the use of boosting techniques, a
machine learning methodology based on the sequential combination of multiple weak
models to improve the final prediction, has not been explored. In this Thesis, a new
methodology based on the Gradient Tree Boosting algorithm for the estimation of
production frontiers is developed. As pointed out in the very beginning, the Thesis
is a compendium of three published articles, gathered in Appendices A, B and C. In
the first of these, the original algorithm is adapted so that the resulting estimator
meets the axioms of monotonicity and free disposability (compulsory for production
frontier estimators), leading to the EATBoosting algorithm. In the second one, it is
shown how to calculate different measures of technical efficiency using the technology
generated by the new estimator as a basis. Computing these measures exactly, however,
requires optimization models with thousands of decision variables, which makes them
difficult to solve. To address this issue, a heuristic approximation to the exact efficiency
measures is also proposed. Finally, to facilitate the use of this new methodology
by other researchers and professionals, an R library called BoostingDEA has been
developed, which includes the main functionalities of DEA, FDH, and EATBoosting.
The main advantage of the new approach lies in its ability to tackle overfitting.
Unlike traditional techniques, our methodology does not
systematically underestimate the true inefficiency of the Decision Making Units
(DMUs), so it functions as an inferential tool rather than a merely descriptive one. This
gives it greater discriminatory power and a more precise identification of
inefficiencies: in the simulated scenarios it outperforms FDH in both mean squared
error and bias. Additionally, our approach offers a potential remedy for the curse
of dimensionality, which arises when the ratio between the number of
DMUs and the number of variables is low. The application of EATBoosting in these
cases allows for a more robust and precise efficiency analysis.
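
As a point of reference for the FDH baseline mentioned above, the following is a minimal Python sketch of output-oriented FDH efficiency scores. It is not taken from the Thesis or from the BoostingDEA library: the function name `fdh_output_scores`, the toy data, and the assumption of strictly positive outputs are all illustrative. A score of 1 means the DMU lies on the estimated frontier; a score above 1 is the factor by which its outputs could be expanded.

```python
from typing import List, Sequence


def fdh_output_scores(inputs: Sequence[Sequence[float]],
                      outputs: Sequence[Sequence[float]]) -> List[float]:
    """Output-oriented FDH scores (hypothetical helper, not the BoostingDEA API).

    Assumes every output value is strictly positive.
    """
    scores = []
    for x_o, y_o in zip(inputs, outputs):
        best = 0.0
        for x_i, y_i in zip(inputs, outputs):
            # Only DMUs that use no more of every input than DMU o are comparable.
            if all(a <= b for a, b in zip(x_i, x_o)):
                # Largest radial expansion of y_o that y_i still dominates.
                expansion = min(yi / yo for yi, yo in zip(y_i, y_o))
                best = max(best, expansion)
        # Each DMU is comparable to itself, so best is always at least 1.
        scores.append(best)
    return scores


# Hypothetical toy sample: one input, one output, four DMUs.
X = [[1.0], [2.0], [3.0], [4.0]]
Y = [[1.0], [4.0], [3.0], [4.0]]
scores = fdh_output_scores(X, Y)
```

Because each DMU is benchmarked only against observed DMUs, the best performers in the sample always receive a score of exactly 1. This is the minimal extrapolation behaviour described above, and it is why FDH tends to understate inefficiency relative to the true frontier.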