Abstract—Most machine-learning-based regressors extract
information from data collected via past observations to make
predictions in the future. Consequently, when input to these
trained models is data with significantly different statistical
properties from data used for training, there is no guarantee of
accurate prediction. Using these models on out-of-distribution
input data may therefore produce a prediction far from the
desired one, which is not only erroneous but can also be
hazardous in some cases. Successful
deployment of these machine learning models in any system
requires a detector that can distinguish between
out-of-distribution and in-distribution data (i.e., data similar
to the training data). In this paper, we introduce a novel approach
for this detection process using the Reduced Robust Random Cut
Forest (RRRCF) data structure, which can be used on both
small and large datasets. Similar to the Robust Random Cut
Forest (RRCF), RRRCF is a structured but reduced
representation of the training data subspace in the form of
cut-trees. Empirical results of this method on both low- and
high-dimensional data showed that inference about whether data
is in or out of the training distribution can be made efficiently,
and that the model is easy to train with no difficult
hyperparameter tuning. The
paper discusses two different use-cases for testing and
validating results.
Index Terms—Random Cut Forest, Robust Random Cut
Forest, interpretable intelligence, out-of-distribution detection,
machine learning, cyber-physical system.
The authors are with the Institute of Software and Integrated System,
Vanderbilt University, Nashville, TN, USA (e-mail:
harsh.vardhan@vanderbilt.edu, janos.sztipanovits@vanderbilt.edu).
Cite: Harsh Vardhan and Janos Sztipanovits, "Reduced Robust Random Cut Forest for Out-of-Distribution Detection in Machine Learning Models," International Journal of Machine Learning and Computing, vol. 12, no. 5, pp. 221-228, 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.