Reduced Robust Random Cut Forest for Out-of-Distribution Detection in Machine Learning Models

Home > Archive > 2022 > Volume 12 Number 5 (Sept. 2022) >

IJMLC 2022 Vol.12(5): 221-228 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2022.12.5.1104

Harsh Vardhan and Janos Sztipanovits

Abstract—Most machine-learning-based regressors extract information from data collected via past observations to make predictions about the future. Consequently, when the input to these trained models has statistical properties significantly different from those of the training data, there is no guarantee of accurate prediction. Using these models on out-of-distribution input data may therefore produce a predicted outcome completely different from the desired one, which is not only erroneous but can also be hazardous in some cases. Successful deployment of these machine learning models in any system requires a detection system that can distinguish between out-of-distribution data and in-distribution data (i.e., data similar to the training data). In this paper, we introduce a novel approach to this detection process using the Reduced Robust Random Cut Forest (RRRCF) data structure, which can be used on both small and large datasets. Similar to the Robust Random Cut Forest (RRCF), RRRCF is a structured but reduced representation of the training data subspace in the form of cut trees. Empirical results of this method on both low- and high-dimensional data showed that inference about whether data is in or out of the training distribution can be made efficiently, and that the model is easy to train with no difficult hyper-parameter tuning. The paper discusses two different use cases for testing and validating results.

Index Terms—Random Cut Forest, Robust Random Cut Forest, interpretable intelligence, out-of-distribution detection, machine learning, cyber-physical system.

The authors are with the Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN, USA (e-mail: harsh.vardhan@vanderbilt.edu, janos.sztipanovits@vanderbilt.edu).


Cite: Harsh Vardhan and Janos Sztipanovits, "Reduced Robust Random Cut Forest for Out-of-Distribution Detection in Machine Learning Models," International Journal of Machine Learning and Computing, vol. 12, no. 5, pp. 221-228, 2022.

Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


