Genevera Allen, Rice University

Talk Title: Fast, Model-Agnostic Confidence Intervals for Feature Importance via Minipatch Ensembles

Abstract: To promote trustworthy machine learning models for high-stakes problems, models must be both reliable and interpretable. While there is a growing body of work in each of these areas, there has been limited consideration of the reliability and uncertainty quantification of machine learning interpretations. In this paper, our goal is to develop model-agnostic, distribution-free, and assumption-light confidence intervals for a popular type of interpretation: feature importance; these intervals are valid for any machine learning model and for any regression or classification task on tabular data. We do so by leveraging a form of random observation and feature double subsampling called minipatch ensembles, and we show that our approach provides assumption-light asymptotic coverage for the feature occlusion importance score of any model. Further, our approach is fast, as the computations needed for inference come nearly for free as part of the ensemble learning process. Finally, we also show that this same procedure can be leveraged to provide valid confidence intervals for predictions, hence providing fast, simultaneous quantification of the uncertainty of both model predictions and interpretations. Joint work with Luqin Gan and Lili Zheng.
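
Below is a minimal, hypothetical sketch of the general idea the abstract describes: train many small models on random "minipatches" (a random subset of observations and a random subset of features), then compare out-of-patch predictions from patches that include a feature against those that exclude it to estimate an occlusion-style importance score. The data, the choice of base learner, the patch sizes, the helper name `occlusion_ci`, and the naive normal-approximation interval are all illustrative assumptions, not the authors' procedure or their coverage guarantees.

```python
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Simulated regression data: only the first two features matter.
n, p = 500, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Minipatch settings (illustrative sizes, not tuned).
n_patches = 2000
m_obs, m_feat = 50, 4  # observations / features per minipatch

# pred[k, i] holds the out-of-patch prediction for observation i from minipatch k
# (NaN if i was used to train patch k); feat_in[k, j] records whether feature j
# was included in minipatch k.
pred = np.full((n_patches, n), np.nan)
feat_in = np.zeros((n_patches, p), dtype=bool)

for k in range(n_patches):
    obs_idx = rng.choice(n, size=m_obs, replace=False)
    feat_idx = rng.choice(p, size=m_feat, replace=False)
    feat_in[k, feat_idx] = True
    model = DecisionTreeRegressor(max_depth=3, random_state=k)
    model.fit(X[np.ix_(obs_idx, feat_idx)], y[obs_idx])
    out_idx = np.setdiff1d(np.arange(n), obs_idx)  # held-out observations
    pred[k, out_idx] = model.predict(X[np.ix_(out_idx, feat_idx)])


def occlusion_ci(j, alpha=0.05):
    """Naive CI for the occlusion importance of feature j: per-observation increase
    in squared error when averaging only minipatches that exclude j, relative to
    those that include it (out-of-patch predictions only)."""
    deltas = []
    for i in range(n):
        usable = ~np.isnan(pred[:, i])
        with_j = pred[feat_in[:, j] & usable, i]
        without_j = pred[~feat_in[:, j] & usable, i]
        if len(with_j) and len(without_j):
            deltas.append((y[i] - without_j.mean()) ** 2 - (y[i] - with_j.mean()) ** 2)
    deltas = np.asarray(deltas)
    mean = deltas.mean()
    se = deltas.std(ddof=1) / np.sqrt(len(deltas))
    z = stats.norm.ppf(1 - alpha / 2)
    return mean, (mean - z * se, mean + z * se)


for j in range(4):
    est, (lo, hi) = occlusion_ci(j)
    print(f"feature {j}: importance {est:+.3f}, 95% CI ({lo:+.3f}, {hi:+.3f})")
```

Note how the same fitted minipatches are reused for every feature's importance estimate, which is the sense in which the inference computations come "nearly for free" from the ensemble learning itself; the paper's actual intervals and their asymptotic validity rest on the minipatch sampling structure rather than the crude normal approximation used above.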