Flow-based counterfactuals for interpretable graph node classification
As more deep learning models are deployed for high-stakes use cases, explaining a model's predictions is becoming increasingly important. One class of explainability methods is based on counterfactual examples. A counterfactual modifies a model input in such a way that the model output, for example a classification, changes in a target direction. In this work, we apply an efficient method for generating such counterfactuals (ECINN) to graph node classification. We introduce a synthetic graph dataset with ground-truth explanation labels. Using this dataset, we quantitatively compare the model-specific ECINN method against the model-agnostic counterfactual generation method of Wachter et al. in terms of explanation size and correctness. We find that ECINN produces higher-quality counterfactuals, and we discuss the trade-offs between it and model-agnostic methods.
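To make the notion of a counterfactual concrete, the following minimal sketch illustrates the general idea behind optimization-based, Wachter-style counterfactual search. It is not the ECINN method and not the experimental setup of this work. It assumes a toy logistic-regression classifier instead of a graph node classifier, a squared L2 distance as the proximity term, and hypothetical names and hyperparameters (`wachter_counterfactual`, `lam`, `lr`, `steps`) chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def wachter_counterfactual(x, w, b, target=1.0, lam=10.0, lr=0.05, steps=500):
    """Gradient descent on lam * (f(x') - target)^2 + ||x' - x||^2.

    f is a toy logistic model, f(x') = sigmoid(w . x' + b).
    The squared L2 distance is an assumption made for simplicity.
    """
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # Gradient of the prediction term, using dp/dx' = p * (1 - p) * w.
        grad_pred = 2.0 * lam * (p - target) * p * (1.0 - p) * w
        # Gradient of the proximity term, which keeps x' close to x.
        grad_dist = 2.0 * (x_cf - x)
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf


# Toy model and an input that the model assigns to class 0.
w = np.array([2.0, -1.0])
b = -0.5
x = np.array([-1.0, 1.0])

x_cf = wachter_counterfactual(x, w, b)
p_before = sigmoid(w @ x + b)
p_after = sigmoid(w @ x_cf + b)
print("original input:", x, "P(class 1) =", p_before)
print("counterfactual:", x_cf, "P(class 1) =", p_after)
```

In this sketch the weight `lam` trades off reaching the target class against staying close to the original input. The difference `x_cf - x` plays the role of the explanation, and its magnitude corresponds to what is referred to above as explanation size.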