Evaluation of Interpretable Machine Learning
Deep neural networks are often considered black boxes due to their complex structure and their high number of non-linear computations. Even though they may outperform humans on a specific task, their lack of transparency causes distrust and prevents their use in a much broader range of applications. Interpretability methods aim to provide insight into a model and its decision-making. Various methods have appeared in recent years, but do they really work, or do they misinterpret the model's parameters? The central difficulty in evaluating interpretability is the lack of a ground truth. This work provides two evaluation settings in which we test different attribution methods; both define a reasonable ground truth through an appropriate choice of model and data. First, we propose evaluating the model on the simplest non-linear problem, XOR. This allows us to understand both model and data and to derive a ground truth from this knowledge. Second, we employ an invertible neural network, whose structure lets us inspect the model's internals more easily. The results confirm a promising way of evaluating attribution methods and establish a reliable evaluation framework based on a derived ground truth.
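To illustrate the idea of a derived ground truth on XOR, the following minimal sketch (not the paper's actual setup; the network weights and the choice of input-times-gradient attribution are assumptions for illustration) hand-constructs a tiny ReLU network that computes XOR exactly, so the relevant features are known by construction, and then computes a simple attribution via finite differences:

```python
def relu(z):
    return max(z, 0.0)

def xor_net(x1, x2):
    # Hand-constructed 2-2-1 ReLU network that computes XOR exactly
    # on binary inputs: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1),
    # y = h1 - 2*h2  ->  (0,0)->0, (1,0)->1, (0,1)->1, (1,1)->0.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1 - 2.0 * h2

def input_times_gradient(x1, x2, eps=1e-6):
    # Input-times-gradient attribution, with the gradient estimated
    # by central finite differences (illustrative helper, not from
    # the paper). Because the network is fully known, the expected
    # attribution can be checked analytically.
    g1 = (xor_net(x1 + eps, x2) - xor_net(x1 - eps, x2)) / (2 * eps)
    g2 = (xor_net(x1, x2 + eps) - xor_net(x1, x2 - eps)) / (2 * eps)
    return (x1 * g1, x2 * g2)
```

At the input (1, 1), both features play a perfectly symmetric role in XOR, so a faithful attribution should assign them equal relevance; here `input_times_gradient(1, 1)` yields equal values for both inputs, which can serve as a derived ground-truth check.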