Master's thesis: "Evaluating the explanation of black box decision for text classification"


Explainable AI (XAI) refers to artificial intelligence whose results can be understood by humans. It contrasts with the “black box” concept in machine learning, where even a model's designers cannot explain why it arrived at a specific decision. Recent breakthroughs in machine learning usually come with more complicated models that hide their internal logic and inner workings from users and even from experts, so they can be treated as black boxes. In response to growing concerns about the transparency of machine learning algorithms, many efforts have been made to help humans understand how AI models work [1, 2]. However, there is no clear consensus on how to evaluate the generated explanations quantitatively. Nguyen [3] analyzes the usefulness of generated explanations with human-based evaluations. Apart from its cost, this approach is questionable because users are assumed to have no knowledge of the decision-making process; this matters especially for explanations of poorly trained models, which may contradict human intuition. An alternative is automatic evaluation: Samek et al. [4] measure the quality of explanations by computing the drop in confidence after removing the important features identified by the explanation method. This measurement is easier to use than those involving humans, and it matches what one expects of so-called important features. Its limitation is that it does not take completeness and compactness [5] into account.
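The confidence-drop idea mentioned above can be illustrated with a small Python sketch. It is a simplified illustration, not the procedure of [4]: the toy classifier, its word weights, and the names `toy_confidence` and `confidence_drop` are all assumptions made for this example.

```python
import math
from typing import Callable, Iterable, List, Sequence

# Hypothetical toy "black box": a bag-of-words scorer with made-up weights.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5}


def toy_confidence(tokens: Sequence[str]) -> float:
    """Return the toy model's confidence for the positive class."""
    score = sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def confidence_drop(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    flagged: Iterable[int],
) -> float:
    """Confidence on the full text minus confidence after deleting
    the token positions that an explanation flagged as important."""
    flagged_set = set(flagged)
    kept: List[str] = [t for i, t in enumerate(tokens) if i not in flagged_set]
    return predict(tokens) - predict(kept)


tokens = ["a", "great", "and", "good", "movie"]
# An explanation that flags the two sentiment words.
drop_relevant = confidence_drop(toy_confidence, tokens, [1, 3])
# An explanation that flags only filler words.
drop_irrelevant = confidence_drop(toy_confidence, tokens, [0, 2])
print(drop_relevant, drop_irrelevant)
```

In this toy setting, the explanation that flags the sentiment words produces a larger drop than the one that flags filler words. The score says nothing about whether important words were missed or irrelevant ones included.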

The goal of this master's thesis is to propose a model-agnostic evaluation metric for measuring the quality of black box explanations, following “the three Cs of interpretability” [6]: correctness, completeness, and compactness. For correctness, the metric should verify whether the detected features actually lead to the final decision; for completeness, it should indicate how well the explanation covers the truly important features; for compactness, the explanation must be succinct, otherwise the metric penalizes it for including irrelevant words. Finally, several experiments should be conducted to compare how different metrics evaluate given explanations.
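To make the three criteria concrete, the following Python sketch shows one hypothetical way they could be turned into numbers. It is not the metric the thesis is expected to produce. It assumes that a reference set of truly important token positions is known (for example, for a synthetic or transparent model), which is generally not the case for a real black box. The names `ThreeCs`, `score_explanation`, and `toy_predict` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass
class ThreeCs:
    correctness: float   # confidence drop after removing the explained tokens
    completeness: float  # share of reference tokens the explanation covers
    compactness: float   # share of explained tokens that are in the reference


def score_explanation(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    explained: Set[int],
    reference: Set[int],
) -> ThreeCs:
    """Score an explanation (a set of token positions) against an
    assumed reference set of truly important positions."""
    kept = [t for i, t in enumerate(tokens) if i not in explained]
    correctness = predict(tokens) - predict(kept)
    hits = len(explained & reference)
    completeness = hits / len(reference) if reference else 1.0
    compactness = hits / len(explained) if explained else 0.0
    return ThreeCs(correctness, completeness, compactness)


def toy_predict(tokens: Sequence[str]) -> float:
    """Hypothetical classifier: confidence grows with positive words."""
    positives = sum(1 for t in tokens if t in {"great", "good"})
    return min(1.0, 0.5 + 0.2 * positives)


tokens = ["a", "great", "and", "good", "movie"]
reference = {1, 3}  # assumed ground truth: the two sentiment words

# Succinct but incomplete: flags only "great".
narrow = score_explanation(toy_predict, tokens, {1}, reference)
# Complete but padded with filler words.
broad = score_explanation(toy_predict, tokens, {0, 1, 2, 3}, reference)
print(narrow)
print(broad)
```

Under these assumptions, the narrow explanation scores high on compactness and low on completeness, while the broad one shows the opposite pattern. A single metric would have to trade the two off against each other.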

An ideal candidate should be:

  • a self-motivated and independent learner

  • knowledgeable about machine learning (indicated by good grades in related courses)

  • experienced with Python

The thesis will be co-supervised by Prof. Eirini Ntoutsi (eirini.ntoutsi@fu-berlin.de) and PhD candidate Yi Cai (yi.cai@fu-berlin.de) from the Institute of Computer Science.



    [1] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why should I trust you?' Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

    [2] Lundberg, Scott M., and Su-In Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017): 4765-4774.

    [3] Nguyen, Dong. "Comparing automatic and human evaluation of local explanations for text classification." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

    [4] Samek, Wojciech, et al. "Evaluating the visualization of what a deep neural network has learned." IEEE Transactions on Neural Networks and Learning Systems 28.11 (2016): 2660-2673.

    [5] Carvalho, Diogo V., Eduardo M. Pereira, and Jaime S. Cardoso. "Machine learning interpretability: A survey on methods and metrics." Electronics 8.8 (2019): 832.

    [6] Silva, Wilson, et al. "Towards complementary explanations using deep neural networks." Understanding and Interpreting Machine Learning in Medical Image Computing Applications. Springer, Cham, 2018. 133-140.