The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Towards Generating Counterfactual Examples as Automatic Short Answer Feedback

Authors: Anna Filighera, Joel Tschesche, Tim Steuer, Thomas Tregel, Lisa Wernet
Date: August 2022
Kind: In proceedings
Publisher: Springer International Publishing
Book title: Artificial Intelligence in Education
Editors: Maria Mercedes Rodrigo, Noburu Matsuda, Alexandra I. Cristea, Vania Dimitrova
Keywords: Explainable AI, Short Answer Grading, Feedback
Research Area(s): Knowledge Media
Abstract: Receiving response-specific, individual improvement suggestions is one of the most helpful forms of feedback for students, especially for short answer questions. However, it is also expensive to construct manually. For this reason, we investigate to what extent counterfactual explanation methods can be used to generate feedback from short answer grading models automatically. Given an incorrect student response, counterfactual models suggest small modifications that would have led to the response being graded as correct. Successful modifications can then be displayed to the learner as improvement suggestions formulated in their own words. As not every response can be corrected with only minor modifications, we investigate the percentage of correctable answers in the automatic short answer grading datasets SciEntsBank, Beetle and SAF. In total, we compare three counterfactual explanation models and a paraphrasing approach. On all datasets, roughly a quarter of incorrect responses can be modified to be classified as correct by an automatic grading model without straying too far from the initial response. However, an expert reevaluation of the modified responses shows that nearly all of them remain incorrect, merely fooling the grading model into classifying them as correct. While one of the counterfactual generation approaches improved student responses at least partially, the results highlight the general vulnerability of neural networks to adversarial examples. Thus, we recommend further research with more reliable grading models, for example, by including external knowledge sources or training adversarially.
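The counterfactual feedback loop described in the abstract can be illustrated with a toy sketch (this is a hypothetical illustration, not the paper's implementation; the word-overlap grader and the greedy single-word search are stand-ins for the neural grading model and the counterfactual explanation methods the paper actually compares):

```python
def toy_grader(response, reference):
    """Stub grading model (hypothetical): accepts a response once it
    shares at least 80% of its content words with the reference answer."""
    ref_words = set(reference.lower().split())
    resp_words = set(response.lower().split())
    overlap = len(ref_words & resp_words) / len(ref_words)
    return overlap >= 0.8  # "correct" above this overlap threshold

def counterfactual_feedback(response, reference, grader):
    """Greedily try single-word substitutions drawn from the reference
    answer and return the first minimally edited response that the
    grader flips to 'correct' -- the counterfactual shown as feedback."""
    words = response.split()
    for i in range(len(words)):
        for candidate in reference.split():
            edited = " ".join(words[:i] + [candidate] + words[i + 1:])
            if grader(edited, reference):
                return edited
    return None  # response is not correctable with one small edit

reference = "plants produce oxygen through photosynthesis"
student = "plants produce energy through respiration"
feedback = counterfactual_feedback(student, reference, toy_grader)
# feedback == "plants produce oxygen through respiration"
```

Note that the returned counterfactual still contains "respiration" and is therefore scientifically wrong even though the toy grader now accepts it, which mirrors the paper's central finding: most model-flipping edits are adversarial rather than genuine corrections.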
Full paper (pdf)
