The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Investigating Educational and Noneducational Answer Selection for Educational Question Generation

Author: Tim Steuer, Anna Filighera, Thomas Tregel
Date: June 2022
Kind: Article
Journal: IEEE Access
Keywords: Automatic Question Generation, Education, Content Selection, Natural Language Processing
Abstract: Educational automatic question generation (AQG) is often unable to realize its full potential in educational applications due to insufficient training data. For this reason, current research relies on noneducational question answering datasets for system training and evaluation. However, noneducational training data may comprise different language patterns than educational data. This raises the research question of whether models trained on noneducational datasets transfer well to the educational AQG task. In this work, we investigate the AQG subtask of answer selection, which aims to extract meaningful answers for the questions to be generated. We train and evaluate six modern and well-established BERT-based machine learning model architectures on two widely used noneducational datasets. Furthermore, we introduce a novel, midsized educational dataset for answer selection called TQA-A, which is used to investigate the transfer capabilities of the noneducational models to the educational domain. In terms of phrase-level evaluation metrics, noneducational models perform similarly to models trained directly on the novel educational TQA-A dataset, although the noneducational models are trained with considerably more training data. Moreover, models trained directly on TQA-A select fewer named entity-based and more verb-based answers than noneducational models. This provides evidence for differences between noneducational and educational answer selection tasks.

