Answer Extraction Methods for Educational Automated Question Generation

October 01, 2020


This thesis is situated in educational automatic question generation, where our research group aims to automatically pose questions about arbitrary textbooks. However, state-of-the-art question generators expect the answer to the question they should generate as part of their input.

For example:

Input paragraph: "... The difference is fluid friction, both within the fluid itself and between the fluid and its surroundings, which we call viscosity. ..."

Input answer: "viscosity"

Output question: "How do we call the difference in fluid friction?"

The main goal of the thesis is therefore to extract such answer candidates (e.g. "viscosity") from a given paragraph of text. This task has already been addressed with supervised and unsupervised models on a variety of datasets (e.g. SQuAD); these models can serve as a starting point and as strong baselines. Finally, the thesis may also investigate how well the developed approaches transfer to other educational datasets.
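
To make the input and output of the task concrete, here is a minimal sketch of a naive candidate extractor. It is not one of the supervised or unsupervised models mentioned above, only a toy heuristic that treats maximal runs of non-stopword tokens as candidates. The function name `extract_answer_candidates`, the tiny stopword list, and the `max_words` limit are assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption); a real system would use
# a proper linguistic resource or a learned model instead.
STOPWORDS = {
    "a", "an", "and", "between", "both", "call", "is", "its", "itself",
    "of", "the", "we", "which", "within",
}


def extract_answer_candidates(paragraph, max_words=3):
    """Return maximal runs of non-stopword tokens as answer candidates."""
    candidates = []
    seen = set()
    # Punctuation always ends a candidate, so split on it first.
    for segment in re.split(r"[.,;:!?()\"]", paragraph):
        run = []
        # The trailing empty string is a sentinel that flushes the last run.
        for token in re.findall(r"[A-Za-z][A-Za-z'-]*", segment) + [""]:
            if token and token.lower() not in STOPWORDS:
                run.append(token)
                continue
            if run and len(run) <= max_words:
                phrase = " ".join(run)
                if phrase.lower() not in seen:  # skip duplicates
                    seen.add(phrase.lower())
                    candidates.append(phrase)
            run = []
    return candidates


paragraph = (
    "The difference is fluid friction, both within the fluid itself "
    "and between the fluid and its surroundings, which we call viscosity."
)
candidates = extract_answer_candidates(paragraph)
print(candidates)
```

On the example paragraph, the heuristic returns "viscosity" together with several less useful candidates such as "difference". Deciding which candidates are actually worth asking about is the part the thesis is meant to address.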



Tasks

  1. Reviewing the related work on answer candidate selection in depth
  2. Iteratively developing a model that extracts answer candidates on SQuAD
  3. Automatically evaluating the model in comparison to state-of-the-art baselines
  4. Evaluating your approach on an educational dataset, automatically or empirically, to investigate how well it generalizes
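
The automatic evaluation in the steps above could, for instance, compare extracted candidates with gold answers. The sketch below computes precision, recall, and F1 under exact matching after lowercasing and whitespace normalization. The choice of metric, the function names, and the example lists are assumptions for illustration; the posting does not prescribe an evaluation protocol.

```python
def normalize(phrase):
    """Lowercase a phrase and collapse whitespace so trivial variants match."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of normalized phrases."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical model output and gold answers for a single paragraph.
predicted = ["difference", "fluid friction", "surroundings", "Viscosity"]
gold = ["viscosity", "fluid friction"]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a candidate such as "fluid" gets no credit against the gold answer "fluid friction". A token-overlap variant may be worth considering if partial matches should count.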



Requirements

  • Interest in Educational Technologies
  • Experience in NLP / Statistical Learning (e.g. Contextualized Word Embeddings, Classifier Evaluation)


Initial Literature

Willis, A., Davis, G., Ruan, S., Manoharan, L., Landay, J., & Brunskill, E. (2019, June). Key Phrase Extraction for Generating Educational Question-Answer Pairs. In Proceedings of the Sixth (2019) ACM Conference on Learning @ Scale (pp. 1-10).

Keywords: NLU, information extraction

Tutor: Steuer
