Automatic Text Difficulty Estimation Using Embeddings and Neural Networks
Key: FSR19-1
Author: Anna Filighera, Tim Steuer, Christoph Rensing
Date: September 2019
Kind: In proceedings
Publisher: Springer
Book title: 14th European Conference on Technology Enhanced Learning, EC-TEL 2019
Abstract: Text difficulty, also called reading difficulty, refers to the complexity of texts at the language level. For many educational applications, such as learning resource recommendation systems, the difficulty of a text is highly relevant information. However, manual annotation of text difficulty is very expensive and not feasible for large collections of texts. For this reason, many approaches to automatic text difficulty estimation have been proposed in the past. All text difficulty estimation models published thus far have one thing in common: they rely on manually engineered feature sets. This is problematic because such features are tailored to a specific type of text and do not generalize well to other types and languages. To alleviate this problem, we propose a novel approach to text difficulty classification that uses neural networks and embeddings. Our approach distinguishes between five reading levels, which correspond to non-overlapping age groups ranging from ages 7 to 16. It performs comparably to existing state-of-the-art approaches in terms of accuracy and Pearson correlation coefficient while being easier and cheaper to adapt to new types of text.
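The abstract reports both accuracy and the Pearson correlation coefficient. Because the five reading levels are ordered, the two metrics capture different things: accuracy only counts exact matches, while Pearson correlation also rewards wrong predictions that land near the correct level. The sketch below illustrates this difference; it is not the authors' evaluation code. The numeric coding of levels as 1 to 5 and the example predictions are hypothetical values invented for illustration.

```python
from math import sqrt


def accuracy(gold, pred):
    """Fraction of texts whose predicted reading level equals the gold level."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(gold, pred):
    """Pearson correlation between gold and predicted levels treated as numbers."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_g * var_p)


# Hypothetical gold levels (1 = youngest age group, 5 = oldest) for ten texts.
gold = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
# Two hypothetical classifiers that are each right on exactly half the texts.
near = [1, 2, 3, 4, 5, 2, 1, 4, 3, 4]  # every mistake is off by one level
far = [1, 2, 3, 4, 5, 5, 5, 1, 1, 1]   # mistakes are several levels off

if __name__ == "__main__":
    print(accuracy(gold, near), accuracy(gold, far))  # 0.5 0.5
    print(round(pearson(gold, near), 2))              # 0.87
    print(round(pearson(gold, far), 2))               # -0.08
```

Both hypothetical classifiers have identical accuracy, but only the one whose errors stay close to the gold level achieves a high correlation, which is why reporting both metrics is informative for ordinal difficulty levels.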

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.