Towards Using Wikipedia as a Substitute Corpus for Topic Detection and Metadata Generation in E-Learning
Key: MRS06-4
Author: Marek Meyer, Christoph Rensing, Ralf Steinmetz
Date: November 2006
Kind: In proceedings
Book title: Proceedings of the 3rd annual e-learning conference on Intelligent Interactive Learning Object Repositories
Abstract: Metadata is crucial for the reuse of Learning Resources. Only with good metadata is there a chance that a Learning Resource can be found in a repository. However, many Learning Resources are still delivered with little or no attached metadata. Automatic metadata generation can compensate for this, either as assistance for the author or as part of a repository's retrieval functionality. Among the various metadata fields, those that describe the topic of a Learning Resource are the most important, especially keywords and categorization information. This paper presents a novel approach to domain-independent classification and keyword extraction that utilizes the immense knowledge gathered in the free Wikipedia encyclopedia. Wikipedia is proposed as a substitute corpus for classification methods in E-Learning. To support this proposal, the co-occurrence of matching topics and statistical similarity between Learning Resources and Wikipedia articles is analyzed. An algorithm for keyword generation based on the Wikipedia encyclopedia has been implemented and is described in detail in this paper. First results of the algorithm are presented and discussed.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.