Using Community-Generated Contents as a Substitute Corpus for Metadata Generation
Key: MRS08-1
Author: Marek Meyer, Christoph Rensing, Ralf Steinmetz
Date: January 2008
Kind: @article
Keywords: e-learning, classification, metadata generation, Wikipedia, substitute corpus, online learning, learning resources, reuse
Abstract: Metadata is crucial for reuse of Learning Resources. Availability of good metadata significantly increases the chance that a Learning Resource can be successfully found in a repository. However, many Learning Resources are still delivered with no or little attached metadata. Automatic metadata generation is used to put things right - either as assistance for the author, or as part of a repository's retrieval functionality. Among the various metadata fields, those that cover the topic of a Learning Resource are the most important ones - especially keywords and categorization information. This article proposes the use of community generated substitute corpora for classification systems. As an example for such a substitute corpus the free online encyclopedia Wikipedia is used as a training corpus for domain-independent classification and keyword extraction of Learning Resources. An algorithm for keyword generation based on the Wikipedia encyclopedia has been implemented. Some results of the algorithm are presented and discussed.
Official URL

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.