Automatic Acquisition of Taxonomies in Different Languages from Multiple Wikipedia Versions

Author:Renato Domínguez García, Christoph Rensing, Ralf Steinmetz
Date:September 2011
Kind:In proceedings
Publisher:ACM International Conference Proceedings Series ACM Inc.
Organization:ACM International Conference Proceedings Series ACM Inc.
Book title:Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies
Editor:Stefanie Lindstaedt, Michael Granitzer
Abstract:In recent years, the vision of the Semantic Web has led to many approaches that aim to automatically derive knowledge bases from Wikipedia. These approaches rely mostly on the English Wikipedia, as it is the largest Wikipedia version, and have led to valuable knowledge bases. However, each Wikipedia version contains socio-cultural knowledge, i.e., knowledge with specific relevance for a culture or language. One difficulty in applying existing approaches to multiple Wikipedia versions is their use of additional corpora. In this paper, we describe the adaptation of existing heuristics that make it possible to extract large sets of hyponymy relations from multiple Wikipedia versions with little information about each language. Further, we evaluate our approach with Wikipedia versions in four different languages and compare the results with GermaNet for German and WordNet for English.
