Automatic Taxonomy Extraction in Different Languages using Wikipedia and minimal language-specific Information
Key: DSRS12-1
Author: Renato Domínguez García, Sebastian Schmidt, Christoph Rensing, Ralf Steinmetz
Date: March 2012
Kind: In proceedings
Publisher: Springer
Book title: Computational Linguistics and Intelligent Text Processing
Keywords: Hyponymy Detection, Multilingual large-scale taxonomies, Wikipedia Mining, NLP
Abstract: Knowledge bases extracted from Wikipedia are particularly useful for various NLP and Semantic Web applications due to their coverage, actuality and multilingualism. This has led to many approaches for automatic knowledge base extraction from Wikipedia. Most of these approaches rely on the English Wikipedia as it is the largest Wikipedia version. However, each Wikipedia version contains socio-cultural knowledge, i.e. knowledge with relevance for a specific culture or language. In this work, we describe a method for extracting a large set of hyponymy relations from the Wikipedia category system that can be used to acquire taxonomies in multiple languages. More specifically, we describe a set of 20 features that can be used for Hyponymy Detection without using additional language-specific corpora. Finally, we evaluate our approach on Wikipedia in five different languages and compare the results with the WordNet taxonomy and a multilingual approach based on interwiki links of the Wikipedia.
