The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
|Author:||Wael Alkhatib, Steffen Schnitzer, Wei Ding, Peter Jiang, Yassin Alkhalili, Christoph Rensing|
|Kind:||In proceedings|
|Book title:||(Accepted for publication) in the Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing|
|Keywords:||semantics; statistics; feature selection; dimensionality reduction; text classification; typed dependencies.|
|Research Area(s):||Knowledge Media|
|Abstract:||The under-explored research area of multi-label text classification has led to a substantial amount of research in adapting feature selection techniques to handle multi-label data directly. A wide range of statistical techniques have been proposed for weighting and selecting features in order to reduce the high dimensionality of the feature space. Those techniques suffer from losing the semantic regularities of concepts as features and from ignoring the dependencies and ordering between adjacent words. In this work, we undertake a comparative study across a set of statistical and semantic-based techniques for feature selection. Moreover, we propose a novel approach incorporating text semantics in feature selection using typed dependencies. Our intensive experiments, using the EUR-lex dataset, showed that incorporating text semantics in feature selection can significantly improve the performance of multi-label classifiers. Moreover, it drastically decreases computation costs by reducing the feature space. The experiments confirmed that our method, applied to a combination of typed dependencies, outperformed the state-of-the-art techniques for feature selection in terms of F1-measure.|
If the paper is not available from this page, you might contact the author(s) directly via the "People" section on our KOM Homepage.