The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Comparison of Feature Selection Techniques for Multi-label Text Classification against a New Semantic-based Method

Author: Wael Alkhatib, Steffen Schnitzer, Wei Ding, Peter Jiang, Yassin Alkhalili, Christoph Rensing
Date: April 2018
Kind: In proceedings - use for conference & workshop papers
Book title: (Accepted for publication) in the proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing
Keywords: semantics; statistics; feature selection; dimensionality reduction; text classification; typed dependencies.
Research Area(s): Knowledge Media
Abstract: The under-explored research area of multi-label text classification has led to a substantial amount of research in adapting feature selection techniques to handle multi-label data directly. A wide range of statistical techniques has been proposed for weighting and selecting features in order to reduce the high dimensionality of the feature space. These techniques suffer from losing the semantic regularities of concepts as features and from ignoring the dependencies and ordering between adjacent words. In this work, we undertake a comparative study across a set of statistical and semantic-based techniques for feature selection. Moreover, we propose a novel approach that incorporates text semantics into feature selection using typed dependencies. Our intensive experiments, using the EUR-Lex dataset, showed that incorporating text semantics into feature selection can significantly improve the performance of multi-label classifiers. Moreover, it drastically decreases the computation costs by reducing the feature space. The experiments confirmed that our method, applied to a combination of typed dependencies, outperformed the state-of-the-art techniques for feature selection in terms of F1-measure.

If the paper is not available from this page, you might contact the author(s) directly via the "People" section on our KOM Homepage.
