Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders
Key: ARJ17-1
Author: Wael Alkhatib, Christoph Rensing, Johannes Silberbauer
Date: June 2017
Kind: In proceedings
Publisher: Springer, Cham
Book title: International Conference on Language, Data and Knowledge
Keywords: semantics; feature selection; dimensionality reduction; text classification; semantic relations; autoencoders.
Abstract: Feature selection is of vital concern in text classification to reduce the high dimensionality of feature space. The wide range of statistical techniques which have been proposed for weighting and selecting features suffer from loss of semantic relationship among concepts and ignoring of dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics in feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments, using the EURlex dataset, showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) for feature weighting. In addition, after an aggressive dimensionality reduction of original features with a factor of 10, the autoencoders are still capable of producing better features compared to BOW with TF-IDF.
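For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of bag-of-words feature selection with TF-IDF weighting. It is not the authors' implementation and does not cover their semantic techniques or the autoencoder step. The function names (`tfidf`, `select_top_k`), the toy documents, and the choice of ranking terms by summed TF-IDF weight are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows): one TF-IDF weight vector per tokenised document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        rows.append([tf[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def select_top_k(vocab, rows, k):
    """Keep the k terms with the highest summed TF-IDF weight (assumed criterion)."""
    scores = [sum(row[j] for row in rows) for j in range(len(vocab))]
    keep = sorted(range(len(vocab)), key=lambda j: -scores[j])[:k]
    keep.sort()  # preserve the original vocabulary order
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in rows]


# Hypothetical toy corpus, already tokenised.
docs = [
    ["court", "ruling", "law"],
    ["law", "tax", "tax"],
    ["court", "law", "trade"],
]
vocab, rows = tfidf(docs)
top_vocab, top_rows = select_top_k(vocab, rows, k=1)
```

In this sketch a term that occurs in every document, such as "law" in the toy corpus, receives zero weight, which illustrates why purely frequency-based selection keeps rare terms regardless of what they mean.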
