December 12, 2018 –
Multi-label classification is the task of assigning a set of labels from a fixed vocabulary to a data sample, e.g. an image, an audio clip, or a text. Multi-label text classification has been applied to a multitude of tasks, including document indexing, tag suggestion, and sentiment classification. However, many of the applied methods disregard word order, opting to use bag-of-words models or TF-IDF weighting to create document vectors. With the advent of powerful semantic embeddings, such as word2vec and GloVe, we want to investigate how word embeddings and word order can be used to improve multi-label classification. Word embeddings are one of the strongest trends in Natural Language Processing (NLP). They are learned, semantically meaningful representations of words, derived from local co-occurrences in sentences. The relative similarity between two words' vector representations, together with word order, can capture meaningful syntactic and semantic regularities.
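To make the two central notions above concrete, the following minimal sketch shows a multi-hot label vector over a fixed vocabulary and demonstrates that a bag-of-words representation cannot distinguish two sentences that differ only in word order. The label vocabulary, the example sentences, and the helper names (`multi_hot`, `bag_of_words`) are assumptions made for illustration and are not part of the existing system.

```python
from collections import Counter

# Hypothetical label vocabulary, chosen only for this illustration.
LABELS = ["sports", "politics", "economy"]


def multi_hot(labels, vocabulary=LABELS):
    """Encode a set of labels as a 0/1 vector over a fixed vocabulary."""
    return [1 if label in labels else 0 for label in vocabulary]


def bag_of_words(text):
    """Count lowercased whitespace tokens; token order is discarded."""
    return Counter(text.lower().split())


# Two toy sentences with the same words in a different order.
sentence_a = "the team beat the government"
sentence_b = "the government beat the team"

# The bags are identical although the sentences mean different things.
same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)

# The token sequences differ, so an order-aware model can tell them apart.
same_sequence = sentence_a.split() == sentence_b.split()

# A document may carry several labels at once.
target = multi_hot({"sports", "politics"})
```

A sequence model that reads the tokens in order receives different inputs for the two sentences, which is the motivation for combining word embeddings with word order.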
In this work, we aim to develop or adapt machine learning methods (mainly deep learning) to improve multi-label text classification. By considering word order and the words' vector representations, a new feature space will be constructed. The task will be to extend the current system for multi-label text classification using gated recurrent units (GRUs), one of the most widely used deep learning architectures for sequential data. This includes:
The written report must contain an introduction to the topic and provide an overview of related work. Furthermore, the designed and implemented methods must be described and discussed.
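As a rough illustration of the kind of model the task refers to, the sketch below implements a single-layer GRU in plain Python that reads a sequence of word vectors and emits one independent sigmoid probability per label. It is not the existing system: the class name `TinyGRU`, the toy embeddings, the dimensions, and the random, untrained weights are all assumptions, biases are omitted for brevity, and a real implementation would rely on a deep learning framework and pretrained embeddings such as word2vec or GloVe.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical, untrained GRU with a multi-label output layer."""

    def __init__(self, input_dim, hidden_dim, num_labels, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state.
        self.Wz = rand_matrix(hidden_dim, input_dim, rng)
        self.Uz = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr = rand_matrix(hidden_dim, input_dim, rng)
        self.Ur = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh = rand_matrix(hidden_dim, input_dim, rng)
        self.Uh = rand_matrix(hidden_dim, hidden_dim, rng)
        # Output layer: one score per label.
        self.Wo = rand_matrix(num_labels, hidden_dim, rng)

    def step(self, x, h):
        """One GRU update: combine the input x with the previous state h."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        # Interpolate between old state and candidate (one common convention).
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict(self, sequence):
        """Read the word vectors in order; return one probability per label."""
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        # Independent sigmoids (not a softmax) allow several labels at once.
        return [sigmoid(score) for score in matvec(self.Wo, h)]


# Toy 3-dimensional "embeddings", invented for this example.
EMBEDDINGS = {
    "team": [1.0, 0.0, 0.0],
    "beat": [0.0, 1.0, 0.0],
    "government": [0.0, 0.0, 1.0],
}

model = TinyGRU(input_dim=3, hidden_dim=4, num_labels=2)
seq_a = [EMBEDDINGS[w] for w in ["team", "beat", "government"]]
seq_b = list(reversed(seq_a))
probs_a = model.predict(seq_a)
probs_b = model.predict(seq_b)
```

Because the hidden state is updated token by token, the same words in reverse order generally lead to different label probabilities, unlike a bag-of-words vector. The per-label sigmoid output is what distinguishes the multi-label setting from single-label classification with a softmax.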
• Good programming skills in a high-level language, mainly Python
• Helpful: Previous experience in Natural Language Processing and Machine Learning
Beginning and duration
Immediately, duration 3-6 months (depending on the course)
Keywords: NLP, GRU, Machine Learning, Clustering, Deep Learning, Word Embeddings, crosslingual
Research Area(s): Knowledge & Educational Technologies
Student: Luna Alrawas