Text Classification based Filters for a Domain-Specific Search Engine
Key: SSR15-1
Author: Sebastian Schmidt, Steffen Schnitzer, Christoph Rensing
Date: May 2016
Kind: @article
Keywords: domain-specific search engine, classification, machine learning
Abstract: Domain-specific search engines exist in various fields, providing additional value by exploiting knowledge of their respective domains. One common mechanism is the use of filters, which allow users to narrow down the search results based on pre-defined filter categories. In this article, we use a text classification system to create these filters. The approach is tailored to work in large-scale settings with reduced amounts of manually annotated training data and hence enables a cost-efficient roll-out of new filters. An initial annotation study resulted in a corpus, which was used for an offline evaluation of the approach, giving insights into the effect of the system's parameters. Finally, a large online evaluation was carried out together with a provider of a domain-specific search engine. This article presents important aspects that need to be taken into consideration when implementing text classification-based filters in the industrial setting of a domain-specific search engine.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.