Extraction of Address Data from Unstructured Text using Free Knowledge Resources

Key:	SMRS13-1
Author:	Sebastian Schmidt, Simon Manschitz, Christoph Rensing, Ralf Steinmetz
Date:	September 2013
Kind:	In proceedings
Publisher:	ACM
Book title:	Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013
Keywords:	Information extraction, knowledge discovery, address extraction
Abstract:	The Web is populated with many Web sites containing unstructured textual information. These Web sites are a source of knowledge for various interests. As semantic annotations are only rarely used on Web sites, an automated harvesting of the knowledge without additional effort is not possible. Thus, elaborated approaches for information extraction are required. In our work we face the challenge of identifying business address data on Web sites since we see the need for this data in various applications. In order to accomplish our aim, we have developed a hybrid approach combining patterns and gazetteers obtained from freely available knowledge resources such as OpenStreetMap. Experimental evaluation on a corpus of heterogeneous Web sites shows a high recall and precision. The approach can be adapted for identification of addresses considering the different formats in various countries.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.