International Journal of Computer Science & Engineering Technology

ISSN : 2229-3345

Open Access

ABSTRACT

Title : AN EFFICIENT APPROACH FOR TEMPLATE EXTRACTION
Authors : Pravallika.CH, Swapna Goud.N, Vishnu Murthy.G
Keywords : Template extraction, Clustering web pages, MDL principle.
Issue Date : August 2012
Abstract :
The World Wide Web is a vast and rapidly growing source of information, used both to publish and to access content on the Internet. Web pages are commonly generated from templates that are filled with content, which gives readers easy and consistent access. For a search engine, however, detecting the template and presenting only the content to users is a major task in web page retrieval. Templates are considered harmful because they compromise the performance of clustering and classification of web pages. In this paper, we present a novel algorithm for extracting templates from web documents that are generated from heterogeneous template structures. In the proposed approach, we cluster the web documents based on the similarity of their template structures, so that the template for each cluster is extracted simultaneously. The resulting clusters are given as input to the Roadrunner system, which extracts information from template-based web pages.
Page(s) : 348-352
Source : Vol. 3, Issue.08
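
The clustering step described in the abstract, grouping web documents by similarity of template structure, could be sketched roughly as follows. This is an illustration only and not the authors' algorithm: the abstract does not give the procedure, and the MDL principle named in the keywords is not implemented here. As a stand-in, the sketch assumes that a page's structure can be summarized by its set of root-to-element tag paths, that Jaccard similarity compares two pages, and that a greedy single-pass grouping with a hypothetical threshold of 0.6 is adequate. All names (`PathCollector`, `tag_paths`, `cluster_by_structure`) are invented for this example.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths in an HTML page."""

    # Elements that never have a closing tag, so they are not pushed.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the page structure as a frozenset of tag paths."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_structure(pages, threshold=0.6):
    """Greedy single-pass clustering (an assumption, not the paper's method):
    a page joins the first cluster whose representative is similar enough,
    otherwise it starts a new cluster. Returns lists of page indices."""
    clusters = []  # each entry: (representative path set, member indices)
    for i, html in enumerate(pages):
        paths = tag_paths(html)
        for rep, members in clusters:
            if jaccard(rep, paths) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((paths, [i]))
    return [members for _, members in clusters]


pages = [
    "<html><body><div><h1>A</h1><p>x</p></div></body></html>",
    "<html><body><div><h1>B</h1><p>y</p></div></body></html>",
    "<html><body><table><tr><td>z</td></tr></table></body></html>",
]
clusters = cluster_by_structure(pages)
```

In this toy input the first two pages share every tag path and differ only in text, so they fall into one cluster, while the table-based page forms its own. The tag paths common to all members of a cluster would then be one crude approximation of that cluster's template.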

Copyright © 2010-2024 IJCSET KEJA Publications