International Journal of Computer Science & Engineering Technology

ISSN : 2229-3345

Open Access

ABSTRACT

Title : Concept-Based Document Similarity Based on Suffix Tree Document
Authors : P. Perumal, R. Nedunchezhian, M. Indra Priya
Keywords : Concept-based model, Suffix tree document model, Suffix tree, Similarity measure, Document clustering
Issue Date : October 2012
Abstract :
Document clustering has been studied as a post-retrieval document visualization technique that provides an intuitive navigation and browsing mechanism by organizing documents into groups, where each group represents a different topic. Clustering techniques are built on four components: the data representation model, the similarity measure, the clustering model, and the clustering algorithm. In previous work, the phrase has been considered an informative feature term for improving the effectiveness of document clustering. In this paper, we propose a concept-based document similarity that computes the similarity of documents based on the Suffix Tree Document (STD) model. By mapping each node in the suffix tree of the STD model to a unique feature term in the Vector Space Document (VSD) model, the concept-based document similarity inherits the ctf (conceptual term frequency), tf (term frequency), and df (document frequency) term weighting scheme when computing document similarity with concepts. The concept-based document similarity is then applied to the Hierarchical Agglomerative Clustering (HAC) algorithm to develop a new document clustering approach. The new concept-based model analyzes terms at the sentence, document, and corpus levels. The similarity between documents is calculated with a new concept-based similarity measure (the Euclidean distance measure). The proposed similarity measure takes full advantage of the concept analysis measures.
Page(s) : 470-475
Source : Vol. 3, Issue.10
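
The pipeline outlined in the abstract (phrases as feature terms, tf/df weighting, Euclidean distance, HAC) can be illustrated with a minimal sketch. It is not the authors' implementation. The assumptions are: word n-grams within a sentence stand in for suffix-tree nodes, the weighting formula is a common tf-idf variant chosen for illustration, the ctf (conceptual term frequency) component is omitted because the abstract does not define it, and average linkage is used for HAC. All function names (`phrase_features`, `tfidf_vectors`, `euclidean`, `hac`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def phrase_features(doc, max_len=3):
    """Count every word n-gram (1..max_len words) inside each sentence.

    Assumption: each distinct n-gram plays the role of one suffix-tree
    node, i.e. one feature term in the vector space model.
    """
    counts = Counter()
    for sentence in doc.lower().split("."):
        words = sentence.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                counts[" ".join(words[i:j])] += 1
    return counts


def tfidf_vectors(docs, max_len=3):
    """Weight each phrase by tf and df (illustrative formula, not the paper's)."""
    tfs = [phrase_features(d, max_len) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each document counts once per phrase
    n = len(docs)
    return [
        {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
        for tf in tfs
    ]


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def hac(vectors, k):
    """Average-link agglomerative clustering until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        total = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of clusters; a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Calling `hac(tfidf_vectors(docs), k)` on a list of document strings returns `k` lists of document indices. A real suffix tree would enumerate shared phrases of any length in linear time; the n-gram cutoff `max_len` is only a simplification for this sketch.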

Copyright © 2010-2024 IJCSET KEJA Publications