International Journal of Computer Science & Engineering Technology

ISSN : 2229-3345

Open Access

ABSTRACT

Title : A Novel Approach for Web Document Classification
Authors : Rajendra Kumar Roul
Keywords : Classification, Fuzzy Association, Fuzzy Related Term, Gensim, Information Retrieval.
Issue Date : August 2013
Abstract :
The web is a huge repository of information, and web documents need to be categorized to facilitate search and retrieval. Web document classification therefore plays an important role in information organization and retrieval. This paper presents a fuzzy-set-based approach for automatically classifying a web document into one of several classes, each represented by a set of training documents. Using the same word for more than one meaning, and many words for a single meaning, leads to ambiguity, especially in the web environment, where the number of users is very large. This problem is tackled using fuzzy association, in which each pair of words has a value associated with it. This value distinguishes the pair from other word pairs and thus helps in tackling ambiguities. The approach presented in this paper does not require any parameter to be supplied by the user and is therefore independent of any bias that user input might introduce. It requires a training set on which the model is trained; a test set is then given as input to be classified. We used the Gensim package to implement the approach because of its simplicity and robustness. The experimental results show that our approach efficiently classifies web documents by tackling ambiguities among words.
Page(s) : 1118-1125
ISSN : 2229-3345
Source : Vol. 4, Issue 8
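
Illustration : The abstract does not give the formula behind the fuzzy association values, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a Jaccard-style co-occurrence score for each word pair (documents containing both words divided by documents containing either) and classifies a test document by the average association of its word pairs within each class. The function names, the scoring rule, and the toy training data are all hypothetical, and plain Python is used here instead of Gensim to keep the sketch self-contained. In keeping with the abstract, no user-supplied parameter is needed.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def fuzzy_association(docs):
    """Map each sorted word pair (a, b) to a value in (0, 1].

    ASSUMPTION: the value is n_ab / (n_a + n_b - n_ab), where n_a and n_b
    count the documents containing each word and n_ab counts the documents
    containing both. The paper may define the association differently.
    """
    doc_freq = defaultdict(int)
    pair_freq = defaultdict(int)
    for doc in docs:
        terms = set(tokenize(doc))
        for term in terms:
            doc_freq[term] += 1
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1
    return {
        (a, b): n / (doc_freq[a] + doc_freq[b] - n)
        for (a, b), n in pair_freq.items()
    }


def class_score(text, assoc):
    """Average association of the test document's word pairs in one class."""
    pairs = list(combinations(sorted(set(tokenize(text))), 2))
    if not pairs:
        return 0.0
    # Pairs never seen together in the class contribute 0.
    return sum(assoc.get(pair, 0.0) for pair in pairs) / len(pairs)


def classify(text, models):
    """Return the class whose association table best matches the text."""
    return max(models, key=lambda label: class_score(text, models[label]))


# Hypothetical toy training set: class label -> training documents.
training = {
    "sports": [
        "the team won the football match",
        "football players train for the match",
    ],
    "finance": [
        "the bank raised interest rates",
        "interest on bank loans increased",
    ],
}

# "Training" here is just building one association table per class.
models = {label: fuzzy_association(docs) for label, docs in training.items()}

print(classify("football match tonight", models))
print(classify("bank interest rates", models))
```

In this sketch a word pair that always co-occurs within a class (such as "football" and "match" in the toy sports documents) receives the maximum value of 1.0, so a word that is ambiguous on its own is judged by the pairs it forms with its neighbours.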

Copyright © 2010-2024 IJCSET KEJA Publications