Abstract:
Preprocessing is a critical step in text mining, Natural Language Processing (NLP), and Information Retrieval (IR). In text mining, data preprocessing is used to extract interesting, non-trivial knowledge from unstructured text data. IR is essentially a matter of deciding which documents in a collection should be retrieved to satisfy a user's need for information. That need is represented by a query or profile, which contains one or more search terms plus additional information such as term weights. The retrieval decision is therefore made by comparing the terms of the query with the index terms (important words or phrases) appearing in the document itself. The decision may be binary (retrieve/reject), or it may involve estimating the degree of relevance of the document to the query. Unfortunately, the words that appear in documents and in queries often have many morphological variants, so a query term can fail to match a document that contains a different form of the same word. For this reason, preprocessing techniques are applied to the target data set before retrieval; they reduce the size of the data set and thereby improve the effectiveness of the IR system. Stemming is one of the most important preprocessing techniques: it reduces words to their root form by stripping prefixes and suffixes. In this paper, we discuss the various Tamil stemming algorithms and the issues associated with each algorithm.
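
To make the matching problem concrete, the following minimal sketch compares query terms with a document's index terms, first on raw words and then after stemming. It is an illustration under stated assumptions, not one of the Tamil stemmers discussed in this paper: the suffix list is a hypothetical English one, and the names `toy_stem`, `index_terms`, `relevance`, and `retrieve` are invented for this example.

```python
# Hypothetical suffix list for illustration only (English, not Tamil).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text, stem=False):
    """Return the set of lowercase whitespace-separated terms, optionally stemmed."""
    words = text.lower().split()
    return {toy_stem(w) if stem else w for w in words}


def relevance(query, document, stem=False):
    """Degree of relevance: fraction of query terms found among the document's terms."""
    q = index_terms(query, stem)
    d = index_terms(document, stem)
    return len(q & d) / len(q) if q else 0.0


def retrieve(query, document, threshold=0.5, stem=False):
    """Binary decision (retrieve/reject) obtained by thresholding the relevance score."""
    return relevance(query, document, stem) >= threshold


query = "indexing documents"
document = "index the document"
print(relevance(query, document))             # raw terms share no exact match
print(relevance(query, document, stem=True))  # stemmed terms coincide
```

In this toy setting the unstemmed query matches none of the document's terms even though both texts use the same two words, while stemming maps the variants to a common form. A real stemmer for a morphologically rich language such as Tamil needs far more elaborate rules than a fixed suffix list.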