An Effective Preprocessing Algorithm for Information Retrieval System
Sunita1, Vijay Rana2

1Sunita, Dept. of Computer Science, Arni University, India; Sant Baba Bhag Singh University, India.
2Vijay Rana, Dept. of Computer Science, Arni University, India; Sant Baba Bhag Singh University, India.

Manuscript received on 19 August 2019. | Revised Manuscript received on 24 August 2019. | Manuscript published on 30 September 2019. | PP: 6371-6375 | Volume-8 Issue-3 September 2019 | Retrieval Number: C5033098319/2019©BEIESP | DOI: 10.35940/ijrte.C5033.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: The growth of the web has produced a huge volume of information, driven in part by Internet users who post their opinions, comments, and reviews online. Preprocessing helps an Information Retrieval (IR) system understand a user query. An IR system provides the means to represent, search for, and access information related to a user's search string. This information is expressed in natural language rather than in a structured format, and the words used are often ambiguous. One of the major challenges in current web search is the vocabulary mismatch problem that arises during preprocessing. A further drawback of web search in IR systems is that the relationships between the query expressions and the expanded terms are limited. The query expressions are the search terms used to fetch information from the IR system; the expanded terms are obtained by adding the terms most similar to the words of the search string. In this manuscript, we focus mainly on what lies behind the user's search string on the web. We identify the best features in this context for term selection in a supervised-learning-based model. The proposed system focuses on preprocessing techniques such as tokenization, stemming, spell checking, finding dissimilar words, and discovering keywords in the user query, because these provide better results for the user.
Keywords: Stop-Words, Tokenization, Stemming, Spell Check, Dissimilar, IR.
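
The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the stop-word list, the vocabulary, the suffix list, and all function names are hypothetical placeholders, the spell check is approximated with the standard library's difflib, and the stemmer is a naive suffix stripper rather than a full stemming algorithm.

```python
import re
from difflib import get_close_matches

# Hypothetical, deliberately tiny resources used only for illustration.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and", "is", "are"}
VOCABULARY = ["information", "retrieval", "system", "search", "query",
              "algorithm", "preprocessing"]
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(query):
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def spell_check(token):
    """Replace a token with its closest vocabulary word, if one is close enough."""
    if token in VOCABULARY:
        return token
    matches = get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(query):
    """Tokenize, filter stop words, correct spelling, stem, and deduplicate."""
    tokens = remove_stop_words(tokenize(query))
    corrected = [spell_check(t) for t in tokens]
    stems = [stem(t) for t in corrected]
    # Keep only distinct terms, in first-seen order, as the query keywords.
    return list(dict.fromkeys(stems))


if __name__ == "__main__":
    print(preprocess("The informaton retrieval systms"))
```

In this sketch the "dissimilar words" step is interpreted simply as keeping distinct terms; a real system would likely use a larger dictionary, an established stemmer, and a similarity measure chosen for the target collection.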

Scope of the Article:
Web Algorithms