A Study on Dimensionality Reduction Methods for Finding Similarity in Indian English Authors Poetry
K. Praveen Kumar1, T. Maruthi Padmaja2

1K. Praveen Kumar, Department of IT, VFSTR Deemed to be University, Guntur, (A.P.), India.
2T. Maruthi Padmaja, Department of CSE, VFSTR Deemed to be University, Guntur, (A.P.), India.

Manuscript received on 23 March 2019 | Revised Manuscript received on 30 March 2019 | Manuscript published on 30 March 2019 | PP: 1114-1118 | Volume-7 Issue-6, March 2019 | Retrieval Number: F2527037619/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Because its applications range from literature to product development, identifying document similarity is one of the pivotal tasks in information retrieval systems. So far, most research in this area has focused on identifying similarity across ordinary prose documents. A poem differs from general prose text, however, because it carries stylistic (orthographic, phonetic, and syntactic) features, and its data is also high dimensional. This paper analyzes stylistic features of Indian English authors' poetry using linear semantic, nonlinear semantic, and stylistic text analysis methods. The computational methods used for semantic analysis are LSA, MDS, and ISOMAP. Similarity structures across the poems are identified with the Partitioning Around Medoids (PAM) algorithm. The visualization of the results shows that the poems' feature space is linear and that a similarity structure exists. Stylistic features were found to work better than the linear and nonlinear semantic methods.
Keywords: Latent Semantic Indexing, TF, IDF, TF-IDF, Similarity, SVD, stylistic features, ISOMAP, MDS.
Scope of the Article: Probabilistic Models and Methods
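
As a rough illustration of the kind of pipeline the abstract and keywords name (TF-IDF weighting, a pairwise similarity measure, and PAM clustering), the following Python sketch may help. It is not the authors' implementation. The sample lines are invented toy text rather than poems from the study; whitespace tokenization, the smoothed IDF formula, and cosine distance are assumptions made for the example; and the LSA, MDS, and ISOMAP projection steps are left out to keep the block free of third-party libraries.

```python
import math
from collections import Counter

# Hypothetical toy lines, NOT poems from the study.
POEM_LINES = [
    "the river sings to the moon at night",
    "moon and river meet at night",
    "the city market sells spice and silk",
    "spice and silk fill the city market",
]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Assumed weighting: relative term frequency times a smoothed IDF.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(vectors):
    """Pairwise cosine distances, with an exact zero on the diagonal."""
    n = len(vectors)
    return [[0.0 if i == j else cosine_distance(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]

def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Partitioning Around Medoids: greedy BUILD phase, then SWAP until no gain."""
    n = len(dist)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates,
                           key=lambda c: total_cost(dist, medoids + [c])))
    best = total_cost(dist, medoids)
    # SWAP: try replacing each medoid with each non-medoid point.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]])
              for i in range(n)]
    return medoids, labels

if __name__ == "__main__":
    dist = distance_matrix(tfidf_vectors(POEM_LINES))
    medoids, labels = pam(dist, k=2)
    print("medoid indices:", medoids)
    print("cluster labels:", labels)
```

On this toy input the two river lines and the two market lines end up in separate clusters. A fuller version would presumably insert an LSA, MDS, or ISOMAP projection between the TF-IDF step and the clustering step, or replace the TF-IDF vectors with stylistic feature vectors.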