Automated Kannada Text Summarization using Sentence Features
Arpitha Swamy¹, Srinath S²
¹Arpitha Swamy, Department of Computer Science and Engineering, Sri Jayachamarajendra College of Engineering, JSS Science & Technology University, Mysuru, India.
²Srinath S, Department of Computer Science and Engineering, Sri Jayachamarajendra College of Engineering, JSS Science & Technology University, Mysuru, India.
Manuscript received on 05 March 2019 | Revised Manuscript received on 13 March 2019 | Manuscript published on 30 July 2019 | PP: 470-474 | Volume-8 Issue-2, July 2019 | Retrieval Number: B1531078219/19©BEIESP | DOI: 10.35940/ijrte.B1531.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The exponential growth of information accessible on the World Wide Web has made that information difficult to manage, creating a growing need for text summarization. Text summarization condenses the content of an original text into a shorter form that conveys the important information to the user. The summarizer presented in this paper produces extractive summaries of Kannada text documents. The proposed system considers five features to determine the important sentences in a document: Term Frequency, Term Frequency-Inverse Sentence Frequency, Keywords, Sentence Length, and Sentence Position. The value of each feature is computed, and the score of each sentence is the average of its feature values. The sentences with the top scores are selected for inclusion in the extractive summary. The proposed model is evaluated with the ROUGE toolkit, which measures performance by the F-score of the generated summary. Experimental studies on a custom-built dataset of 50 Kannada text documents show significantly better performance in producing extractive summaries when compared against human summaries.
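The scoring scheme described in the abstract (five feature values averaged per sentence, top-scoring sentences kept) can be illustrated with a minimal Python sketch. The abstract does not give the exact feature formulas, so everything below is an assumption made for illustration: the function name `summarize`, the normalization of each feature to the range [0, 1], the logarithmic form of inverse sentence frequency, the position feature that favors earlier sentences, and the length feature that favors longer ones. Sentences are assumed to be already tokenized and non-empty, and the keyword set is assumed to be supplied by the caller.

```python
import math
from collections import Counter


def summarize(sentences, keywords, top_k=2):
    """Score tokenized sentences with five features; return (chosen indices, scores).

    Illustrative sketch only: the feature formulas are assumptions,
    not the formulas used in the paper.
    """
    if not sentences:
        return [], []
    n = len(sentences)
    # Term frequency counted over the whole document.
    tf = Counter(tok for sent in sentences for tok in sent)
    # Sentence frequency: number of sentences that contain each term.
    sf = Counter(tok for sent in sentences for tok in set(sent))
    max_len = max(len(s) for s in sentences)

    def normalize(values):
        # Scale a feature so that its largest value becomes 1.0.
        top = max(values)
        return [v / top if top > 0 else 0.0 for v in values]

    # Feature 1: average term frequency of the tokens in the sentence.
    tf_feat = normalize([sum(tf[t] for t in s) / len(s) for s in sentences])
    # Feature 2: average TF-ISF weight, with ISF assumed to be log(n / sf).
    tfisf_feat = normalize(
        [sum(tf[t] * math.log(n / sf[t]) for t in s) / len(s) for s in sentences]
    )
    # Feature 3: number of keyword tokens in the sentence.
    kw_feat = normalize([sum(1 for t in s if t in keywords) for s in sentences])
    # Feature 4: sentence length relative to the longest sentence.
    len_feat = [len(s) / max_len for s in sentences]
    # Feature 5: position, with earlier sentences assumed more important.
    pos_feat = [(n - i) / n for i in range(n)]

    # Sentence score is the plain average of the five feature values.
    scores = [
        (a + b + c + d + e) / 5
        for a, b, c, d, e in zip(tf_feat, tfisf_feat, kw_feat, len_feat, pos_feat)
    ]
    # Keep the top_k sentences, then restore document order for readability.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return sorted(ranked), scores


if __name__ == "__main__":
    doc = [["a", "b", "c"], ["a"], ["d", "e"]]  # placeholder tokens
    chosen, scores = summarize(doc, keywords={"a"}, top_k=2)
    print(chosen, [round(s, 3) for s in scores])
```

Kannada tokens would be passed in the same way as the placeholder tokens above, since the sketch only compares tokens for equality. Tokenization, stop-word handling, and keyword selection are left to the caller.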
Index Terms: Inverse Sentence Frequency, Natural Language Processing, ROUGE, Term Frequency, Text Summarization.
Scope of the Article: Natural Language Processing