Text Spotting in Video: Recent Progress and Future Trends
Manasa Devi Mortha1, Seetha Maddala2, Viswanadha Raju Somalaraju3

1Manasa Devi Mortha, Department of Computer Science & Engineering, VNR Vignana Jyothi Institute of Engineering & Technology, Vignana Jyothi Nagar, Hyderabad (Telangana), India.
2Dr. Seetha Maddala, Department of Computer Science & Engineering, G. Narayanamma Institute of Technology & Science, Shaikpet, Hyderabad (Telangana), India.
3Dr. Viswanadha Raju Somalaraju, Department of Computer Science & Engineering, Jawaharlal Nehru Technological University, Nachupally, Jagtial (Telangana), India.
Manuscript received on 23 April 2019 | Revised Manuscript received on 05 May 2019 | Manuscript Published on 17 May 2019 | PP: 116-124 | Volume-7 Issue-6S4 April 2019 | Retrieval Number: F10230476S419/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The wide popularity of videos, images, and documents available on the internet has led to demand for automatic annotation, indexing, video construction, and many other applications. For these applications, text is a major source of information and requires detection, localization, tracking, and recognition. Nevertheless, text variation owing to font size, font style, orientation, occlusion, and poor resolution makes automatic text extraction challenging. Thus, video pre-processing plays a vital role before detecting and recognizing the text. This paper surveys detection and recognition methods, feature descriptors, datasets, and performance evaluation procedures from diverse publications. Traditional methods such as connected components, region-based and texture-based approaches, neural networks, and OCR are reviewed. Among these, Scale Invariant Feature Transform (SIFT), Maximally Stable Extremal Regions (MSER), and Convolutional Neural Networks (CNN) are effective and efficient feature descriptors for spotting text. This paper also presents a comparative study of ubiquitous feature descriptors, along with the dependent parameters that degrade the performance of video text recognition. Conversely, hybrid methods and CNN techniques have achieved impressive results in text spotting on scene-image datasets such as ICDAR, ImageNet, and CIFAR-10, while ICDAR 2013/15 is specifically prepared to challenge the detection of text in videos. Finally, related performance metrics and future trends for video text spotting are comprehensively analysed.
Keywords: Recognition, Text Detection, Caption Text, Tracking, Natural Scene Text, Convolutional NN, Video Pre-Processing, Feature Descriptors.
Scope of the Article: Text Mining