Real-Time Detection of Single-Text Character Using the Integration of Extremal Region Filtering and Connected Component Filtering
Karen Niña P. Escosura1, Ma. Margarita V. Ortega2, Marlen Joyce H. Tuazon3, Rovilyn L. Carta4, Roselito E. Tolentino5
1Karen Niña P. Escosura is currently studying at Polytechnic University of the Philippines Sta. Rosa Campus.
2Ma. Margarita V. Ortega is currently studying at Polytechnic University of the Philippines Sta. Rosa Campus.
3Marlen Joyce H. Tuazon is a graduate of Polytechnic University of the Philippines Sta. Rosa Campus.
4Rovilyn L. Carta is currently studying at Polytechnic University of the Philippines Sta. Rosa Campus.
5Roselito E. Tolentino is with Polytechnic University of the Philippines – Santa Rosa Campus and De La Salle University – Dasmarinas as a part-time instructor.
Manuscript received on 20 March 2016 | Revised Manuscript received on 30 March 2016 | Manuscript published on 30 March 2016 | PP: 12-16 | Volume-5 Issue-1, March 2016 | Retrieval Number: A1539035116©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Among the contents of an image, text often carries the most significant information, whether it appears in a document or in a real-world scene, which makes text detection and recognition an important task in image analysis. Because text can differ widely in its properties, detecting it remains challenging. The previous study by Neumann and Matas limits text detection and recognition to groups of at least three characters. To extend detection and recognition to single text characters, the proposed solution integrates two methods, Extremal Region Filtering and Connected Component Filtering, with the Tesseract OCR engine. Single-text character candidates are filtered by Connected Component Filtering, which considers the regions and region features extracted by the two stages of Extremal Region Filtering. Through structural analysis, connected components with equal value and similar stroke width and stroke orientation are considered single-text character candidates. The character candidates are then recognized by the Tesseract OCR engine.
Keywords: Connected Component, Extremal Region, Single-text, Tesseract
Scope of the Article: Component Based Software Engineering
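The connected-component filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the thresholds (`tolerance`, `min_share`, `min_pixels`), and the stroke-width estimate (the shorter of the horizontal and vertical pixel runs through each pixel) are all assumptions made for illustration. Stroke orientation, Extremal Region Filtering, and the Tesseract recognition stage are left out.

```python
from collections import deque
from statistics import median


def connected_components(grid):
    """Group 4-connected foreground pixels (value 1) into components."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def run_length(pixel_set, y, x, dy, dx):
    """Count contiguous component pixels through (y, x) along one axis."""
    length = 1
    for sign in (1, -1):
        ny, nx = y + sign * dy, x + sign * dx
        while (ny, nx) in pixel_set:
            length += 1
            ny, nx = ny + sign * dy, nx + sign * dx
    return length


def stroke_widths(pixels):
    """Assumed stroke-width proxy: shorter of the horizontal and vertical runs."""
    pixel_set = set(pixels)
    return [min(run_length(pixel_set, y, x, 0, 1),
                run_length(pixel_set, y, x, 1, 0))
            for y, x in pixels]


def is_character_candidate(pixels, tolerance=0.25, min_share=0.8, min_pixels=4):
    """Keep components whose stroke width is mostly uniform (hypothetical thresholds)."""
    if len(pixels) < min_pixels:
        return False  # too small to be a character; treated as noise
    widths = stroke_widths(pixels)
    typical = median(widths)
    consistent = sum(abs(w - typical) <= tolerance * typical for w in widths)
    return consistent / len(widths) >= min_share


# Toy binary image: a 2-pixel-wide vertical bar, a filled triangle, one noise pixel.
grid = [[0] * 12 for _ in range(7)]
for r in range(1, 6):
    grid[r][1] = grid[r][2] = 1      # bar with uniform stroke width
    for c in range(5, 5 + r):
        grid[r][c] = 1               # triangle with varying stroke width
grid[0][11] = 1                      # isolated noise pixel

components = connected_components(grid)
candidates = [p for p in components if is_character_candidate(p)]
```

On this toy image only the bar is kept as a candidate; the triangle is rejected for its uneven stroke width and the isolated pixel for its size. A fuller pipeline would more likely estimate stroke width with a distance transform or a stroke width transform and then pass each surviving region to an OCR engine.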