An Efficient Romanization of Gurmukhi Punjabi Proper Nouns for Pattern Matching
Harjit Singh¹, Ashish Oberoi²
¹ Harjit Singh, APS Neighbourhood Campus, Punjabi University, Patiala, India. (Research Scholar at RIMT University)
² Ashish Oberoi, School of Engineering, RIMT University, Mandi-Gobindgarh, India.
Manuscript received on 1 August 2019. | Revised Manuscript received on 8 August 2019. | Manuscript published on 30 September 2019. | PP: 634-640 | Volume-8 Issue-3 September 2019 | Retrieval Number: B2467078219/19©BEIESP | DOI: 10.35940/ijrte.B2467.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: A Romanization system converts text from a source script to the Roman script through word-by-word mapping. Only the writing script changes; the spoken language and the phonological characteristics of the source word are preserved. This paper presents a rule-based approach to the Romanization of proper nouns written in the Gurmukhi script. The aim is a lightweight Romanization system that may produce multiple possible results for the same input word. The algorithm uses a list of Gurmukhi characters along with their equivalent character combinations in the Roman script. Directly mapping Gurmukhi characters to these combinations does not produce satisfactory results, so rules are applied to obtain the correct mappings. These rules essentially insert or remove the letter ‘a’ between the mapped consonants. Three different sets of rules are applied, giving three different Romanized outputs, all of which are acceptable for information extraction using pattern matching. Some Gurmukhi words are written differently from how they are pronounced. To handle them, such words, or parts of them, are stored in a database table together with their Romanized forms in a second column, and the Romanization is picked directly from this table. The result of the system is a set of possible Roman-script words generated from the source-script word, which an application can pattern match against text or a database to retrieve the required information.
Index Terms: Gurmukhi Punjabi, Natural Language Processing, Rule Based, Romanization.
Scope of the Article: Natural Language Processing
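The pipeline described in the abstract (character mapping, rules that place the letter 'a' between mapped consonants, and a lookup table for irregular words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the actual mapping table, the three rule sets, or the contents of the exception table, so the dictionaries `CONSONANTS`, `VOWEL_SIGNS`, and `EXCEPTIONS` and the rule names `every`, `alternate`, and `not_before_vowel` are all assumptions made for this example.

```python
# Illustrative sketch only. The mapping subset, the three rule sets, and the
# exception entry are assumptions; the abstract does not specify them.

# Hypothetical subset of Gurmukhi consonants and their Roman equivalents.
CONSONANTS = {"ਹ": "h", "ਰ": "r", "ਜ": "j", "ਤ": "t", "ਕ": "k", "ਮ": "m"}
# Hypothetical subset of dependent vowel signs.
VOWEL_SIGNS = {"ਾ": "a", "ਿ": "i", "ੀ": "ee"}
# Stand-in for the database table of words (or word parts) whose
# Romanization is stored directly; the entry is illustrative.
EXCEPTIONS = {"ਸਿੰਘ": "singh"}


def tokenize(word):
    """Split a word into (kind, roman) tokens: C consonant, V vowel sign, X exception."""
    tokens = []
    i = 0
    while i < len(word):
        for src, roman in EXCEPTIONS.items():
            if word.startswith(src, i):
                tokens.append(("X", roman))
                i += len(src)
                break
        else:
            ch = word[i]
            if ch in CONSONANTS:
                tokens.append(("C", CONSONANTS[ch]))
            elif ch in VOWEL_SIGNS:
                tokens.append(("V", VOWEL_SIGNS[ch]))
            else:
                raise ValueError(f"unmapped character: {ch!r}")
            i += 1
    return tokens


def apply_rule(tokens, rule):
    """Join the mapped tokens, inserting 'a' between adjacent consonants per rule."""
    out = []
    prev_inserted = False
    for i, (kind, roman) in enumerate(tokens):
        out.append(roman)
        gap = kind == "C" and i + 1 < len(tokens) and tokens[i + 1][0] == "C"
        if not gap:
            prev_inserted = False
            continue
        if rule == "every":            # 'a' in every consonant-consonant gap
            insert = True
        elif rule == "alternate":      # never two inserted 'a's in a row
            insert = not prev_inserted
        else:                          # "not_before_vowel": skip the gap when
            # the following consonant carries its own vowel sign
            insert = not (i + 2 < len(tokens) and tokens[i + 2][0] == "V")
        if insert:
            out.append("a")
        prev_inserted = insert
    return "".join(out)


def romanize(word):
    """Return the distinct candidate Romanizations, in rule order."""
    tokens = tokenize(word)
    results = []
    for rule in ("every", "alternate", "not_before_vowel"):
        candidate = apply_rule(tokens, rule)
        if candidate not in results:
            results.append(candidate)
    return results


def matches(word, text):
    """Pattern match: does any candidate occur in the (lower-cased) text?"""
    lowered = text.lower()
    return any(candidate in lowered for candidate in romanize(word))


print(romanize("ਕਰਮਜੀਤ"))  # ['karamajeet', 'karmajeet', 'karamjeet']
print(matches("ਹਰਜੀਤ", "Contact: Harjeet Kumar"))  # True
```

Returning every distinct candidate rather than a single best guess reflects the abstract's stated goal: the consuming application only needs one of the candidates to match the target text or database entry.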