Sentence Alignment for English Urdu Language Pair
Syed Abdul Basit Andrabi1, Abdul Wahid2
1Syed Abdul Basit Andrabi, Department of CS & IT, Maulana Azad National Urdu University Hyderabad, India.
2Abdul Wahid, Department of CS & IT, Maulana Azad National Urdu University Hyderabad, India.
Manuscript received on 20 April 2019 | Revised Manuscript received on 27 May 2019 | Manuscript published on 30 May 2019 | PP: 1867-1870 | Volume-8 Issue-1, May 2019 | Retrieval Number: A1228058119/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Sentence-aligned parallel text is an important resource in statistical machine translation; sentence alignment is therefore a crucial part of machine translation. The sentence alignment task consists of recognizing the correspondence between words, sentences, and paragraphs of the source and target languages. Researchers have proposed several algorithms for aligning sentences of the source and target languages. In this paper, we explore sentence alignment algorithms based on character length, word length, and lexical matching, and carry out a performance analysis of the Gale and Church algorithm on English and the low-resource language Urdu.
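To make the character-length approach concrete, the following is a minimal Python sketch of a Gale and Church style aligner. It is not the authors' implementation. The constants `C` and `S2` and the alignment priors are the values reported in the original Gale and Church paper for European language pairs; they are assumptions here and would likely need re-estimation for English and Urdu. The function names `match_cost` and `align` are illustrative, and the normalisation by the mean of the two lengths is a common variant rather than the only formulation.

```python
import math

# Prior probability of each alignment type (source count, target count),
# taken from Gale and Church (1993); assumed, not tuned for English-Urdu.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # assumed expected target characters per source character
S2 = 6.8  # assumed variance of that ratio

def match_cost(src_len, tgt_len, prior):
    """Negative log-probability that two spans of these lengths are translations."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    mean = (src_len + tgt_len / C) / 2.0
    delta = (tgt_len - src_len * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large under N(0, 1).
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300)) - math.log(prior)

def align(src, tgt):
    """Return a list of (source_indices, target_indices) pairs covering both texts."""
    n, m = len(src), len(tgt)
    sl = [len(s) for s in src]  # character lengths (Unicode code points)
    tl = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Dynamic programming over prefixes of the two sentence lists.
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if i < di or j < dj or cost[i - di][j - dj] == inf:
                    continue
                step = match_cost(sum(sl[i - di:i]), sum(tl[j - dj:j]), prior)
                total = cost[i - di][j - dj] + step
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Trace back from the end to recover the lowest-cost sequence of alignments.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return pairs[::-1]
```

Calling `align(english_sentences, urdu_sentences)` on two lists of sentence strings returns index pairs such as `([0], [0])` for a one-to-one match or `([1, 2], [1])` when two source sentences correspond to one target sentence. Because only lengths are compared, no lexical knowledge of either language is used.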
Index Terms: Low Resource Languages, Sentence Alignment, Parallel Corpus, Statistical Machine Translation
Scope of the Article: Natural Language Processing