


Abstract: The paper presents a qualitative evaluation of the English-to-Urdu machine translation systems, namely PBSMT and NMT, hosted on Google Translate. The system, popularly known as Rosetta, was formerly governed by the phrase-based approach and is presently governed by the neural module for the source and target languages. In this study, a model corpus of 100 English sentences, drawn from 1k cross-domain data and covering various types of verbs, has been used as input text to evaluate the Urdu output of the online systems. To evaluate the output qualitatively, the Inter-translator Agreement (IA) of three human translators has been considered, with their scores given on a five-point scale. The scores are calculated using the Fleiss' Kappa statistical measure with regard to comprehensibility and grammaticality, on the basis of which error analysis and suggestions for improvement have been provided. The Kappa scores of PBSMT for comprehensibility and grammaticality are 0.24 and 0.22 respectively, indicating that on both counts the scores fall short of the mark. Furthermore, the system has also been evaluated quantitatively in terms of word error rate (21.11%) and sentence error rate (72.39%). By contrast, the NMT module has Kappa scores of 0.61 and 1 for comprehensibility and grammaticality respectively. As far as WER and SER are concerned, NMT scores 32.58% and 28% respectively.
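To make the two kinds of metrics mentioned above concrete, the sketch below shows one standard way to compute Fleiss' Kappa from categorical ratings (three raters, five-point scale) and word error rate from a reference translation. This is a minimal illustration, not the authors' code: the function names and the toy ratings matrix are assumptions introduced here for clarity.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of raters who assigned sentence i to category j."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])                      # raters per sentence (e.g. 3)
    n_total = n_subjects * n_raters

    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in ratings) / n_total
           for j in range(len(ratings[0]))]

    # Per-sentence agreement, then its mean.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_subjects

    p_e = sum(p * p for p in p_j)                   # expected chance agreement
    return (p_bar - p_e) / (1 - p_e)


def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical ratings: three raters, five-point scale (columns = scores 1..5).
    toy_ratings = [
        [0, 0, 1, 2, 0],
        [0, 0, 0, 1, 2],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 0, 3],
    ]
    print("Fleiss' kappa:", round(fleiss_kappa(toy_ratings), 3))
    print("WER:", round(wer("this is a test", "this is test"), 3))
```

Sentence error rate follows the same pattern at the sentence level: the fraction of output sentences that are not fully correct, so any per-sentence judgement of correctness can replace the token-level edit distance used for WER.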
