Improved language modeling for English-Persian statistical machine translation

Moir, TJ; Mohaghegh, M; Sarrafzadeh, A

aut.researcher	Moir, Tom
dc.contributor.author	Moir, TJ
dc.contributor.author	Mohaghegh, M
dc.contributor.author	Sarrafzadeh, A
dc.date.accessioned	2011-06-09T02:46:47Z
dc.date.available	2011-06-09T02:46:47Z
dc.date.copyright	2010-08-28
dc.date.issued	2010-08-28
dc.description.abstract	As interaction between speakers of different languages continues to increase, the ever-present problem of language barriers must be overcome. For this reason, automatic language translation (Machine Translation) has become an attractive area of research and development. Statistical Machine Translation (SMT) has been applied to many language pairs with considerable success. The focus of this research is on the English/Persian language pair. This paper investigates the development and evaluation of a statistical machine translation system by building a baseline system using subtitles from Persian films. We present an overview of previous related work in English/Persian machine translation, and examine the available corpora for this language pair. We then show the results of experiments with our system using an in-house corpus, and compare the results obtained when building a language model with different-sized monolingual corpora. Automatic evaluation metrics such as BLEU, NIST and IBM-BLEU were used to evaluate the performance of the system on half of the corpus built. Finally, we look at future work by outlining ways of getting highly accurate translations as fast as possible.
dc.identifier.citation	Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation, Dekai Wu (ed.), COLING 2010/SIGMT Workshop, 23rd International Conference on Computational Linguistics, Beijing, China, pp.75-82
dc.identifier.isbn	9.78E+12
dc.identifier.uri	https://hdl.handle.net/10292/1270
dc.publisher	Chinese Information Processing Society of China
dc.relation.uri	http://www.mt-archive.info/SSST-2010-Mohaghegh.pdf
dc.rights	NOTICE: this is the author’s version of a work that was accepted for publication. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in (see Citation). The original publication is available at (see Publisher's Version)
dc.rights.accessrights	OpenAccess
dc.title	Improved language modeling for English-Persian statistical machine translation
dc.type	Conference Contribution
pubs.organisational-data	/AUT
pubs.organisational-data	/AUT/Design & Creative Technologies

Files

Original bundle

Name: persian.pdf
Size: 166.15 KB
Format: Adobe Portable Document Format

License bundle

Name: licence.htm
Size: 29.98 KB
Format: Unknown data format

Collections

School of Engineering, Computer and Mathematical Sciences - Te Kura Mātai Pūhanga, Rorohiko, Pāngarau