Improving Persian-English statistical machine translation: experiments in domain adaption

aut.researcher Moir, Tom
dc.contributor.author Mohaghegh, M
dc.contributor.author Sarrafzadeh, A
dc.contributor.author Moir, T
dc.date.accessioned 2011-09-05T21:46:44Z
dc.date.accessioned 2012-04-12T06:44:25Z
dc.date.accessioned 2012-04-12T06:47:14Z
dc.date.available 2011-09-05T21:46:44Z
dc.date.available 2012-04-12T06:44:25Z
dc.date.available 2012-04-12T06:47:14Z
dc.date.copyright 2011-11
dc.date.issued 2011-11
dc.description.abstract This paper documents recent work on PeEn-SMT, our statistical machine translation (SMT) system for the English-Persian language pair. We describe our previous SMT system and present our current development of significantly larger corpora. We explain how recent tests using these much larger corpora helped us to evaluate problems in parallel corpus alignment and corpus content, and to assess how matching the domains of PeEn-SMT’s components affects translation output. We then focus on combining corpora and approaches to improve test data, giving details of the experimental setup together with a number of experimental results and comparisons between them. We show how one combination of corpora gave us a metric score outperforming Google Translate for English-to-Persian translation. Finally, we outline areas of intended future work and how we plan to improve the performance of our system to achieve higher metric scores and, ultimately, to provide accurate, reliable language translation.
dc.identifier.citation The 5th International Joint Conference on Natural Language Processing, Thailand, 2011-11-08 - 2011-11-13
dc.identifier.uri https://hdl.handle.net/10292/3771
dc.publisher CiteSeerX
dc.relation.replaces http://hdl.handle.net/10292/1983
dc.relation.replaces 10292/1983
dc.relation.replaces http://hdl.handle.net/10292/3770
dc.relation.replaces 10292/3770
dc.relation.uri http://www.ijcnlp2011.org/proceeding/workshop/WS1_WSSANLP/pdf/WSSANLP02.pdf
dc.rights CiteSeerX is compliant with the Open Archives Initiative Protocol for Metadata Harvesting, a standard proposed by the Open Archives Initiative to facilitate content dissemination. For data not mentioned here, please contact us through feedback.
dc.rights.accessrights OpenAccess
dc.title Improving Persian-English statistical machine translation: experiments in domain adaption
dc.type Conference Contribution
pubs.organisational-data /AUT
pubs.organisational-data /AUT/Design & Creative Technologies
pubs.organisational-data /AUT/Design & Creative Technologies/School of Engineering
pubs.organisational-data /AUT/PBRF Researchers
pubs.organisational-data /AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers
pubs.organisational-data /AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers/DCT Eng Electrical & Electronic
Files
Original bundle
Name:
Moir -Improving Persian-English Statistical Machine - text -WSSANLP02.pdf
Size:
213.16 KB
Format:
Adobe Portable Document Format
License bundle
Name:
licence.htm
Size:
29.98 KB
Format:
Unknown data format