Statistical Machine Translation for Sign Language, ASL-SMT

ASL-SMT deals with machine translation into sign language. It begins with a study of existing systems and their open issues in order to propose a new model for statistical machine translation from written English text to American Sign Language (English/ASL). The study covers the specificities of sign languages from different communities and surveys existing tools and solutions. Given the state of the art and the lack of resources for sign language, the aim of this paper is to propose a new approach that builds an artificial corpus using grammatical dependency rules. The resulting parallel corpus is the input to our machine translation system and was used to create a statistical translation memory based on the IBM alignment algorithms. These algorithms were improved and optimized by integrating Jaro-Winkler distances. Then, based on the constructed translation memory, we implemented a decoder that translates English text into American Sign Language using a new transcription system based on gloss annotation. Results were evaluated with the BLEU metric.
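To make the alignment idea concrete, here is a minimal Python sketch, not the project's actual implementation. It computes the Jaro-Winkler similarity between two strings and uses it to seed the lexical translation table of a plain IBM Model 1 trainer. How the similarity is really integrated into the IBM algorithms is not described on this page, so the seeding strategy, the function names (`jaro`, `jaro_winkler`, `train_ibm1`) and the toy sentence pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


def train_ibm1(pairs, iterations=5):
    """IBM Model 1 EM; returns t[(gloss, english)] = P(gloss | english).

    Assumption: the initial table is biased toward string-similar pairs.
    This is one possible way to use Jaro-Winkler, chosen for illustration.
    """
    t = {}
    for english, glosses in pairs:
        for g in glosses:
            for e in english:
                t[(g, e)] = 1.0 + jaro_winkler(e.lower(), g.lower())
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each gloss over the English words of its sentence.
        for english, glosses in pairs:
            for g in glosses:
                norm = sum(t[(g, e)] for e in english)
                for e in english:
                    frac = t[(g, e)] / norm
                    count[(g, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per English word.
        for (g, e), c in count.items():
            t[(g, e)] = c / total[e]
    return t


# Hypothetical toy data, not taken from the ASLG-PC12 corpus.
toy_pairs = [
    (["i", "read", "book"], ["I", "READ", "BOOK"]),
    (["i", "read"], ["I", "READ"]),
    (["book"], ["BOOK"]),
]
table = train_ibm1(toy_pairs)
```

String similarity is a plausible signal for this language pair because gloss tokens are often written with the English word they derive from, so an inflected form such as "reading" scores much higher against "read" than against an unrelated word.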

If you are interested in deploying your own machine translation system, check my tutorial here or on my YouTube channel (TechCarrot).

Link to the live demo

Corpus Sample Data:

American Sign Language Data (right click and click on save as)
English Data (right click and click on save as)

>> Click here to download the full corpus

If you’re writing about or working on the corpus, please cite this paper:

Achraf Othman and Zouhour Tmar. “English-ASL Gloss Parallel Corpus 2012: ASLG-PC12, The Second Release”. Fourth International Conference On Information and Communication Technology and Accessibility ICTA’13, Hammamet, Tunisia, October 24-26, 2013.

English-ASL Gloss Parallel Corpus 2012: ASLG-PC12 by Dr. Achraf Othman is licensed under Attribution-NonCommercial 4.0 International

Schema Resources for Gloss Annotation System (XML-Gloss):

XSD Schema (right click and click on save as)
XSL Schema (right click and click on save as)

If you’re writing about or working with the annotation system, please cite this paper:

Achraf Othman, Mohamed Jemni, “An XML-Gloss Annotation System for Sign Language Processing”, 6th International Conference On Information and Communication Technology and Accessibility ICTA’17, Muscat, Oman, December 19-21, 2017.

If you’re just writing about this work, please cite this paper as follows:

Achraf Othman, Mohamed Jemni, “Designing High Accuracy Statistical Machine Translation for Sign Language Using Parallel Corpus—Case study English and American Sign Language”, Journal of Information Technology Research, Volume 12, Issue 2, 2019.