A serious problem facing sign language researchers is the absence of a large parallel corpus for sign language. The ASLG-PC12 project proposes a rule-based approach for building a large parallel corpus of written English texts and American Sign Language (ASL) glosses. We present a novel algorithm that transforms a part-of-speech-tagged English sentence into an ASL gloss. The project started at the beginning of 2011 as part of the WebSign project, and today it offers a corpus containing more than one hundred million pairs of sentences in English and ASL gloss. It is freely available online to promote the development and design of new algorithms and theories for American Sign Language processing, for example statistical machine translation and related fields. On this page, we present an overview of the corpus and the tasks involved in generating ASL gloss sentences from the Project Gutenberg corpus, which contains only written English texts.
If you use or write about the corpus, please cite this paper:
Achraf Othman and Zouhour Tmar. “English-ASL Gloss Parallel Corpus 2012: ASLG-PC12, The Second Release”. Fourth International Conference On Information and Communication Technology and Accessibility ICTA’13, Hammamet, Tunisia, October 24-26, 2013.
Download Resources (raw format only):
English | ASL | # of Sentences | Download |
---|---|---|---|
corpus_0001.clean.en | corpus_0001.clean.asl | 1060672 | download |
corpus_0002.clean.en | corpus_0002.clean.asl | 730077 | download |
corpus_0003.clean.en | corpus_0003.clean.asl | 763107 | download |
corpus_0004.clean.en | corpus_0004.clean.asl | 1097716 | download |
corpus_0005.clean.en | corpus_0005.clean.asl | 5398085 | download |
corpus_0006.clean.en | corpus_0006.clean.asl | 3591540 | download |
corpus_0007.clean.en | corpus_0007.clean.asl | 980379 | download |
corpus_0008.clean.en | corpus_0008.clean.asl | 793 | download |
corpus_0009.clean.en | corpus_0009.clean.asl | 5222 | download |
corpus_0010.clean.en | corpus_0010.clean.asl | 140995 | download |
corpus_0011.clean.en | corpus_0011.clean.asl | 52227 | download |
corpus_0012.clean.en | corpus_0012.clean.asl | 1215317 | download |
corpus_0013.clean.en | corpus_0013.clean.asl | 3269060 | download |
corpus_0014.clean.en | corpus_0014.clean.asl | 2657130 | download |
corpus_0015.clean.en | corpus_0015.clean.asl | 2017828 | download |
corpus_0016.clean.en | corpus_0016.clean.asl | 1022422 | download |
Total (Σ) | | 24002570 | |
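The file pairs listed above can be read side by side to recover sentence pairs. The following is a minimal sketch, not the project's own tooling. It assumes that each `.en` and `.asl` file is UTF-8 plain text with one sentence per line and that line i of one file corresponds to line i of the other; the page does not state the file layout, so both points are assumptions. The function name `read_parallel` is hypothetical.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_parallel(en_path: Path, asl_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (english, asl_gloss) pairs from two line-aligned text files.

    Assumption: one sentence per line, UTF-8 encoded, same line order in
    both files. Raises ValueError if one file ends before the other.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(asl_path, encoding="utf-8") as asl_file:
        # zip_longest pads the shorter file with None, which exposes a
        # length mismatch instead of silently truncating the longer file.
        pairs = zip_longest(en_file, asl_file)
        for line_no, (en_line, asl_line) in enumerate(pairs, start=1):
            if en_line is None or asl_line is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en_line.rstrip("\n"), asl_line.rstrip("\n")
```

With a pair downloaded locally, `sum(1 for _ in read_parallel(Path("corpus_0008.clean.en"), Path("corpus_0008.clean.asl")))` would count the pairs in that file, which can then be compared against the sentence count given in the table. Reading lazily keeps memory use flat even for the multi-million-line files.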
English-ASL Gloss Parallel Corpus 2012: ASLG-PC12 by Dr. Achraf Othman is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
Related Links: