English-ASL Gloss Parallel Corpus 2012: ASLG-PC12

A serious problem facing sign language researchers is the absence of a large parallel corpus for sign language. The ASLG-PC12 project proposes a rule-based approach for building a large parallel corpus of written English texts and American Sign Language (ASL) glosses. We present a novel algorithm that transforms a part-of-speech-tagged English sentence into an ASL gloss. The project started at the beginning of 2011 as part of the WebSign project, and it now offers a corpus containing more than one hundred million sentence pairs between English and ASL glosses. The corpus is available online for free to promote the development and design of new algorithms and theories for American Sign Language processing, such as statistical machine translation and related fields. This page gives an overview of the corpus and of the tasks involved in generating ASL sentences from the Project Gutenberg corpus, which contains only written English texts.
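To make the idea of a rule-based transformation concrete, here is a toy sketch of the kind of rule such a system might apply to a part-of-speech-tagged sentence. It is an illustration only: the function `toy_gloss`, the single rule (drop determiners, uppercase the rest), and the use of Penn Treebank tags are all assumptions, and none of this reproduces the actual ASLG-PC12 rules, which are not described on this page.

```python
from typing import List, Tuple

# Hypothetical rule set for illustration only; NOT the ASLG-PC12 rules.
# "DT" is the Penn Treebank tag for determiners (a, an, the).
DROPPED_TAGS = {"DT"}


def toy_gloss(tagged: List[Tuple[str, str]]) -> str:
    """Map (word, POS tag) pairs to an uppercase, gloss-like string."""
    kept = [word.upper() for word, tag in tagged if tag not in DROPPED_TAGS]
    return " ".join(kept)


sentence = [("the", "DT"), ("boy", "NN"), ("reads", "VBZ"),
            ("a", "DT"), ("book", "NN")]
print(toy_gloss(sentence))  # prints "BOY READS BOOK"
```

A real rule-based system would need many more rules (for example, for verb inflection, pronouns, and word order); this sketch only shows the general shape of mapping tagged tokens to gloss tokens.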

If you are writing about or working with the corpus, please cite this paper:

Achraf Othman and Zouhour Tmar. “English-ASL Gloss Parallel Corpus 2012: ASLG-PC12, The Second Release”. Fourth International Conference On Information and Communication Technology and Accessibility ICTA’13, Hammamet, Tunisia, October 24-26, 2013.

Download Resources (raw format only):

English               ASL                    # of Sentences  Download
corpus_0001.clean.en  corpus_0001.clean.asl         1060672  download
corpus_0002.clean.en  corpus_0002.clean.asl          730077  download
corpus_0003.clean.en  corpus_0003.clean.asl          763107  download
corpus_0004.clean.en  corpus_0004.clean.asl         1097716  download
corpus_0005.clean.en  corpus_0005.clean.asl         5398085  download
corpus_0006.clean.en  corpus_0006.clean.asl         3591540  download
corpus_0007.clean.en  corpus_0007.clean.asl          980379  download
corpus_0008.clean.en  corpus_0008.clean.asl             793  download
corpus_0009.clean.en  corpus_0009.clean.asl            5222  download
corpus_0010.clean.en  corpus_0010.clean.asl          140995  download
corpus_0011.clean.en  corpus_0011.clean.asl           52227  download
corpus_0012.clean.en  corpus_0012.clean.asl         1215317  download
corpus_0013.clean.en  corpus_0013.clean.asl         3269060  download
corpus_0014.clean.en  corpus_0014.clean.asl         2657130  download
corpus_0015.clean.en  corpus_0015.clean.asl         2017828  download
corpus_0016.clean.en  corpus_0016.clean.asl         1022422  download
Σ                                                  24002570
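As a starting point for working with the raw files, the following minimal sketch iterates over one `.en`/`.asl` file pair as sentence pairs. It rests on assumptions that this page does not confirm: that each file holds one sentence per line, that line i of the `.en` file corresponds to line i of the `.asl` file, that the files are UTF-8 encoded, and that they have already been downloaded to the working directory. The helper name `read_pairs` is hypothetical.

```python
from itertools import islice
from pathlib import Path
from typing import Iterator, Tuple


def read_pairs(en_path: Path, asl_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (English, ASL gloss) pairs from two files.

    Assumes one sentence per line, line-aligned files, and UTF-8 encoding.
    Iteration stops at the end of the shorter file.
    """
    with open(en_path, encoding="utf-8") as en, \
            open(asl_path, encoding="utf-8") as asl:
        for en_line, asl_line in zip(en, asl):
            yield en_line.rstrip("\n"), asl_line.rstrip("\n")


if __name__ == "__main__":
    # File names taken from the table above; local paths are an assumption.
    en_file = Path("corpus_0008.clean.en")
    asl_file = Path("corpus_0008.clean.asl")
    if en_file.exists() and asl_file.exists():
        # Print only the first three pairs as a quick sanity check.
        for english, gloss in islice(read_pairs(en_file, asl_file), 3):
            print(english, "=>", gloss)
```

Reading lazily with a generator keeps memory use low, which matters for the larger files with several million lines each. If the two files turn out to have different line counts, the alignment assumption does not hold and the pairs should not be trusted.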

English-ASL Gloss Parallel Corpus 2012: ASLG-PC12 by Dr. Achraf Othman is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Related Links: