{"id":718,"date":"2021-04-15T02:47:09","date_gmt":"2021-04-15T02:47:09","guid":{"rendered":"https:\/\/achrafothman.net\/site\/?page_id=718"},"modified":"2021-04-15T05:40:55","modified_gmt":"2021-04-15T05:40:55","slug":"english-asl-gloss-parallel-corpus-2012-aslg-pc12","status":"publish","type":"page","link":"https:\/\/achrafothman.net\/site\/english-asl-gloss-parallel-corpus-2012-aslg-pc12\/","title":{"rendered":"English-ASL Gloss Parallel Corpus 2012: ASLG-PC12"},"content":{"rendered":"<p>A serious problem facing sign language researchers is the absence of a large parallel corpus for sign language. The ASLG-PC12 project proposes a rule-based approach for building a large parallel corpus of written English texts and American Sign Language glosses. We present a novel algorithm that transforms a part-of-speech-tagged English sentence into an ASL gloss. The project started at the beginning of 2011 as part of the <a href=\"http:\/\/www.latice.rnu.tn\/websign\/\" target=\"_blank\" rel=\"noopener\">WebSign<\/a> project, and today it offers a corpus containing more than one hundred million sentence pairs between English and ASL glosses. It is available online for free to promote the development and design of new algorithms and theories for American Sign Language processing, such as statistical machine translation and related fields. This page presents an overview of the corpus and the tasks involved in generating ASL sentences from the Project Gutenberg corpus, which contains only written English texts.<\/p>\n<p>\nIf you&#8217;re writing about or working with the corpus, please cite this paper:<\/p>\n<blockquote style=\"font-size: 15px;\"><p>Achraf Othman and Zouhour Tmar. \u201c<a href=\"https:\/\/www.achrafothman.net\/aslsmt\/English-ASL-Gloss-Parallel-Corpus-2012-ASLG-PC12.pdf\" target=\"_blank\" rel=\"noopener\">English-ASL Gloss Parallel Corpus 2012: ASLG-PC12, The Second Release<\/a>\u201d. 
Fourth International Conference On Information and Communication Technology and Accessibility ICTA\u201913, Hammamet, Tunisia, October 24-26, 2013.<\/p><\/blockquote>\n<p>Download Resources (raw format only):<\/p>\n<table>\n<thead>\n<tr>\n<th>English<\/th>\n<th>ASL<\/th>\n<th># of Sentences<\/th>\n<th>Download<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>corpus_0001.clean.en<\/td>\n<td>corpus_0001.clean.asl<\/td>\n<td>1060672<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/egdxi7gwhtricir\/corpus_0001.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0002.clean.en<\/td>\n<td>corpus_0002.clean.asl<\/td>\n<td>730077<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/aa79drd3a3kbok4\/corpus_0002.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0003.clean.en<\/td>\n<td>corpus_0003.clean.asl<\/td>\n<td>763107<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/8ltqx5sqg00hck9\/corpus_0003.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0004.clean.en<\/td>\n<td>corpus_0004.clean.asl<\/td>\n<td>1097716<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/tbcj4enno7fxho3\/corpus_0004.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0005.clean.en<\/td>\n<td>corpus_0005.clean.asl<\/td>\n<td>5398085<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/5bpkkfomzqruavj\/corpus_0005.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0006.clean.en<\/td>\n<td>corpus_0006.clean.asl<\/td>\n<td>3591540<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/uxg2t27xr4emr7c\/corpus_0006.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0007.clean.en<\/td>\n<td>corpus_0007.clean.asl<\/td>\n<td>980379<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/g3h0xwu69t7g8co\/corpus_0007.clean.zip?dl=0\" 
target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0008.clean.en<\/td>\n<td>corpus_0008.clean.asl<\/td>\n<td>793<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/2dgzpvo9o3qm4jj\/corpus_0008.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0009.clean.en<\/td>\n<td>corpus_0009.clean.asl<\/td>\n<td>5222<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/ez6nghljbgqs8w3\/corpus_0009.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0010.clean.en<\/td>\n<td>corpus_0010.clean.asl<\/td>\n<td>140995<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/4t8mh5u8gipjw7z\/corpus_0010.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0011.clean.en<\/td>\n<td>corpus_0011.clean.asl<\/td>\n<td>52227<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/dncvqra6oh9b35t\/corpus_0011.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0012.clean.en<\/td>\n<td>corpus_0012.clean.asl<\/td>\n<td>1215317<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/38smytqhdhrh8qy\/corpus_0012.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0013.clean.en<\/td>\n<td>corpus_0013.clean.asl<\/td>\n<td>3269060<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/6brerw643nlncdg\/corpus_0013.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0014.clean.en<\/td>\n<td>corpus_0014.clean.asl<\/td>\n<td>2657130<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/mpmqr2k8npgv2nd\/corpus_0014.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0015.clean.en<\/td>\n<td>corpus_0015.clean.asl<\/td>\n<td>2017828<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/mt789jqyup76ujv\/corpus_0015.clean.zip?dl=0\" target=\"_blank\" 
rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<tr>\n<td>corpus_0016.clean.en<\/td>\n<td>corpus_0016.clean.asl<\/td>\n<td>1022422<\/td>\n<td><a href=\"https:\/\/www.dropbox.com\/s\/5zy4vokezzto5qx\/corpus_0016.clean.zip?dl=0\" target=\"_blank\" rel=\"noopener\">download<\/a><\/td>\n<\/tr>\n<\/tbody>\n<tfoot>\n<tr>\n<td><\/td>\n<td><\/td>\n<td>\u03a3 24002570<\/td>\n<td><\/td>\n<\/tr>\n<\/tfoot>\n<\/table>\n<p style=\"font-size: 13px;background-color: beige;\" xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\"><a property=\"dct:title\" rel=\"cc:attributionURL\" href=\"https:\/\/achrafothman.net\/site\/english-asl-gloss-parallel-corpus-2012-aslg-pc12\/\">English-ASL Gloss Parallel Corpus 2012: ASLG-PC12<\/a> by <a rel=\"cc:attributionURL dct:creator\" property=\"cc:attributionName\" href=\"https:\/\/achrafothman.net\/\">Dr. Achraf Othman<\/a> is licensed under <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/?ref=chooser-v1\" target=\"_blank\" rel=\"license noopener noreferrer\" style=\"display:inline-block;\">Attribution-NonCommercial 4.0 International<img decoding=\"async\" style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/icons\/cc.svg?ref=chooser-v1\"><img decoding=\"async\" style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/icons\/by.svg?ref=chooser-v1\"><img decoding=\"async\" style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https:\/\/mirrors.creativecommons.org\/presskit\/icons\/nc.svg?ref=chooser-v1\"><\/a><\/p>\n<p>Related Links:<\/p>\n<ul>\n<li><a href=\"https:\/\/paperswithcode.com\/dataset\/aslg-pc12\" target=\"_blank\" rel=\"noopener\">Papers with Code: ASLG-PC12 (English-ASL Gloss Parallel Corpus 2012)<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/kayoyin\/transformer-slt\" target=\"_blank\" 
rel=\"noopener\">Better Sign Language Translation with STMC-Transformer by Yin, Kayo and Read, Jesse (BLEU-4 Score 82.87 using Transformer Ens. Model)<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A serious problem facing the community of researchers in the field of sign language is the absence of a large parallel corpus for sign language. The ASLG-PC12 project proposes a rule-based approach for building a big parallel corpus of English written texts and American Sign Language glosses. We present a novel algorithm that transforms an<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_uag_custom_page_level_css":""},"aioseo_notices":[],"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"post-thumbnail":false,"contentberg-main":false,"contentberg-main-full":false,"contentberg-slider-stylish":false,"contentberg-slider-carousel":false,"contentberg-slider-grid-b":false,"contentberg-slider-grid-b-sm":false,"contentberg-slider-bold-sm":false,"contentberg-grid":false,"contentberg-list":false,"contentberg-list-b":false,"contentberg-thumb":false,"contentberg-thumb-alt":false},"uagb_author_info":{"display_name":"Achraf Othman","author_link":"https:\/\/achrafothman.net\/site\/author\/achraf-othman\/"},"uagb_comment_info":0,"uagb_excerpt":"A serious problem facing the community of researchers in the field of sign language is the absence of a large parallel corpus for sign language. The ASLG-PC12 project proposes a rule-based approach for building a big parallel corpus of English written texts and American Sign Language glosses. 
We present a novel algorithm that transforms an","jetpack_shortlink":"https:\/\/wp.me\/P8KjJN-bA","jetpack-related-posts":[{"id":34,"url":"https:\/\/achrafothman.net\/site\/asl-smt\/","url_meta":{"origin":718,"position":0},"title":"Statistical Machine Translation for Sign Language (ASL-SMT)","date":"January 19, 2017","format":false,"excerpt":"ASL-SML deal with machine translation to sign language. It\u00a0starts with studying existing systems and issues in order to propose a new model for statistical machine translation from written English text to American Sign Language (English\/ASL). The study covers specificity of Sign Language from different communities and a scope of existing\u2026","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":40,"url":"https:\/\/achrafothman.net\/site\/list-publications\/","url_meta":{"origin":718,"position":1},"title":"List of Publications","date":"January 19, 2017","format":false,"excerpt":"2021 Achraf Othman, Oussama El Ghoul, \u201cSyntactic and semantic annotation tool for Qatari Sign Language Corpus\u201d, 8th International Conference on Information and Communication Technology and Accessibility ICTA\u201921, December 8-10, 2021 [online]. Mohamed Koutheair Khribi, Achraf Othman, Aljazi Nasser Al Jabor, \u201cFostering ICT accessibility proficiency through Mada ICT-AID Competency Framework\u201d, 8th\u2026","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":37,"url":"https:\/\/achrafothman.net\/site\/about-me\/","url_meta":{"origin":718,"position":2},"title":"About me","date":"January 19, 2017","format":false,"excerpt":"IEEE Senior Member. Currently, I am the Head of ICT Accessibility Innovation and Research Section at Mada Center, Doha, Qatar. Working on research projects to enable persons with disabilities using innovative technologies and Artificial Intelligence (AI). 
I have more than five years of technical leadership and people management experience and\u2026","rel":"","context":"Similar post","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/achrafothman.net\/site\/wp-content\/uploads\/cropped-achrafothman-sq-300x300.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/pages\/718"}],"collection":[{"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/comments?post=718"}],"version-history":[{"count":22,"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/pages\/718\/revisions"}],"predecessor-version":[{"id":748,"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/pages\/718\/revisions\/748"}],"wp:attachment":[{"href":"https:\/\/achrafothman.net\/site\/wp-json\/wp\/v2\/media?parent=718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}