|Project Title||Digging into signs: Developing standard annotation practices for cross-linguistic, quantitative analysis of sign language data|
|Start Date||1 June 2014|
|End Date||31 May 2015|
|UK Project Manager||Dr. Kearsy Cormier, University College London, Deafness Cognition and Language Research Centre, 49 Gordon Square, London WC1H 0PD, +44 2076798674, k.cormier@ucl.ac.uk|
|Project Team||Dr. Onno Crasborn, Radboud University Nijmegen, Department of Linguistics, P.O. Box 9103, NL6500 HD, Nijmegen, The Netherlands, +31 24 3611377; firstname.lastname@example.org|
|Lead Institution||University College London|
|Project Partners||Radboud University Nijmegen|
For sign languages used by deaf communities, linguistic corpora have until recently been unavailable, due to the lack of a writing system and a written culture in these communities, and the very recent advent of digital video. Recent improvements in video and computer technology have now made larger sign language datasets possible; however, large sign language datasets that are fully machine-readable remain elusive. This is due to two challenges:
- Inconsistencies that arise when signs are annotated by means of spoken/written language.
- The fact that many parts of signed interaction are not fully composed of lexical signs (the equivalent of words), but instead consist of constructions that are less conventionalised.
As sign language corpus building progresses, the potential for some standards in annotation is beginning to emerge. However, there have been no attempts to standardise these practices across corpora, which is required in order to compare data cross-linguistically.
This project has the following aims:
- To develop annotation standards for glosses
- To test their reliability and validity
- To improve current software tools that facilitate a reliable workflow
Together these aims will not only set a standard for the whole field of sign language studies throughout the world but also make significant advances toward two of the world’s largest machine-readable datasets for sign languages.
Anticipated Outputs and Outcomes
The end product of these protocols – i.e. the open access annotated corpora themselves and their associated lexicons – will be valuable to a variety of user groups, including linguists, students learning British Sign Language (BSL) or Sign Language of the Netherlands (NGT), and research students who need a data source for dissertations. The project also makes possible further work comparing sign languages with other related and unrelated signed and spoken languages elsewhere in the world. It will contribute to our understanding of sign languages and, more generally, to the characterisation of the human faculty of language, the study of which has been predominantly focused on spoken language.