The Auslan Corpus annotation files

At present, 357 movies in the Auslan Corpus have annotation files containing annotations at various levels of detail. Annotations are being added to the corpus all the time. The current annotation files have one or more of the following types of annotations:

  • identification and IDglossing of nouns and verbs only
  • sign tokenization and IDglossing for all signs
  • tagging for sign grammatical class ("part of speech")
  • identification of gaze direction during points
  • identification of palm orientation during points
  • identification of clause boundaries
  • identification of verb arguments
  • tagging of verb arguments for macro-roles and semantic roles
  • tagging for the presence or absence of spatial modification
  • identification of periods of constructed action ('role shift')
  • free translation
  • literal translation.

The amount of time required for the annotation of signed language texts is enormous, and it is anticipated that it will take many years before the Auslan archive becomes sufficiently richly annotated (and hence machine-readable) to qualify as a true linguistic corpus.

Value-adding the movies in the archive with annotations is time-consuming and expensive. These annotation files are not publicly available but will be provided to fellow researchers on request on a data-sharing and data-enrichment basis (i.e., access to existing annotation files will be granted on condition that enriched annotation files are returned to the corpus). Research collaboration is also encouraged.

Click here for a copy of the guidelines used to create the annotations for the Auslan Corpus as it now exists. (Last updated February 2016.)