

The controlled vocabularies

Controlled vocabularies are enumerations of legal values for the code and refine attributes; they are currently under development. In some cases more than one value applies, and the corresponding element must be repeated, once for each applicable value. In other cases no value applies, and the corresponding element is simply omitted. In yet other cases the controlled vocabulary may lack a suitable item; the most similar vocabulary item can then optionally be specified, with a prose comment included in the element content.
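To make the three cases concrete, here is a minimal Python sketch that builds record elements with the standard-library `xml.etree.ElementTree` module. The element name `contributor`, the use of a `code` attribute to carry the vocabulary value, the helper `add_controlled_elements`, and the small role list are all assumptions for illustration, not the actual OLAC schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Set

# Hypothetical excerpt of the OLAC-Role vocabulary, for illustration only.
OLAC_ROLE = {"author", "editor", "translator", "transcriber", "sponsor"}


def add_controlled_elements(
    parent: ET.Element,
    tag: str,
    values: Iterable[str],
    vocabulary: Set[str],
    nearest: Optional[str] = None,
    comment: Optional[str] = None,
) -> None:
    """Append <tag code="..."> children following the three cases in the text.

    - Several applicable values: one element per value (repeated element).
    - No applicable value and no comment: nothing is appended (element omitted).
    - No suitable vocabulary item: pass a prose `comment`, and optionally the
      most similar item as `nearest`; the comment becomes the element content.
    """
    values = list(values)
    # Validate everything first so a bad value leaves `parent` unchanged.
    for value in values + ([nearest] if nearest is not None else []):
        if value not in vocabulary:
            raise ValueError(f"{value!r} is not in the controlled vocabulary")
    for value in values:
        ET.SubElement(parent, tag, code=value)
    if comment is not None:
        element = ET.SubElement(parent, tag)
        if nearest is not None:
            element.set("code", nearest)
        element.text = comment


record = ET.Element("record")
# Two roles apply, so the element is repeated.
add_controlled_elements(record, "contributor", ["author", "transcriber"], OLAC_ROLE)
# No exact role exists: give the closest item plus a prose comment.
add_controlled_elements(
    record, "contributor", [], OLAC_ROLE,
    nearest="editor", comment="compiled the word list from field notes",
)
print(ET.tostring(record, encoding="unicode"))
```

Validating before appending is a design choice of this sketch: a rejected value never leaves a half-built record behind.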

OLAC-Language:
A vocabulary for identifying the language(s) that the data is in, or that a piece of linguistic description is about, or that a particular tool can process.

OLAC-Linguistic-Type:
The primary linguistic descriptors for a language resource: transcription, annotation, description and lexicon (with subcodes for each type).

OLAC-CPU:
A vocabulary for identifying the CPU(s) for which the software is available, in the case of binary distributions: x86, mips, alpha, ppc, sparc, 680x0.

OLAC-Encoding:
A vocabulary for identifying the character encoding used by a digital resource, e.g. iso-8859-1, ...

OLAC-Format:
A vocabulary for identifying the manifestation of the resource. The representation is inspired by MIME types, e.g. text/sf for SIL standard format. (Format.markup is used to identify the particular tagset.) It may be necessary to add new types and subtypes to cover non-digital holdings such as manuscripts and microforms; we expect to be able to incorporate an existing vocabulary for these.
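Because Format values follow the MIME pattern of a type plus an optional subtype, a consumer can split them mechanically. The Python sketch below assumes a value contains at most one `/` and treats a bare type as having no subtype; these rules and the names `FormatValue` and `parse_format` are illustrative assumptions, not part of the vocabulary definition.

```python
from typing import NamedTuple, Optional


class FormatValue(NamedTuple):
    type: str               # e.g. "text"
    subtype: Optional[str]  # e.g. "sf"; None if only a top-level type is given


def parse_format(value: str) -> FormatValue:
    """Split a MIME-style value such as 'text/sf' into type and subtype.

    The tagset is deliberately not encoded in this string; per the text
    it is recorded separately via Format.markup.
    """
    major, sep, minor = value.partition("/")
    # Reject an empty type, a trailing slash, or more than one slash.
    if not major or (sep and not minor) or "/" in minor:
        raise ValueError(f"malformed format value: {value!r}")
    return FormatValue(major, minor if sep else None)


print(parse_format("text/sf"))
```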

OLAC-Functionality:
A vocabulary for classifying the functionality of software. It again uses the MIME style of representation and draws its categories from the HLT Survey [Cole1997], as advocated by the ACL/DFKI Natural Language Software Registry. For example, written/OCR would cover "written language input, print or handwriting optical character recognition."

OLAC-OS:
A vocabulary for identifying the operating system(s) for which the software is available: Unix, MacOS, OS2, MSDOS, MSWindows. Each of these has optional subtypes, e.g. Unix/Linux, MSWindows/winNT.
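The optional subtypes make this vocabulary hierarchical, so a search for software that runs on Unix should presumably also find a resource tagged Unix/Linux. A minimal Python sketch of such matching follows, assuming `/` is the only separator and that comparison is case-sensitive; the helpers `matches` and `available_for` are hypothetical.

```python
from typing import Iterable


def matches(value: str, query: str) -> bool:
    """True if `value` equals `query` or is a subtype of it.

    Comparison is segment by segment, so 'Unix/Linux' matches 'Unix',
    'Unix' does not match 'Unix/Linux', and 'Unix2' does not match 'Unix'.
    """
    value_parts = value.split("/")
    query_parts = query.split("/")
    return value_parts[: len(query_parts)] == query_parts


def available_for(os_values: Iterable[str], query: str) -> bool:
    """True if any of a resource's OLAC-OS values satisfies the query."""
    return any(matches(v, query) for v in os_values)


tool_os = ["Unix/Linux", "MSWindows/winNT"]
print(available_for(tool_os, "Unix"))   # True
print(available_for(tool_os, "MacOS"))  # False
```

Matching whole segments rather than raw string prefixes avoids false hits between unrelated values that merely share leading characters. The same helper would apply unchanged to the other slash-separated vocabularies, such as OLAC-Format and OLAC-Functionality.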

OLAC-Rights:
A vocabulary for classifying the rights held over a resource, e.g.: open, restricted, ...

OLAC-Role:
A vocabulary for identifying the role of a contributor or creator of the resource, e.g.: author, editor, translator, transcriber, sponsor, ...

OLAC-Software-Rights:
A vocabulary for classifying the rights held over software, e.g.: open-source, royalty-free-library, royalty-free-binary, commercial, ...

OLAC-Sourcecode:
A vocabulary for identifying the programming language(s) used by software which is distributed in source form, e.g.: C++, Java, Python, Tcl, VB, ...


2001-11-21