The controlled vocabularies
Controlled vocabularies enumerate the legal values for the
code and refine attributes; they are still under
development. In some cases, more than one value applies
and the corresponding element must be repeated, once for each
applicable value. In other cases, no value is applicable and
the corresponding element is simply omitted. In yet other cases, the
controlled vocabulary may fail to provide a suitable item, in which case
the most similar vocabulary item may optionally be specified,
with a prose comment included in the element content.
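The three cases above can be sketched in a few lines of Python. This is only an illustration, not part of the OLAC specification: the function name build_elements, the element names creator and rights, and the pairing of each element with the refine or code attribute are assumptions made for the example, and the vocabularies are the partial lists given in this section.

```python
import xml.etree.ElementTree as ET

# Partial vocabularies, copied from the examples in this section only.
ROLE_VOCABULARY = {"author", "editor", "translator", "transcriber", "sponsor"}
RIGHTS_VOCABULARY = {"open", "restricted"}


def build_elements(tag, attribute, values, vocabulary):
    """Return one element per applicable vocabulary item.

    `values` is a list of (vocabulary_item, content) pairs. An empty list
    yields no elements, which corresponds to omitting the element.
    The tag and attribute names are hypothetical, chosen by the caller.
    """
    elements = []
    for item, content in values:
        if item not in vocabulary:
            raise ValueError(f"{item!r} is not in the controlled vocabulary")
        element = ET.Element(tag, {attribute: item})
        element.text = content
        elements.append(element)
    return elements


# More than one value applies: the element is repeated once per value.
creators = build_elements(
    "creator", "refine",
    [("author", "A. Linguist"), ("editor", "B. Editor")],
    ROLE_VOCABULARY,
)

# No value applies: nothing is emitted.
no_creators = build_elements("creator", "refine", [], ROLE_VOCABULARY)

# No suitable item: the most similar item plus a prose comment as content.
rights = build_elements(
    "rights", "code",
    [("restricted", "Available on request to community members.")],
    RIGHTS_VOCABULARY,
)

for element in creators + rights:
    print(ET.tostring(element, encoding="unicode"))
```

Rejecting unknown items with an exception is one possible design choice; a harvester might instead prefer to record the problem and continue.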
- OLAC-Language:
- A vocabulary for identifying the language(s) that the data is in, or that
a piece of linguistic description is about, or that a particular tool can
process.
- OLAC-Linguistic-Type:
- The primary linguistic descriptors for a language resource:
transcription, annotation, description and
lexicon (with subcodes for each type).
- OLAC-CPU:
- A vocabulary for identifying the CPU(s) for which the software is
available, in the case of binary distributions:
x86, mips, alpha, ppc, sparc, 680x0.
- OLAC-Encoding:
- A vocabulary for identifying the character encoding used by a digital
resource, e.g. iso-8859-1, ...
- OLAC-Format:
- A vocabulary for identifying the manifestation of the resource.
The representation is inspired by MIME types, e.g. text/sf for
SIL standard format. (Format.markup is used to identify the particular
tagset.) It may be necessary to add new types and subtypes to cover
non-digital holdings, such as manuscripts, microforms, and so forth,
and we expect to be able to incorporate an existing vocabulary for this purpose.
- OLAC-Functionality:
- A vocabulary for classifying the functionality of software,
again using the MIME style of representation, and using the
HLT Survey as a source of categories [Cole1997] as advocated
by the ACL/DFKI Natural Language Software Registry. For example,
written/OCR would cover "written language input, print or
handwriting optical character recognition."
- OLAC-OS:
- A vocabulary for identifying the operating system(s) for which the software
is available:
Unix, MacOS, OS2, MSDOS, MSWindows.
Each of these has optional subtypes, e.g.
Unix/Linux, MSWindows/winNT.
- OLAC-Rights:
- A vocabulary for classifying the rights held over a resource, e.g.:
open, restricted, ...
- OLAC-Role:
- A vocabulary for identifying the role of a contributor or creator of the
resource, e.g.: author, editor, translator,
transcriber, sponsor, ...
- OLAC-Software-Rights:
- A vocabulary for classifying the rights held over a software resource, e.g.:
open-source, royalty-free-library,
royalty-free-binary, commercial, ...
- OLAC-Sourcecode:
- A vocabulary for identifying the programming language(s) used by
software which is distributed in source form, e.g.:
C++, Java, Python, Tcl, VB, ...
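Several of the vocabularies above (OLAC-Format, OLAC-Functionality, OLAC-OS) use a MIME-style type/subtype representation. A minimal sketch of how such a value might be split and checked follows. The helper names split_code and is_known_os are hypothetical, and only the top-level OS types listed above are checked, since this section does not enumerate the subtypes.

```python
# Top-level OLAC-OS types as listed in this section.
OS_TYPES = {"Unix", "MacOS", "OS2", "MSDOS", "MSWindows"}


def split_code(value):
    """Split 'type/subtype' into (type, subtype); subtype is None if absent."""
    main_type, _, subtype = value.partition("/")
    if not main_type:
        raise ValueError("missing type in " + repr(value))
    return main_type, subtype or None


def is_known_os(value):
    """Check only the top-level type against the listed OS types."""
    main_type, _ = split_code(value)
    return main_type in OS_TYPES


print(split_code("Unix/Linux"))        # type with an optional subtype
print(split_code("MacOS"))             # type only
print(is_known_os("MSWindows/winNT"))  # top-level type is in the list
```

The same split applies to values such as text/sf or written/OCR, although each of those would be checked against its own vocabulary.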
2001-11-21