Next: Multilingual Lexicons Up: Issues with multilingual resources Previous: Issues with multilingual resources

Machine Translation Systems

The simplest type of MT system is a unidirectional system which translates from a source language $S$ into a target language $T$. Here, the intended audience of such a system is assumed to be the speakers of language $T$, who use the system to access documents written in language $S$. The OLAC solution would be to designate $S$ as the Subject.language and $T$ as the Language.

Note that the notion of "audience" is slightly problematic: such an MT system may equally be intended for $S$ speakers who wish to translate their documents into language $T$. The problem here lies not with directionality but with the reliance on "audience" in the OLAC definition of Language. The definition could be adjusted to remove this problem.

Next in order of complexity is the bidirectional case, where a system translates in both directions between languages $X$ and $Y$. Extending the previous solution, we would designate each of $X$ and $Y$ as both Language and Subject.language. Ideally, we would use order or structure to group the languages appropriately:


<pair><Subject.language code="X"/>
      <Language code="Y"/></pair>
<pair><Subject.language code="Y"/>
      <Language code="X"/></pair>

However, OLAC metadata is flat and unordered. The only available options are permutations of the following, in which we can make no contrastive use of order.


<Language code="X"/>
<Language code="Y"/>
<Subject.language code="X"/>
<Subject.language code="Y"/>

Although this loses information, we do not believe it presents a problem for typical kinds of retrieval. Queries for an MT system (i) from $X$; (ii) from $Y$; (iii) to $X$; (iv) to $Y$; (v) from $X$ to $Y$; or (vi) from $Y$ to $X$, will discover the system described above.
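
The retrieval claim can be checked with a small sketch. It models the flat record as two unordered sets, which is only an assumption about how a search service might index the elements; the names `record` and `matches` are hypothetical and not part of OLAC.

```python
# Hypothetical model of the flat OLAC record for a bidirectional X<->Y system.
# The two sets carry no pairing or ordering information, as in the flat record.
record = {
    "Subject.language": {"X", "Y"},  # languages translated from
    "Language": {"X", "Y"},          # languages translated into
}


def matches(record, source=None, target=None):
    """Return True if the record satisfies a query on source and/or target."""
    if source is not None and source not in record["Subject.language"]:
        return False
    if target is not None and target not in record["Language"]:
        return False
    return True


# Queries (i) to (vi) from the text, in the same order.
queries = [
    {"source": "X"},
    {"source": "Y"},
    {"target": "X"},
    {"target": "Y"},
    {"source": "X", "target": "Y"},
    {"source": "Y", "target": "X"},
]
results = [matches(record, **q) for q in queries]
print(results)
```

Under this model all six queries succeed. The information lost by flattening shows up only in a query such as `matches(record, source="X", target="X")`, which also succeeds although the system offers no such translation.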

Next are MT systems which translate from one language into many, or from many languages into one (star configurations). Here it is adequate simply to list each source language as Subject.language and each target language as Language:


One-to-many:
<Subject.language code="S"/>
<Language code="T1"/>
<Language code="T2"/>
<Language code="T3"/>

Many-to-one:
<Subject.language code="S1"/>
<Subject.language code="S2"/>
<Subject.language code="S3"/>
<Language code="T"/>
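
Both star configurations follow the same rule, so they could be generated by one helper. The following is a minimal sketch; the function `flat_record` is hypothetical, and the quoted attribute form `code="..."` is an assumption about the intended element syntax.

```python
def flat_record(sources, targets):
    """Render one Subject.language line per source and one Language line per target."""
    lines = [f'<Subject.language code="{s}"/>' for s in sources]
    lines += [f'<Language code="{t}"/>' for t in targets]
    return lines


# One-to-many and many-to-one star configurations from the text.
one_to_many = flat_record(["S"], ["T1", "T2", "T3"])
many_to_one = flat_record(["S1", "S2", "S3"], ["T"])

print("\n".join(one_to_many))
print()
print("\n".join(many_to_one))
```

The unidirectional case is the same call with a single source and a single target.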

Finally, there are MT systems which translate from and to all languages in a set of $n$ languages. Here again it suffices to list every language as both Subject.language and Language, which requires only $2n$ elements and is clearly superior to a solution where all $n(n-1)$ ordered pairs are enumerated.


<Subject.language code="X"/>
<Subject.language code="Y"/>
<Subject.language code="Z"/>
<Language code="X"/>
<Language code="Y"/>
<Language code="Z"/>
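
To make the size comparison concrete, the sketch below counts the elements in the flat record and the ordered pairs that a pairwise enumeration would need. The function names and the extra placeholder codes `W` and `V` are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def element_count_flat(n):
    """Elements in the flat record: n Subject.language plus n Language."""
    return 2 * n


def ordered_pair_count(languages):
    """Number of ordered (source, target) pairs with source != target."""
    return len(list(permutations(languages, 2)))


languages = ["X", "Y", "Z", "W", "V"]
n = len(languages)

flat = element_count_flat(n)
pairs = ordered_pair_count(languages)
print(flat, pairs)
```

For these five languages the flat record needs 10 elements, against 20 ordered pairs. The flat record grows linearly with $n$, while the number of pairs grows quadratically.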


2001-11-21