DLT: THE FACTS

by Klaus Schubert

Throughout the Esperanto world there have been reports, conjectures and rumours circulating about the DLT project. For definite information, here is a summary history by Klaus Schubert, one of the workers on the project, as posted in sce.

The Distributed Language Translation (DLT) project was a research and development program undertaken by the software company BSO/Buro voor Systeemontwikkeling in Utrecht, the Netherlands. The goal was to develop a prototype of a computerized translation system which would be multilingual, decentralized across a computer network, and usable by an ordinary person knowing only one language.

From these premises there followed a series of consequences which determined the architecture of the system:

1. Multilingual: In a computerized translation system based on a direct link between the original language of a text and the language into which it is to be translated, n languages require n(n-1) translation systems, i.e. separate systems linking each of the languages to and from each of the other languages.

For example, for 10 languages there would be 90 translation systems, and to add an 11th language 20 more translation systems would have to be built.

To avoid this, an intermediate language is introduced, so that each translation from any original language goes into the intermediate language and from there into the other languages as desired. A system using an intermediate language requires 20 systems for ten languages, and each additional language requires only two further systems.
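The arithmetic above can be checked with a short sketch. The function names are illustrative only and are not taken from the DLT project.

```python
def direct_systems(n: int) -> int:
    """One translation system per ordered pair of distinct languages."""
    return n * (n - 1)


def interlanguage_systems(n: int) -> int:
    """One system into and one out of the intermediate language per language."""
    return 2 * n


for n in (10, 11):
    print(n, direct_systems(n), interlanguage_systems(n))
# 10 languages: 90 direct systems versus 20 with an intermediate language
# 11 languages: 110 direct systems versus 22 with an intermediate language
```

Going from 10 to 11 languages adds 20 systems in the direct design but only 2 in the intermediate-language design.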

2. Decentralized: The decentralized nature of the Internet means that the text must first be translated on the originating computer and then translated a second time by each of the receiving computers into the desired language. Thus parallel translations into various languages can take place; those translations need not all take place at the same time, and there must be no need to know at the start what languages translations will be required in.

These requirements, together with considerations of economy and administration, make it necessary for the interlanguage to be used at the stage of sending the messages. Thus each message goes out in only one form.

Conclusions (1) and (2) are only possible if the interlanguage is autonomous, that is, if the forms of expression, the grammatical character of the text, and the choice of words are completely separate from the question of which language the original was in or into what languages it is to be translated.

3. Usability: The requirement that the program be used by an ordinary person who does not know the language into which a text is to be translated has the following consequences:

Since the 1950s it has been generally acknowledged, and it is still not disputed, that wholly automatic, high quality translation is impossible. (The only possibilities are a completely automatic but very crude translation, or a high quality translation that can be prepared only by combining the work of a human being with that of the computer. In the latter case, the human contribution can take place before, during or after the computer does its work. If it is to be done before, then it is a matter of writing the text in, or adapting it to, a specially simplified language, a topic often discussed in the literature.)

For DLT the consequence is that the user must help out in the translating process. Since the user knows only the original language, his help must take place via an interactive dialogue in that original language. Since the system must function in a decentralized fashion, it is in practice impossible for the user to take part in the phase of translation that goes from the interlanguage to the final language, because that part takes place at various places in the world and at unforeseeable times, hours, days or years after the original distribution of the message.

From all of this it follows that the user's contribution must take place during the first part of the translation process, thus during the translation from the original language into the interlanguage.

The translation that later takes place from the interlanguage into the final language must be done entirely automatically and with high quality. Since, however, completely automatic high quality translation is not possible, it is necessary for the interlanguage to be decisively more suitable for translation than an ordinary language.

The role of Esperanto: This is the role of Esperanto. For DLT Esperanto was chosen as an interlanguage. Esperanto combines two qualities in itself: it is clearer in syntax and word construction than the ethnic languages, and it is at the same time independent.

-- It is necessary to strengthen the clarity of syntax and word construction in comparison with that of normal human Esperanto.

Esperanto did not possess autonomy right from the start in 1887. It unconsciously achieved its autonomy through many decades of use in the (bilingual) language community. Because of this Esperanto has a decided advantage over possible artificial structures constructed specifically for this application.

Modified Esperanto: Some words on the modifications made in Esperanto for DLT. This topic has occasioned discussion from time to time among people who wish to attack the DLT project or who seek to find support for their own projects for reforming Esperanto.

During the preparatory studies for DLT, a number of modifications of Esperanto were proposed to make it more suitable for its function in DLT. During the prototype project (see below), there were experiments in parsing, in the syntactic and word-formation structures of Esperanto, and with the phase of structural transfer, which is the central function in translation. There were also experiments and tests of substituting one language's vocabulary for another, which is the principal semantic obstacle that makes completely automatic high quality translation impossible.

After these experiments the DLT team concluded that most of the modifications of Esperanto originally proposed were not necessary for the DLT interlanguage. Since it was later necessary for semantic treatment to be based on a knowledge of a very large series of texts which exist in Esperanto but not in any modified form of Esperanto, it was all the more desirable to go back as closely as possible to normal Esperanto. Only two essential modifications remained:

-- The DLT interlanguage used symbols to designate the boundaries of word elements, so that words could not be misconstrued as with the famous examples kol/eg/o-koleg/o, sen/dat/a-send/at/a, etc. (Remember that such use of word element boundary markers was introduced by Zamenhof in his First Book of Esperanto, in 1887.)

This rule was accompanied by some fine tuning of the Esperanto morpheme system. For example, it was necessary to define the "correlatives" kiu, tiu, etc. as single morpheme words, since their -u ending is not the same as that of the imperative -u. A similar principle can also be used to repair some irregularities in word formation. For example, the compound term terpomo must be defined as based on a single morpheme, since obviously it is not an "earth type of apple" but a metaphorical word construction borrowed from another language.
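The effect of boundary markers can be illustrated with a small sketch. It is not DLT code: the morpheme inventory below is a toy set chosen only to cover the two examples, and the slash is used as the marker simply because the examples above are written that way.

```python
# Toy morpheme inventory (an assumption, just enough for the two examples).
MORPHEMES = {"kol", "eg", "koleg", "o", "sen", "dat", "send", "at", "a"}


def segmentations(word, lexicon=MORPHEMES):
    """Return every way to split an unmarked word into known morphemes."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


def parse_marked(word):
    """With explicit boundary markers the split is read off directly."""
    return word.split("/")


print(segmentations("kolego"))    # two analyses: kol+eg+o and koleg+o
print(segmentations("sendata"))   # two analyses: sen+dat+a and send+at+a
print(parse_marked("koleg/o"))    # exactly one analysis
```

Without markers the analyzer must choose between competing splits; with markers there is nothing left to choose.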

-- The DLT interlanguage was syntactically unambiguous. This made it possible to avoid a vast number of mistaken analyses of sentence constructions which a human being usually hardly notices but which are a major obstacle in automatic analysis of sentences.

Right in the preparatory stages of DLT it became apparent that a single indicator would suffice to remove ambiguity in syntax.

Example: "Undo the screw in the cover that is held by a separate plastic ring" ("malfiksu la ŝraŭbon de la kovrilo kiu estas fiksita per aparta plasta ringo").

Possible analyses:

-- A screw that is held

-- A cover that is held

Removing the ambiguity through DLT: If "that" (kiu) refers to "cover", then "the cover that is held by a separate plastic ring" is a single group of words linked to the preposition de. If "that" (kiu) refers to "screw", then only "the cover" (la kovrilo) is linked to de. A phrase grouping symbol is inserted to tell the analyzing functions that kiu does not link with "cover" but with something else, in this instance "screw".
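The two readings and the effect of the grouping symbol can be sketched as follows. This is only an illustration: the source does not give the actual form of the DLT symbol, so the boolean flag `has_group_marker` and the rule that an unmarked kiu attaches to the nearest noun are assumptions made for the example. English glosses stand in for the Esperanto words.

```python
RELATIVE = "that is held by a separate plastic ring"


def candidate_readings():
    """The two analyses a parser finds without any extra marking."""
    return [
        # kiu refers to "cover": the whole group depends on "de"
        {"kiu_refers_to": "cover", "de_group": "the cover " + RELATIVE},
        # kiu refers to "screw": only "the cover" depends on "de"
        {"kiu_refers_to": "screw", "de_group": "the cover"},
    ]


def resolve(has_group_marker):
    """Pick one reading.

    Assumed convention: no marker means kiu links to the nearest noun
    ("cover"); a marker means it links to something else ("screw").
    """
    wanted = "screw" if has_group_marker else "cover"
    return next(r for r in candidate_readings() if r["kiu_refers_to"] == wanted)


print(resolve(False))
print(resolve(True))
```

Under these assumptions a single yes/no marker is enough to select between the two analyses, so the later automatic stage never has to guess.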

The DLT timetable:

1979: Toon Witkam, who worked at BSO, had the original idea of networking computerized translation using Esperanto as an interlanguage.

1982-1983: Preparatory phase. Preparatory study carried out by Toon Witkam with one assistant, through a subsidy from the European Community.

1984-1990: Prototype phase. BSO received a 50% subsidy from the Dutch government for a six year project to develop a prototype. The company established its BSO/Research division, hired a large team (up to 20 persons) and developed a series of prototypes for translations between English and French using Esperanto.

1987: Presentation of the first prototype to the specialist press in Utrecht.

1988: Presentation of the second prototype to the specialist public at the Coling Conference in Budapest.

1988-1990: Further development of new techniques on the basis of tests of the first prototypes, applications for (and receipt of some) patents.

1990: Completion of the project, as planned.

(1990-1992): The company retained the team and established a new division, BSO/Language Technology, in Baarn (Netherlands). This division sold programming services in the area of language technology, on the basis of DLT expertise.

Basic characteristics of DLT (in addition to the above mentioned requirements of being multilingual, decentralized, and fully usable by nonspecialists):

Double translation using Esperanto as an interlanguage.

Dependency syntax.

Implicit solution of the semantic and information based problems inherent in moving between vocabulary sets.

The first solution used a knowledge bank of word relations in the interlanguage. The later solution used language knowledge banks built from parallel texts in the interlanguage and each of the original and final languages (the so-called Bilingual Knowledge Banks). This solution is still state of the art in comparison with the systems of translation available on the market.

The project involved about fifty years of human labour, in total.

I, the author of this summary, worked at BSO from 1985 to 1992, and in the final months was project director of DLT.

klaus.schubert@flensburg.netsurf.de

schubert@fhf-tue.com

[Translated from Eventoj 1-2/Dec.,1997]