About APE-QUEST

Objective

While machine translation (MT) quality has progressed dramatically with the shift to deep learning, the data behind an MT system remains a crucial factor. The risk of errors (e.g. terminological errors) is therefore especially high when a generic MT engine is applied to domain-specific text, as there is a disparity between the data on which the system was trained and the data to which it is applied. In addition, the sentence translations produced by MT systems tend to vary in quality, independently of the domain in question. This variability relates to factors such as the length of a sentence, the distance between source and target language, and the presence of ambiguous words with multiple possible translations.

In order to cope with the translation of domain-specific texts and with the variability of MT output, the APE-QUEST (Automated Post-editing and Quality Estimation) project, which ran from 2018 to 2020, built a Quality Gate that combines automated and human translation. Its automated part is based on a generic MT system and on domain-specific components trained by the user.

The goal of the Quality Gate is to obtain acceptable translation quality for domain-specific texts in less time and at a lower cost than in a traditional translation workflow (human translation only). Here, “acceptable” means “understandable” in the assimilation use case (allowing a reader to get the gist of a text through its translation, or producing translations that suffice for in-house communication) and “publishable” in the dissemination use case (producing translations for external use).

Architecture

The Quality Gate combines a generic MT system with two domain-specific components and a human post-editing workflow. The first component, automated quality estimation (QE), produces a quality score for the translation of a sentence produced by the MT system. The second, automatic post-editing (APE), automatically corrects such an MT translation. Both components are trained with deep learning on human corrections to MT output for domain-specific sentences.

The Quality Gate performs the following procedure. It submits a sentence to an MT system, submits the MT output to the QE system, and, based on the QE score and QE score thresholds, assigns the MT output to one of the following three tiers (a minimal routing sketch follows the list):

1) MT output with acceptable quality. The Quality Gate directly provides such output to the end user or to another tool which further processes the MT output.

2) Moderate-quality MT output. The Quality Gate submits such output to the APE component.

3) Low-quality MT output. The Quality Gate transfers such output to the workflow for human post-editing.
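
The routing can be pictured with the following minimal Python sketch. The threshold values and the mt_translate, qe_score, ape_correct and human_post_edit callables are hypothetical placeholders for illustration; the project does not prescribe a specific API or specific threshold values.

```python
from enum import Enum

# Hypothetical QE score thresholds; in practice they would be tuned
# per domain and language pair.
QE_GOOD = 0.8      # score >= QE_GOOD: acceptable as-is (tier 1)
QE_MODERATE = 0.5  # QE_MODERATE <= score < QE_GOOD: send to APE (tier 2)


class Tier(Enum):
    ACCEPTABLE = 1  # deliver MT output directly
    MODERATE = 2    # correct automatically with APE
    LOW = 3         # hand over to human post-editing


def quality_gate(sentence, mt_translate, qe_score, ape_correct, human_post_edit):
    """Route one source sentence through the Quality Gate.

    The four callables stand in for the MT system, the QE component,
    the APE component and the human post-editing workflow; they are
    placeholders, not an API defined by the project.
    """
    mt_output = mt_translate(sentence)
    score = qe_score(sentence, mt_output)

    if score >= QE_GOOD:
        return Tier.ACCEPTABLE, mt_output
    if score >= QE_MODERATE:
        return Tier.MODERATE, ape_correct(sentence, mt_output)
    return Tier.LOW, human_post_edit(sentence, mt_output)
```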

Besides the QE and APE components, the Quality Gate has a third component, the user interface, which allows the user to request the MT output for a sentence and shows the QE score and (if applicable) the APE output. The project consortium adapted this interface from the open-source tool MateCat.

While the QE and APE components need to be initialised by training them on human corrections to MT output for domain-specific text (see the section Evaluation procedure), they can also be updated afterwards, as the Quality Gate keeps track of the human corrections to MT output in the third tier. In addition, if the user of the Quality Gate owns the MT system (i.e. does not use an external MT system), the MT system itself can also be improved using the corrected translations. This leads to a gradual domain adaptation of the generic MT system and, as a result, reduces the need for human post-editing.
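
As an illustration of this feedback loop, the sketch below logs each third-tier human correction as a (source, MT output, post-edit) triple, the kind of data on which the QE and APE components are trained. The function name and the tab-separated file format are assumptions for illustration, not the project's actual implementation.

```python
import csv


def log_third_tier_correction(corpus_path, source, mt_output, post_edit):
    """Append one human correction from the third tier to a corpus file.

    The resulting (source, MT output, post-edit) triples can later be
    used to retrain the QE and APE models and, if the user owns the
    MT system, to adapt the MT system itself.
    """
    with open(corpus_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow([source, mt_output, post_edit])
```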

Evaluation procedure

While the Quality Gate built in APE-QUEST acts as a reference implementation that can be used in various environments, during the project it was evaluated on a specific domain, for specific language pairs, and with a specific MT system.

During the evaluation, the Quality Gate made use of the eTranslation system of the Connecting Europe Facility (CEF) and could therefore be considered an extension of this MT system. The eTranslation system is developed by the Directorate-General for Translation, supports all 24 official EU languages, and is provided by the CEF Automated Translation building block of the Directorate-General for Communications Networks, Content and Technology (DG CNECT) as a service to Digital Service Infrastructures (DSIs) of the EC and to public administrations of the Member States.

QE and APE models were built for three language pairs (English into French, Dutch, and Portuguese) on domain-specific post-edited data (texts relating to the legal domain, procurement, and online dispute resolution). These data (approximately 10,000 sentences per language pair) are publicly available through the ELRC-SHARE repository.

The results of the Quality Gate were assessed by human evaluators.

Potential users

While the Quality Gate was evaluated using eTranslation, three language pairs, and a specific domain, it can make use of another MT system and be applied to any language pair supported by that system, in any environment with domain-specific texts. The only condition is that human corrections for a substantial amount of MT output are available in that environment, so that the QE and APE components can be initialised. This may involve an initial investment in human effort.