Evaluation results

We created a data set of post-edits for three language pairs (English into French, English into Dutch, and English into Portuguese) to train quality estimation (QE) and automatic post-editing models. Using the resulting models, we conducted tests with human evaluators, in cooperation with an organisation specialised in online dispute resolution, to determine the relation between translation quality, time, and cost under various QE score thresholds, which determine the tier to which a machine translation (MT) output is assigned.
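
The threshold-based tier assignment can be illustrated with a minimal sketch. The threshold values, the tier names, and the assumption that a higher QE score means better expected quality are all hypothetical and chosen for illustration only; they are not the settings used in the tests.

```python
from bisect import bisect_right

# Hypothetical values for illustration; the project's actual thresholds
# and tier definitions are not given in this section.
THRESHOLDS = [0.4, 0.7]  # ascending QE score cut-offs
TIERS = ["human_translation", "post_editing", "mt_only"]  # lowest to highest score


def assign_tier(qe_score, thresholds=THRESHOLDS, tiers=TIERS):
    """Return the tier for an MT output given its QE score.

    Assumes a higher score means better estimated quality. A score equal
    to a threshold falls into the higher tier.
    """
    if len(tiers) != len(thresholds) + 1:
        raise ValueError("need exactly one more tier than thresholds")
    # bisect_right counts how many thresholds are <= qe_score,
    # which is the index of the matching tier.
    return tiers[bisect_right(thresholds, qe_score)]


if __name__ == "__main__":
    for score in (0.25, 0.55, 0.90):
        print(score, assign_tier(score))
```

Moving the thresholds shifts segments between tiers, which is the lever that trades translation quality against time and cost.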

The tests show that translation quality gains over a fully automated workflow (MT only) are much higher for the assimilation use case (consumer complaints) than for the dissemination use case (privacy legislation). The reason is that the former has a stronger domain mismatch with the MT system: complaints are written in a rather informal style, while the MT system is trained on more formal content. The tests also show that, for both use cases, the Quality Gate can yield substantial cost and time savings without strongly compromising translation quality.

More details on the tests will be reported in a paper to appear in the first half of 2021.
The evaluation performed during the project can serve as a guideline when applying the Quality Gate to a new environment.


