Evaluation results

We created a data set of post-edited translations for three language pairs (English into French, English into Dutch, and English into Portuguese) in order to train quality estimation (QE) and automatic post-editing models. Using the resulting models, we conducted tests with human evaluators, in cooperation with an organisation specialised in online dispute resolution. The aim was to determine the relation between translation quality, time, and cost under various QE score thresholds, which determine the tier to which a machine translation (MT) output is assigned.
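As an illustration of how QE score thresholds can assign MT output to tiers, the following minimal sketch routes each segment by its score and counts how many segments land in each tier. The three tier names, the score range of 0 to 1 (higher meaning better), the default threshold values, and all identifiers are assumptions made for this example; they are not taken from the project's implementation.

```python
from dataclasses import dataclass

# Hypothetical tier names: the text only states that QE score thresholds
# decide which tier an MT output is assigned to.
TIERS = ("use_as_is", "automatic_post_edit", "human_post_edit")


@dataclass(frozen=True)
class GateThresholds:
    # Assumed convention: QE score in [0, 1], higher means better MT.
    accept: float = 0.8  # at or above: keep the raw MT output
    ape: float = 0.5     # at or above (but below accept): automatic post-editing

    def __post_init__(self):
        if not 0.0 <= self.ape <= self.accept <= 1.0:
            raise ValueError("expected 0 <= ape <= accept <= 1")


def route(qe_score: float, thresholds: GateThresholds) -> str:
    """Return the tier for one MT segment given its QE score."""
    if qe_score >= thresholds.accept:
        return "use_as_is"
    if qe_score >= thresholds.ape:
        return "automatic_post_edit"
    return "human_post_edit"


def tier_counts(scores, thresholds: GateThresholds) -> dict:
    """Count how many segments fall into each tier for a threshold setting."""
    counts = {tier: 0 for tier in TIERS}
    for score in scores:
        counts[route(score, thresholds)] += 1
    return counts
```

Running `tier_counts` over the same scores with several threshold settings shows how the share of segments sent to human post-editing changes. That share drives the time and cost side of the trade-off.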

The tests show that translation quality gains compared to a fully automated workflow (MT only) are much higher for the assimilation use case (consumer complaints) than for the dissemination use case (privacy legislation). The reason is a stronger domain mismatch with the MT system in the former case: the style of complaints is rather informal, while the MT system is trained on more formal content. The tests also show that, for both use cases, the Quality Gate can yield substantial cost and time savings without strongly compromising translation quality.

More details on the tests will be reported in a paper to appear in the first half of 2021.
The evaluation performed during the project can serve as a guideline when applying the Quality Gate to a new environment.


Depraetere, H., Van den Bogaert, J., Szoc, S., & Vanallemeersch, T. (2020). “APE-QUEST: a Quality Gate for Routing MT”. In: Proceedings of EAMT 2020, pp. 473–474.

Ive, J., Scarton, C., Blain, F., & Specia, L. (2018). “Sheffield Submissions for the WMT18 Quality Estimation Shared Task”. In: Proceedings of the Third Conference on Machine Translation (WMT), Volume 2: Shared Task Papers, pp. 807–813.

Ive, J., Specia, L., Szoc, S., Vanallemeersch, T., Van den Bogaert, J., Farah, E., Maroti, C., Ventura, A., & Khalilov, M. (2020). “A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?”. In: Proceedings of LREC 2020, pp. 3692–3697.

Specia, L., Blain, F., Logacheva, V., Astudillo, R.F., & Martins, A. (2018). “Findings of the WMT 2018 Shared Task on Quality Estimation”. In: Proceedings of the Third Conference on Machine Translation (WMT), Volume 2: Shared Task Papers, pp. 702–722.

Szoc, S., & Depraetere, H. (2020). “Quality Estimation”. In: Porsiel, J. (Ed.), Maschinelle Übersetzung für Übersetzungsprofis. Berlin: BDÜ Fachverlag. Anthology with contributions in German and English. 384 pages, ISBN: 9783946702092, pp. 198–208.

Vanallemeersch, T., Szoc, S., & Depraetere, H. (to appear). “APE-QUEST, or How to be Picky about Machine Translation”. In: Proceedings of Translating and the Computer 42.

Van den Bogaert, J., Depraetere, H., Szoc, S., Vanallemeersch, T., Van Winckel, K., Everaert, F., Specia, L., Ive, J., Khalilov, M., Maroti, C., Farah, E., & Ventura, A. (2019). “APE-QUEST: an MT Quality Gate”. In: Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks, pp. 110–111.