Translation risk prediction - eliminate bad quality segments from your TMs (ModelFront)


ModelFront has developed a tool that uses deep learning, or artificial intelligence (and is polite enough not to brag about it as many similar organizations do), to give risk prediction scores for individual translated segments. It bases its scoring on the large corpora it has pre-assembled for many languages (approximately corresponding to the 100+ languages that Google Translate supports) and, maybe more importantly, on specific translation memories that you can upload to help evaluate the likelihood of a machine translation being correct or acceptable. The uploaded TMs (which are used only for that one client) should contain at least 10,000 translation units, preferably of good quality. (This brings to mind the current lawsuit that several language service providers have brought against the Canadian Translation Bureau over its low-quality translation memories and associated discounts for matches.)

Some practical uses (as I said, everyone is welcome to add to these) include:

— Eliminating poor data in corpora or translation memories (think of mis-aligned data, corrupted data, or simply incorrectly translated data).
— Evaluating various machine translation engines against each other. These can include customized engines or public generic engines with APIs (application programming interfaces), including Google Translate, Microsoft Translator, Yandex Translate, Baidu Translate, IBM's Watson Language Translator, ModernMT, and DeepL.
— Automating the decision process after machine translation: determining which segments need to be completely retranslated, which only need post-editing (and at what level), and which might not even have to be looked at again.
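
The last item in the list above can be sketched roughly as follows. This is only an illustration, not ModelFront's API: the `Segment` type, the `triage` function, and both cut-off values are hypothetical, and the risk scores are assumed to have been obtained already from whatever risk-prediction tool is in use.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    source: str
    translation: str
    risk: float  # predicted risk between 0 and 1 (assumed to be supplied by a risk-prediction tool)


# Hypothetical cut-offs, chosen purely for illustration.
APPROVE_BELOW = 0.05      # below this, the segment is not looked at again
RETRANSLATE_FROM = 0.60   # at or above this, the segment is retranslated from scratch


def triage(segment: Segment) -> str:
    """Route a machine-translated segment based on its predicted risk."""
    if segment.risk < APPROVE_BELOW:
        return "approve"
    if segment.risk >= RETRANSLATE_FROM:
        return "retranslate"
    return "post-edit"


# Made-up example segments with made-up risk scores.
segments = [
    Segment("Accept all cookies", "Alle Cookies akzeptieren", 0.02),
    Segment("Flights to Turkey", "Flüge in die Türkei", 0.30),
    Segment("Accept all cookies", "Alle Kekse annehmen", 0.85),
]

for seg in segments:
    print(f"{triage(seg):12} {seg.risk:.2f}  {seg.translation}")
```

In practice the cut-offs would have to be tuned per language pair and content type, and the middle band could be split further to reflect different post-editing levels.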

Some errors the tool finds include:

— Erroneous translations of non-translatables (proper nouns, etc.)
— Messed-up negations (a common error in MT'ed segments)
— Offensive or ambiguous translations
— Untranslated words or phrases
— Corrupted translations or translations into the wrong languages

Some actual examples that the tool catches include translating (the website-identifying) "cookies" as a baked good; geographical terms like "Turkey" as an animal; or "Iran" or "Togo" as contractions for "I ran" and "To go." Of course, the tool works for far more than just short and somewhat humorous items like that. ModelFront was used to filter the TAUS Corona Crisis Corpus that I mentioned in the last newsletter. I reviewed the English>German corpus, from which everything with a predicted risk above 5% was dropped, resulting in approximately 11,000 segments removed from a total of 624,000. After a manual review, I would say that in what felt like 90% of all cases the automated deletions were justified.
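
The kind of threshold filtering described for the corpus can be sketched as below. The function name `filter_by_risk` and the toy data are hypothetical, and the risk scores are assumed to be available already; only the 5% threshold and the segment counts come from the text above.

```python
from typing import List, Tuple

# (source, translation, predicted risk between 0 and 1)
ScoredPair = Tuple[str, str, float]


def filter_by_risk(
    pairs: List[ScoredPair], threshold: float = 0.05
) -> Tuple[List[ScoredPair], List[ScoredPair]]:
    """Split pairs into (kept, dropped); anything with risk above the threshold is dropped."""
    kept = [p for p in pairs if p[2] <= threshold]
    dropped = [p for p in pairs if p[2] > threshold]
    return kept, dropped


# Made-up toy corpus with made-up scores.
toy_corpus: List[ScoredPair] = [
    ("Wash your hands.", "Waschen Sie Ihre Hände.", 0.01),
    ("Stay at home.", "Bleiben Sie zu Hause.", 0.03),
    ("Stay at home.", "Stay at home.", 0.92),  # left untranslated
]

kept, dropped = filter_by_risk(toy_corpus)
print(f"kept {len(kept)}, dropped {len(dropped)}")

# Share removed in the corpus described above (approximate counts from the text).
removed, total = 11_000, 624_000
print(f"share removed: {removed / total:.1%}")
```

With the approximate counts given, the filter removed a little under 2% of the corpus.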

The cost presently is $200 per 1 million characters (=$0.0002 per character or about $0.001 per word, assuming an average of 5 characters per word). By comparison, Google Translate charges $20 per 1 million characters (if you use the API). There are also free trials available.
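
The arithmetic above can be checked with a few lines. The prices and the 5-characters-per-word average are the figures quoted in the text (and may well have changed since); the `cost_usd` helper and the 10,000-word job are hypothetical.

```python
# Figures as quoted in the text; they may no longer be current.
MODELFRONT_USD_PER_MILLION_CHARS = 200
GOOGLE_TRANSLATE_USD_PER_MILLION_CHARS = 20
AVG_CHARS_PER_WORD = 5  # the averaging assumption used in the text


def cost_usd(characters: int, usd_per_million_chars: float) -> float:
    """Cost of processing the given number of characters at a per-million-character price."""
    return usd_per_million_chars * characters / 1_000_000


per_character = cost_usd(1, MODELFRONT_USD_PER_MILLION_CHARS)
per_word = cost_usd(AVG_CHARS_PER_WORD, MODELFRONT_USD_PER_MILLION_CHARS)
print(f"per character: ${per_character:.4f}")
print(f"per word:      ${per_word:.3f}")

# Hypothetical job of 10,000 words.
job_chars = 10_000 * AVG_CHARS_PER_WORD
print(f"risk prediction:     ${cost_usd(job_chars, MODELFRONT_USD_PER_MILLION_CHARS):.2f}")
print(f"machine translation: ${cost_usd(job_chars, GOOGLE_TRANSLATE_USD_PER_MILLION_CHARS):.2f}")
```

At these prices, scoring a segment costs ten times as much as machine translating it through the Google Translate API.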

It's a tool that could clearly be useful to LSPs and translation buyers, but is it also a tool that could be useful for the freelance translator? When it comes to cleaning up TMs or making decisions about which machine translation engine to use (if that's relevant), it might be helpful. And Adam and his team are certainly open to other suggestions as well. (Chances are you can even address those requests and ideas in your preferred language!)

Oh, and the heading ("We love the chaos")? That's what it says on ModelFront's website. Can't wait to tell my wife that some smart people are in complete agreement with me.

— Jost Zetzsche, The 312th Tool Box Journal

