Translation - Μετάφραση

Topic started by: spiros on 03 Dec, 2010, 22:01:41

Title: Term does not appear as a suggestion when the term is lowercase in database and capital in translation unit (SDL-Trados Studio)
Post by: spiros on 03 Dec, 2010, 22:01:41

For example, suppose you have the term "enter=εισαγάγετε" in your MultiTerm database and your translation unit reads something like "Enter the text". If you type a capital "Ε", the term will not appear as a suggestion. If you type a lowercase "ε", it will appear, but then you will have to go back and fix the capitalization.
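
The behaviour can be pictured with a minimal Python sketch. This is not how SDL Trados Studio is implemented: the `TERMBASE` dictionary and the functions `suggest_exact` and `suggest_folded` are hypothetical names used only for illustration. The first function does a case-sensitive prefix match, which reproduces what is described above; the second folds case for matching and then copies the case of the typed initial onto the suggestion.

```python
# Hypothetical termbase: source term -> target term, both stored in lowercase.
TERMBASE = {"enter": "εισαγάγετε", "text": "κείμενο"}


def suggest_exact(typed: str) -> list[str]:
    """Case-sensitive prefix match: a capital prefix misses lowercase targets."""
    return [t for t in TERMBASE.values() if t.startswith(typed)]


def suggest_folded(typed: str) -> list[str]:
    """Case-insensitive prefix match that mirrors the case of the typed initial."""
    folded = typed.casefold()
    hits = [t for t in TERMBASE.values() if t.casefold().startswith(folded)]
    if typed[:1].isupper():
        # Assumption: a capital initial means the user wants a capitalized term.
        hits = [t[:1].upper() + t[1:] for t in hits]
    return hits
```

With this sketch, typing a capital "Ε" returns nothing from `suggest_exact` but returns the capitalized term from `suggest_folded`. Whether copying the typed case is always the right choice is exactly the question discussed in the reply below.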

It seems this issue has remained unresolved since SP1.

See here too:
Title: Re: Term does not appear as a suggestion when the term is lowercase in database and capital in translation unit (SDL-Trados Studio)
Post by: spiros on 12 Dec, 2010, 08:47:12
Apparently, it is a feature!

Why is AutoSuggest not case sensitive?

You have created an AutoSuggest dictionary to speed up your translation work. However, when a word is at the beginning of a sentence, it would be useful if the capitalization were applied automatically. If your AutoSuggest dictionary contains an entry that starts with a capital, it would also be useful if the AutoSuggest feature automatically removed the initial capitalization when the word appears in the middle of a sentence.
For example, your AutoSuggest dictionary contains the entry Ordinateur=Computer. You need to translate the French sentence Mon ordinateur est nouveau into English. It would be useful if you did not have to manually change the capitalized word Computer into computer.

Although this looks like a simple enhancement, the problem is that the tool cannot know if and where a target language actually uses uppercase or lowercase letters.

More abstractly:

If you have an input phrase [S] with a known translation [T] in the AutoSuggest dictionary, we do not know whether [UPPER(S)] needs to map to [UPPER(T)] or to [T] itself.
Next, if [T] consists of several words, like [T1 T2 T3], we do not know which of them, if any, needs to be converted to uppercase: for example, [UPPER(T1) T2 T3] or [UPPER(T1) UPPER(T2) UPPER(T3)].
Further, whether and how to convert to uppercase depends on where in the target segment you want to insert the suggested target phrase: the correct casing differs between the beginning of a sentence and the middle of one.
A “scientific solution” could be to treat phrases which differ in case separately in the AutoSuggest dictionary and have multiple entries for [S, T] and [UPPER(S), all variants of T]. That would, however, increase the search space during phrase extraction and would most likely reduce recall in the AutoSuggest dictionaries. As a result, you would get fewer useful suggestions.
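
The ambiguity, and the way storing every variant inflates the dictionary, can be sketched in Python. This is only an illustration under stated assumptions: `UPPER` is read here as "capitalize the first letter", and the helper names `_cap`, `casing_variants` and `expanded_entries` are hypothetical and not part of any SDL product.

```python
def _cap(word: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return word[:1].upper() + word[1:]


def casing_variants(target: str) -> list[str]:
    """List the distinct casings a tool would have to choose between for [T]."""
    words = target.split()
    if not words:
        return [target]
    candidates = [
        " ".join(words),                          # [T1 T2 T3] as extracted
        " ".join([_cap(words[0])] + words[1:]),   # [UPPER(T1) T2 T3]
        " ".join(_cap(w) for w in words),         # [UPPER(T1) UPPER(T2) UPPER(T3)]
    ]
    return list(dict.fromkeys(candidates))        # drop duplicates, keep order


def expanded_entries(dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Store [S, T] plus [UPPER(S), every variant of T] for each entry."""
    entries = []
    for source, target in dictionary.items():
        entries.append((source, target))
        entries.extend((_cap(source), v) for v in casing_variants(target))
    return entries
```

For the single entry `{"pointless test": "nutzloser Test"}`, `expanded_entries` returns three stored pairs instead of one, and nothing in the data says which of the capitalized variants is correct in a given position.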

Another “scientific solution” could be to come up with a target language model that fixes casing depending on context. That would be very useful and quite an interesting research topic for SDL, but at this point in time it is not considered within the scope of the company's core business.

An alternative approach could be, depending on some user setting, to always suggest an upper-cased or lower-cased variant in addition to the “raw extracted phrase” in the AutoSuggest dictionary. For example, [pointless test, {Nutzloser Test, nutzloser Test}]. The translator can then select the preferred case variant. However, this will lead to quite a number of additional phrase suggestions which differ only in case. The number of suggestions that can be displayed is limited: if there are too many candidates, no suggestions are displayed at all until a longer target prefix has been typed. The extra variants may therefore have significant usability and recall drawbacks as well.
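
The drawback can be made concrete with a small Python sketch. It assumes a display limit of three suggestions; the real limit is not stated here, and `MAX_SUGGESTIONS`, `with_case_variants` and `displayed` are hypothetical names chosen for illustration.

```python
MAX_SUGGESTIONS = 3  # assumed display limit; the actual value is not given above


def with_case_variants(phrases: list[str]) -> list[str]:
    """Add a lower-cased and an upper-cased initial variant for every phrase."""
    out: list[str] = []
    for phrase in phrases:
        for variant in (phrase[:1].lower() + phrase[1:],
                        phrase[:1].upper() + phrase[1:]):
            if variant not in out:
                out.append(variant)
    return out


def displayed(candidates: list[str], prefix: str) -> list[str]:
    """Show matches for the typed prefix, or nothing if there are too many."""
    folded = prefix.casefold()
    matches = [c for c in candidates if c.casefold().startswith(folded)]
    return matches if len(matches) <= MAX_SUGGESTIONS else []
```

With the phrases `["nutzloser Test", "nutzlose Arbeit", "Nutzen"]`, the prefix "nu" shows all three raw phrases. Once case variants are added there are six candidates for the same prefix, so nothing is shown until the translator has typed as far as "nutzloser".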

To conclude, it is a great and simple idea, but unfortunately also one with potentially complex ramifications. As it is usually quite frustrating to correct mistakes made by a system that attempts to be “over-ambitiously intelligent”, it is often simpler and faster to just go ahead and fix the casing yourself, especially when there are keyboard shortcuts available.