Translating Excel files that contain HTML with Trados Studio

An excellent article, "Handling taggy Excel files in Studio", describes adding the following rules for the xls or xlsx file type:

Start tag <[a-z][a-z0-9]*[^<>]*>
End tag </[a-z][a-z0-9]*[^<>]*>
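
To see what these two patterns pick up, here is a small sketch in Python rather than in Studio itself. The sample cell text is made up, and it is an assumption that Python's re module treats these simple patterns the same way as Studio's regex engine.

```python
import re

# The two patterns from the rule above, copied verbatim.
START_TAG = re.compile(r"<[a-z][a-z0-9]*[^<>]*>")
END_TAG = re.compile(r"</[a-z][a-z0-9]*[^<>]*>")

# Hypothetical cell content, for illustration only.
cell = 'Click <a href="help.html">here</a> for <b>more</b> details.<br/>'

start_tags = START_TAG.findall(cell)  # opening and self-closing tags
end_tags = END_TAG.findall(cell)      # closing tags only

print(start_tags)
print(end_tags)
```

As written, the patterns only match lowercase tag names, so in this sketch a tag such as <BR> is not picked up.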

The problem with this method is that if you use machine translation in Studio, the machine translation output does not respect the tags, so they have to be re-entered.

My method is not affected by this problem. It involves pasting the content into Word, tagging the HTML with Trados styles by means of find/replace operations with regexes (or with a tool like +Tools: open the file, then go to Tweak > Create Tw4winStyles > Run on selected files), and pasting back into Excel once translated. This works fine, except when the Excel cells contain soft line breaks: when pasted back into Excel, these are converted into paragraphs, and hence into new cells, which causes problems when you have source and target columns and cells.

A way to resolve this is to replace the manual line break in Word with a string of characters that does not occur in the text, for example:

Find/Replace in Word
Find ^l
Replace xwg
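
The placeholder idea can be sketched outside Word and Excel as follows. This is only an illustration in Python: the function names are made up, and the manual line break (^l in Word) is modelled as "\n". The check at the start reflects the requirement that the placeholder must not already occur in the text.

```python
PLACEHOLDER = "xwg"  # the example string from above


def protect_breaks(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Swap in-cell line breaks for the placeholder before pasting to Excel."""
    if placeholder in text:
        # A placeholder that already occurs would later turn into a stray break.
        raise ValueError(f"placeholder {placeholder!r} already occurs in the text")
    return text.replace("\n", placeholder)


def restore_breaks(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Swap the placeholder back to a line feed (Chr(10) in VBA)."""
    return text.replace(placeholder, "\n")


original = "First line\nSecond line"
protected = protect_breaks(original)   # safe to paste: no line break left
restored = restore_breaks(protected)   # line break is back inside the cell

print(repr(protected))
print(repr(restored))
```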

— Translate, clean up
— Replace line breaks with xwg as described above
— Paste to Excel.
— Then, select the range of translation cells.
— Press Alt-F11 to get into the VB editor
— Press Ctrl-G to get to the immediate window
— Paste the following: selection.replace "xwg", chr(10)
— Hit Enter to run it (applies to Excel 2010 or higher).

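Alternatively, run this macro. The commas in the macro stay as they are even if your locale (Greek, for example) uses semicolons as the list separator in worksheet formulas, because VBA code always separates arguments with commas.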

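Sub Replacewithlinebreaks()
    Dim c As Range
    ' Replace the placeholder "xwg" with a line feed in every cell of the region around the active cell
    For Each c In ActiveCell.CurrentRegion.Cells
        c.Value = Application.WorksheetFunction.Substitute(c.Value, "xwg", Chr(10))
    Next c
End Sub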

Another option is to download ASAP Utilities:

Go to Range » Find and/or replace in all sheets...
and enter xwg as the text to search for and {lf} as the replacement.

There may also be tabs within cells; in that case, the character code to use is Chr(9).
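
If both line breaks and tabs need restoring, the same idea extends to one placeholder per character. A minimal sketch in Python, assuming a second placeholder "xwt" for tabs (a made-up string that does not come from the method above, and that likewise must not occur in the text):

```python
import re

# Placeholder -> character. "xwg" comes from the text above; "xwt" is a
# made-up placeholder for tabs, used here only for illustration.
PLACEHOLDERS = {"xwg": "\n", "xwt": "\t"}  # "\n" is Chr(10), "\t" is Chr(9)

# One pattern that matches any of the placeholders.
_pattern = re.compile("|".join(re.escape(p) for p in PLACEHOLDERS))


def restore_characters(text: str) -> str:
    """Replace every placeholder with the control character it stands for."""
    return _pattern.sub(lambda m: PLACEHOLDERS[m.group(0)], text)


print(repr(restore_characters("NamexwtValuexwgNext line")))
```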
