Translation - Μετάφραση

General => Announcements => Topic started by: spiros on 19 Jan, 2015, 14:12:02

Title: Convert Excel files (xls, xlsx) to MultiTerm XML (free online tool)
Post by: spiros on 19 Jan, 2015, 14:12:02
Convert Excel files (xls, xlsx) to MultiTerm XML online

Convert Excel files to MultiTerm XML (https://translatum.gr/cgi-bin/excel-to-multiterm-xml.pl)

xlsx/xls/tsv > MultiTerm XML is part of a series of CAT conversion tools developed by Translatum:

— tmx > text (https://www.translatum.gr/cgi-bin/tmx-to-text.pl) (read Help (https://www.translatum.gr/forum/index.php?topic=388540.0))
— xlsx/xls/tsv > tmx (https://translatum.gr/cgi-bin/excel-to-tmx.pl) (read Help (https://www.translatum.gr/forum/index.php?topic=388541.0))
— xlsx/xls/tsv > MultiTerm xml (https://translatum.gr/cgi-bin/excel-to-multiterm-xml.pl) (read Help (https://www.translatum.gr/forum/index.php?topic=388542.0)).

This tool converts the following file types to MultiTerm XML, which you can then import into a MultiTerm termbase:
— xlsx file
— xls file
— tab separated text file (it must be in UTF-8 format)
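If you are unsure whether a tab-separated file is really UTF-8, a quick check like the one below may help. It is a minimal Python sketch of our own (not part of the Translatum tool) that simply tries to decode the raw bytes as UTF-8; the function name `is_utf8` is hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Typical for files saved in a legacy code page (e.g. Windows-1253).
        return False
    return True
```

For example, `is_utf8(open("glossary.txt", "rb").read())` will typically return False for a file saved in a legacy Greek code page that contains Greek letters; in that case, re-save the file as UTF-8 before uploading.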

To prepare your file for conversion, follow these steps:

Make sure the file name contains only plain Latin characters (no accented letters). [Important!]
Make sure there is no header row, i.e. a row describing the column contents (e.g. the language names). If there is one, delete it.
Make sure there are no leading, trailing or multiple spaces. You can fix this in Excel with the free ASAP Utilities (http://www.asap-utilities.com) by going to ASAP Utilities > Text > Delete leading, trailing and excessive spaces (http://asap-utilities.com/asap-excel-tutorial-video-tutorial.php?video=10&title=How+to+remove+leading%2C+trailing+and+excessive+spaces+in+Excel).
Make sure there are no empty rows. You can also fix this with ASAP Utilities by going to ASAP Utilities > Columns and rows > Delete all empty rows.
Make sure there are no empty cells (i.e. cells where the source or target text is missing). To find them, sort the data by Column A, scroll down to check for orphan entries, and then repeat with Column B.
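The checks above can also be run on a tab-separated file before uploading. The following is a minimal Python sketch under our own assumptions (it is not the tool's code): it approximates "plain Latin characters" as "ASCII only", expects the source text in the first column and the target text in the second, and reports empty rows, missing cells and extra spaces. It cannot tell whether the first row is a header, so that still needs a manual look, and xlsx/xls files would first have to be saved as tab-separated text. The function name `check_rows` is hypothetical.

```python
def check_rows(filename: str, text: str) -> list:
    """Return human-readable problems found in a two-column tab-separated text."""
    problems = []
    # Assumption: "plain Latin characters" is approximated as "ASCII only".
    if not filename.isascii():
        problems.append(f"file name {filename!r} contains non-ASCII characters")
    for n, line in enumerate(text.splitlines(), start=1):
        cells = line.split("\t")
        if all(cell.strip() == "" for cell in cells):
            problems.append(f"row {n}: empty row")
            continue
        # First column = source, second column = target; both need text.
        if len(cells) < 2 or any(cell.strip() == "" for cell in cells[:2]):
            problems.append(f"row {n}: missing source or target text")
        for col, cell in enumerate(cells, start=1):
            # Leading/trailing whitespace, or runs of two or more spaces.
            if cell != cell.strip() or "  " in cell:
                problems.append(f"row {n}, column {col}: extra spaces")
    return problems
```

An empty list means none of these problems were found; each message gives the row (and column) to fix in Excel.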

A typical file has the source language in Column A and the target language in Column B (on Sheet 1).
If your file contains synonyms, separate them with a pipe symbol (|) within the same cell. For example:

Column 1                    Column 2
car|automobile|auto         αυτοκίνητο|αμάξι

Note that no spaces are needed before or after the pipe symbol. Separating synonyms this way ensures that MultiTerm handles them properly.

If instead you separated synonyms with commas, like this:

Column 1                    Column 2
car, automobile, auto       αυτοκίνητο, αμάξι

MultiTerm would then not recognize the words automobile, auto and αμάξι while you are translating. In addition, for the terms it did recognize, the autocomplete suggestion would insert the full list of synonyms, and you would then have to delete the ones you do not need. When synonyms are separated with pipes, autocomplete lets you reach any of them just by typing its first letter.
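If your glossary already uses commas between synonyms, a small helper can convert them to pipes and remove stray spaces around the separators. This is a hypothetical Python sketch (the name `to_pipe_synonyms` is our own); note that it will also split any term that legitimately contains a comma, so review the output before converting.

```python
import re


def to_pipe_synonyms(cell: str) -> str:
    """Normalize 'car, automobile, auto' or 'car | auto' to pipe-separated form."""
    # Split on commas or pipes, then trim the spaces around each synonym.
    parts = (part.strip() for part in re.split(r"[,|]", cell))
    # Drop empty pieces left by doubled or trailing separators.
    return "|".join(part for part in parts if part)
```

Applied to each cell, this turns `car, automobile, auto` into `car|automobile|auto` and `αυτοκίνητο, αμάξι` into `αυτοκίνητο|αμάξι`.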

In the Enter Column 1 Language box, enter the Column 1 language (e.g. English), and in the Code box enter its language code (e.g. EN or EN-US). Do the same in the Enter Column 2 Language box (e.g. French) and its Code box (e.g. FR or FR-FR). Please see a list of these codes (http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx); you need the code listed in the Culture Name column.

If you want to import the file into a specific termbase, make sure the codes match the ones used by that termbase. You can check them by exporting the termbase in XML format and opening the file in a text editor such as Notepad++ (http://notepad-plus-plus.org/download). If the codes do not match, you may need to run a find/replace to fix the language codes.
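The find/replace step can also be scripted. The sketch below assumes that the XML stores each language code in a `lang="..."` attribute (as in `<language type="English" lang="EN-US"/>`); this is our assumption about the format, and the function names are hypothetical. It lists the codes in use and swaps one code for another with a plain text replacement.

```python
import re

# Assumption: each language code sits in a lang="..." attribute.
LANG_ATTR = re.compile(r'\blang="([^"]*)"')


def language_codes(xml_text: str) -> set:
    """Return the distinct language codes used in the XML text."""
    return set(LANG_ATTR.findall(xml_text))


def replace_language_code(xml_text: str, old: str, new: str) -> str:
    """Replace lang="old" with lang="new" everywhere (a plain find/replace)."""
    return xml_text.replace(f'lang="{old}"', f'lang="{new}"')
```

Comparing `language_codes()` for the termbase export and for the converted file shows any mismatch. Exported termbase XML may be UTF-16 rather than UTF-8, so open each file with the encoding given in its XML declaration (for example `open("export.xml", encoding="utf-16")`).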

To create an XML file that is compatible with the codes MultiTerm uses by default when you create a termbase (checked with MultiTerm 2014), choose the shortest codes, e.g. just English and EN for an English source (Column 1 language) and French and FR for a French target (Column 2 language).
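To show roughly how the language names, codes and pipe-separated synonyms end up in the output, here is a Python sketch that builds a MultiTerm-style XML tree. The element names (`mtf`, `conceptGrp`, `concept`, `languageGrp`, `language`, `termGrp`, `term`) reflect our understanding of the general MultiTerm XML layout; this is not the Translatum tool's code, real termbase exports contain additional elements, and the function name `build_mtf` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_mtf(rows, lang1, code1, lang2, code2) -> str:
    """Build a MultiTerm-style XML string from (source, target) cell pairs."""
    root = ET.Element("mtf")
    for concept_id, (source, target) in enumerate(rows, start=1):
        # One conceptGrp per spreadsheet row.
        concept_grp = ET.SubElement(root, "conceptGrp")
        ET.SubElement(concept_grp, "concept").text = str(concept_id)
        for cell, name, code in ((source, lang1, code1), (target, lang2, code2)):
            # One languageGrp per column, labelled with the name and code entered.
            language_grp = ET.SubElement(concept_grp, "languageGrp")
            ET.SubElement(language_grp, "language", type=name, lang=code)
            # Each pipe-separated synonym becomes its own termGrp.
            for term in cell.split("|"):
                term_grp = ET.SubElement(language_grp, "termGrp")
                ET.SubElement(term_grp, "term").text = term
    return ET.tostring(root, encoding="unicode")
```

For example, `build_mtf([("car|automobile|auto", "αυτοκίνητο|αμάξι")], "English", "EN", "Greek", "EL")` produces one concept with three English terms and two Greek terms, which is why pipe-separated synonyms are recognized individually.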

The maximum file size is 10 MB. If your file is larger, you can a) split it into smaller files and/or b) if it has an .xls extension, save it as .xlsx (xlsx files are usually smaller).
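Splitting can be done in Excel, but for a large tab-separated file a script may be quicker. This minimal Python sketch (our own, not part of the tool; `split_by_size` is a hypothetical name) groups lines into chunks whose UTF-8 size stays within a chosen byte limit. Since it is not stated whether the 10 MB limit means 10,000,000 or 10,485,760 bytes, it is safer to leave some headroom.

```python
def split_by_size(lines, max_bytes):
    """Group lines into chunks whose total UTF-8 size stays within max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Start a new chunk if adding this line would exceed the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

For example, read the lines with `open("glossary.txt", encoding="utf-8").readlines()` (which keeps the line endings), call `split_by_size(lines, 9_000_000)`, and write each chunk to its own file (glossary_part1.txt, glossary_part2.txt and so on). A single line longer than the limit still ends up in a chunk of its own.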

Uploaded files are deleted every hour and are not used for any other purpose.

The following video shows the conversion from Excel to MultiTerm XML, the creation of an SDL MultiTerm termbase, and the import of the XML file into that termbase.

Convert Excel file online to import in SDL Multiterm - YouTube (https://youtu.be/OvVtoJnWPWQ)