A Perl script to convert a text file to Multiterm XML


wings

  • Vicky Papaprodromou
Added: 12 June 2008, last updated: July 17, 2008

I've often read in forums and mailing lists about people having problems with importing a glossary into Multiterm. Most of the time you need a bilingual glossary from which you can insert terms into Word, TagEditor or SDL Edit.

A few years ago I had the same problem, and it was then that I developed a Perl script to convert a tab-delimited text file into an XML file compatible with Multiterm's bilingual glossary template. Unfortunately, this script ran only inside NoteTab Pro, which can execute external scripts and grab their output.

I have since learned how to handle encoding conversions in Perl and made a standalone version of the script, which can be downloaded here. Actually, there are two scripts: one for English – Polish glossaries and the other for Polish – English glossaries.

Before you use the script, you will have to edit it once to adapt it to your language pair. The script may be made smarter in the future, but for now it does its job well, once you edit the language settings.

Open the script with a plain text editor (one that can edit and save UTF-8 encoded files), disable wrapping of long lines and go to line 31, which looks like this:


Code:
<language type="English (United States)" lang="EN-US"/>
Change English (United States) and EN-US to the correct settings for your source language. Go to line 45 of the script and repeat this step for the target language.

You can check the correct language names and codes in step 3 of 5 of Multiterm's Termbase Creation Wizard.

Make sure that you do not delete the quotation marks around the language name and language code, nor the forward slash at the end of the tag.
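
As a quick sanity check after editing, the changed line can be tested against a pattern. The following is only a sketch: the helper name language_tag_ok is hypothetical, and "German (Germany)" / "DE-DE" are illustrative values that should be confirmed in the Termbase Creation Wizard.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the line still looks like an intact
# <language .../> tag, i.e. both attribute values are quoted and the
# closing "/>" is present; returns 0 otherwise.
sub language_tag_ok {
    my ($line) = @_;
    return $line =~ m{^\s*<language\s+type="[^"]+"\s+lang="[^"]+"\s*/>\s*$} ? 1 : 0;
}

# Illustrative values only; check the real names and codes in Multiterm.
my $good = '<language type="German (Germany)" lang="DE-DE"/>';
my $bad  = '<language type="German (Germany) lang="DE-DE"/>';   # one quote deleted

print language_tag_ok($good) ? "ok\n" : "broken\n";
print language_tag_ok($bad)  ? "ok\n" : "broken\n";
```

The pattern only checks the shape of the tag. It cannot tell whether the language name and code are ones that Multiterm accepts.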

Save the script. You can use the Save As command and rename the script from MTENUSPL.pl to something that represents your source and target language. Keep the .pl extension. It stands for “perl”, not “Polish”. ;-)

To run the script, you need:

    * Perl. You can get Perl for Windows as a free download from ActiveState.
    * A tab-delimited file with your glossary, saved in the UTF-8 encoding. You can save a tab-delimited file with this encoding in Word, OpenOffice Writer, UltraEdit, NoteTab Pro, PSPad etc. I can't make the script guess the encoding of the source file, so it must be in UTF-8. Also, if your tab-delimited file contains any of these characters: <, >, &, replace them as follows. Start with &, so that the ampersands in the newly inserted &lt; and &gt; are not replaced a second time:
          o replace & with &amp;
          o replace < with &lt;
          o replace > with &gt;
            If you leave these characters unchanged, they will break the import into Multiterm.
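
The replacements can also be done in Perl rather than by hand. This is a minimal sketch, not part of the downloadable scripts; the function name escape_xml is hypothetical. It replaces the ampersand first, because replacing it last would also alter the & in the freshly inserted &lt; and &gt;.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the three characters that are special in XML.
# The ampersand is handled first; otherwise the "&" in an already
# inserted "&lt;" or "&gt;" would be escaped a second time.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_xml("R&D <draft>"), "\n";   # prints: R&amp;D &lt;draft&gt;
```

Run it only once on a given file. Text that is already escaped would be escaped again, so &lt; would become &amp;lt;.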

To run the script, copy it to the folder that contains your tab-delimited file.

Open the command line (Windows key+R, type cmd, press Enter). Change to the directory that contains the script and the tab-delimited file.

Type the following command:


Code:
perl MTENUSPL.pl sourcefile.txt
and press Enter. If the source file is “well formed”, the script will process it and create a sourcefile.txt.xml file. This file can be imported into a Multiterm termbase based on the bilingual glossary template.
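
To give an idea of what such a conversion involves, here is a rough sketch of turning one tab-delimited line into one glossary entry. It is not the code of the downloadable scripts. The helper name entry_xml is hypothetical, the element names other than <language> are an assumption about the Multiterm XML layout, and the Polish language name and code are illustrative. Encoding conversion and the enclosing root element are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: build one entry from a "source<TAB>target" line.
# Returns nothing (undef in scalar context) if the line does not consist
# of exactly two non-empty fields.
sub entry_xml {
    my ($line, $id) = @_;
    $line =~ s/\r?\n\z//;                      # strip a Unix or Windows line ending
    my @fields = split /\t/, $line, -1;        # -1 keeps trailing empty fields
    return unless @fields == 2 && $fields[0] ne '' && $fields[1] ne '';
    my ($source, $target) = @fields;

    # Element names below are assumed, except for the <language> tag,
    # which follows the form of the line edited in the script.
    return join '',
        "<conceptGrp><concept>$id</concept>",
        qq{<languageGrp><language type="English (United States)" lang="EN-US"/>},
        "<termGrp><term>$source</term></termGrp></languageGrp>",
        qq{<languageGrp><language type="Polish" lang="PL"/>},   # illustrative values
        "<termGrp><term>$target</term></termGrp></languageGrp>",
        "</conceptGrp>";
}

my $entry = entry_xml("cat\tkot\n", 1);
print defined $entry ? "$entry\n" : "line skipped\n";
```

The sketch assumes the terms have already had <, > and & replaced, as described in the requirements.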

You can download the two scripts in a zipped file here. For feedback about the scripts please use this form, or contact me through the cat_conv yahoogroup.

When you have successfully created your xml glossary file, you can import it into Multiterm. This short tutorial explains how to do it.

Source: http://syntax.biz.pl/multiterm.html
« Last Edit: 31 May, 2014, 16:39:38 by spiros »
Speech is a great need of the soul. (Giorgos Ioannou)


piotrbienkowski

Hi,

I am the author of the original text in this thread. Note that the website address has changed.

For this and other scripts and tutorials, please go to:

http://translationzone.eu/scriptutils.html

Your visits will be most welcome.

Piotr Bienkowski



wings

Thanks, Piotr!