Topic: Caterpillar - Export the translatable strings of html files to one file, tag and translate

spiros

http://www.stormdance.net/software/caterpillar/overview.htm

Caterpillar is a high-speed HTML Text Extractor and Integrator written for translators working with web sites. Process whole folders of web pages with a single click, then translate using your choice of software.

By generating a single output file containing all the text requiring translation, Caterpillar provides a simple way to incorporate web page localisation into your existing translation workflow.

Now you can translate web sites in the familiar environment of MS Word.

Caterpillar is free for processing up to 8 files. Free to try, £24.99 to buy.

It produces a txt file, but how do you tag it in order to translate it with CAT tools?

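Paste the contents of your txt file into a Word document. The macro below applies the tw4winExternal style, so that style must exist in the document first: follow the link and download the zipped Word file, which contains the style, copy the formatting (Ctrl+Shift+C) from some text in it that has the external style, then select some non-translatable text in your own document and paste the formatting onto it (Ctrl+Shift+V).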

Here is a macro for tagging the file:

Code:
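Sub caterpillar()
'
' caterpillar Macro
' Macro recorded 2007/02/21 by Test User
'
' Tags a Caterpillar export pasted into Word. The ID/Type/Source lines,
' the first 7 lines and the last 2 lines get the tw4winExternal style.
' The tw4winExternal style must already exist in the document.
'
    ' Step 1: turn paragraph marks into manual line breaks, because a
    ' wildcard search cannot contain ^p in the search text.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = "^l"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Step 2: style everything from a line starting with "ID=" up to and
    ' including the next "Target=" as external. The text after "Target="
    ' is left untouched, so it stays translatable.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    Selection.Find.Replacement.Style = ActiveDocument.Styles("tw4winExternal")
    With Selection.Find
        .Text = "(^lID=)(*)(Target=)"
        .Replacement.Text = "\1\2\3"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Step 3: style the header (first 7 lines) and the footer (last 2
    ' lines) as external. wdLine counts lines as displayed, so these
    ' counts assume the layout of the sample export. The lines between
    ' END and the first ID= of the next file are not matched in step 2
    ' and have to be tagged by hand.
    Selection.HomeKey Unit:=wdStory
    Selection.MoveDown Unit:=wdLine, Count:=7, Extend:=wdExtend
    Selection.Style = ActiveDocument.Styles("tw4winExternal")
    Selection.EndKey Unit:=wdStory
    Selection.MoveUp Unit:=wdLine, Count:=2, Extend:=wdExtend
    Selection.Style = ActiveDocument.Styles("tw4winExternal")
    Selection.HomeKey Unit:=wdStory

    ' Step 4: restore the paragraph marks.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^l"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub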
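
For anyone who wants to see the tagging idea outside Word, here is a rough Python sketch. It is a line-based approximation, not a port of the macro: it protects every line of the export except the text that follows "Target=" (and any continuation lines of that value). The function name tag_lines is made up for this illustration, and the sketch assumes that no translated line begins with "ID=" or "END".

```python
def tag_lines(export_text):
    """Yield (protected_part, translatable_part) for each line of an export.

    Assumption: a Target value runs until the next line that starts
    with "ID=" or "END"; everything else is treated as protected.
    """
    in_target = False
    for line in export_text.splitlines():
        if line.startswith("Target="):
            # Only the key is protected; the value is what gets translated.
            in_target = True
            yield "Target=", line[len("Target="):]
        elif in_target and not line.startswith(("ID=", "END")):
            # Continuation line of a multi-line Target value.
            yield "", line
        else:
            # Header, ID/Type/Source lines, END markers, blank lines.
            in_target = False
            yield line, ""
```

Calling list(tag_lines(text)) on the pasted export gives one pair per line, which can then be written out with whatever markup your CAT tool expects for protected text.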

Here is the file created by Caterpillar when exporting html content:

Code:
ROOT=C:\Documents and Settings\WinXP\My Documents\Mydocs\html-examples-1\
PROJECT=Caterpillar 1.3 [ID:rlrPB625491]
PROJECTNAME=
BEGIN FILE 1=C:\Documents and Settings\WinXP\My
Documents\Mydocs\html-examples-1\\example1.htm
WordCount=167
CharacterSet=iso-8859-1
ID=0
Type=description
Source=How a television screen produces a picture
Target=How a television screen produces a picture
ID=1
Type=keywords
Source=television, screen, tube, electron beam, electricity, light,
primary colours, magnetism, electrons
Target=television, screen, tube, electron beam, electricity, light,
primary colours, magnetism, electrons
ID=2
Type=text
Source=Television
Target=Τηλεόραση
ID=3
Type=text
Source=TELEVISION SCREENS
Target=ΟΘΟΝΕΣ
ΤΗΛΕΟΡΑΣΗΣ
ID=4
Type=text
Source=The inside front of a television tube is coated with chemicals
that react by emitting a spot of light when hit by a beam of electron.
Target=The inside front of a television tube is coated with chemicals
that react by emitting a spot of light when hit by a beam of electron.
ID=5
Type=text
Source=COLOUR IMAGE
Target=COLOUR IMAGE
ID=6
Type=img
Source=RGB light diagram
Target=RGB light diagram
ID=7
Type=text
Source=To produce a colour image a television tube surface is coated
in thousands of finely placed groups of chemical spots. Each group
contains three spots of different chemicals designed to emit
Target=To produce a colour image a television tube surface is coated
in thousands of finely placed groups of chemical spots. Each group
contains three spots of different chemicals designed to emit
ID=8
Type=text
Source=red
Target=red
ID=9
Type=text
Source=green
Target=green
ID=10
Type=text
Source=and
Target=and
ID=11
Type=text
Source=blue
Target=blue
ID=12
Type=text
Source=light. There are three separate electron guns at the rear of
the tube. As red, green and blue are the
Target=light. There are three separate electron guns at the rear of
the tube. As red, green and blue are the
ID=13
Type=text
Source=primary colours of light
Target=primary colours of light
ID=14
Type=text
Source=, varying the intensity of the three beams striking the
chemicals on the screen can produce any colour.
Target=, varying the intensity of the three beams striking the
chemicals on the screen can produce any colour.
ID=15
Type=text
Source=Request more information on
Target=Request more information on
ID=16
Type=text
Source=Choose one...
Target=Choose one...
ID=17
Type=text
Source=Electricity
Target=Electricity
ID=18
Type=text
Source=Magnetism
Target=Magnetism
ID=19
Type=text
Source=Light
Target=Light
ID=20
Type=text
Source=Name
Target=Name
ID=21
Type=text
Source=Email
Target=Email
ID=22
Type=submit
Source=Send request
Target=Send request
ID=23
Type=reset
Source=Clear
Target=Clear
ID=24
Type=text
Source=----- Note -----
Target=----- Note -----
ID=25
Type=text
Source=Electrons passing through magnetic field lines of force
Target=Electrons passing through magnetic field lines of force
ID=26
Type=text
Source=are deflected from their original path by the field.
Target=are deflected from their original path by the field.
ID=27
Type=text
Source=Back to top
Target=Back to top
ID=28
Type=text
Source=Example web page ©2000
Target=Example web page ©2000
END

BEGIN FILE 2=C:\Documents and Settings\WinXP\My
Documents\Mydocs\html-examples-1\\example2.htm
WordCount=154
CharacterSet=iso-8859-1
ID=0
Type=description
Source=How different colours of light mix
Target=How different colours of light mix
ID=1
Type=keywords
Source=light, red, green, blue, white light, visible spectrum, primary
colours, light intensity, rainbow, raindrop
Target=light, red, green, blue, white light, visible spectrum, primary
colours, light intensity, rainbow, raindrop
ID=2
Type=text
Source=Primary colours
Target=Κύρια χρώματα
ID=3
Type=text
Source=PRIMARY COLOURS OF LIGHT
Target=PRIMARY COLOURS OF LIGHT
ID=4
Type=text
Source=Red
Target=Red
ID=5
Type=text
Source=green
Target=green
ID=6
Type=text
Source=and
Target=and
ID=7
Type=text
Source=blue
Target=blue
ID=8
Type=text
Source=are the
Target=are the
ID=9
Type=text
Source=primary colours
Target=primary colours
ID=10
Type=text
Source=of light. By mixing proportions of these colours together,
every colour in the visible spectrum can be created. The table below
shows some examples of this:
Target=of light. By mixing proportions of these colours together,
every colour in the visible spectrum can be created. The table below
shows some examples of this:
ID=11
Type=text
Source=MIXING RED GREEN AND BLUE LIGHT
Target=MIXING RED GREEN AND BLUE LIGHT
ID=12
Type=img
Source=image showing magenta light
Target=image showing magenta light
ID=13
Type=text
Source=Mixing
Target=Mixing
ID=14
Type=text
Source=red
Target=red
ID=15
Type=text
Source=and
Target=and
ID=16
Type=text
Source=blue
Target=blue
ID=17
Type=text
Source=light forms
Target=light forms
ID=18
Type=text
Source=magenta
Target=magenta
ID=19
Type=img
Source=image showing yellow light
Target=image showing yellow light
ID=20
Type=text
Source=Mixing
Target=Mixing
ID=21
Type=text
Source=red
Target=red
ID=22
Type=text
Source=and
Target=and
ID=23
Type=text
Source=green
Target=green
ID=24
Type=text
Source=light forms
Target=light forms
ID=25
Type=text
Source=yellow
Target=yellow
ID=26
Type=img
Source=image showing orange light
Target=image showing orange light
ID=27
Type=text
Source=Reducing the amount of
Target=Reducing the amount of
ID=28
Type=text
Source=green
Target=green
ID=29
Type=text
Source=results in
Target=results in
ID=30
Type=text
Source=orange
Target=orange
ID=31
Type=img
Source=image showing white light
Target=image showing white light
ID=32
Type=text
Source=Mixing equal proportions of
Target=Mixing equal proportions of
ID=33
Type=text
Source=red
Target=red
ID=34
Type=text
Source=green
Target=green
ID=35
Type=text
Source=and
Target=and
ID=36
Type=text
Source=blue
Target=blue
ID=37
Type=text
Source=light produces
Target=light produces
ID=38
Type=text
Source=white
Target=white
ID=39
Type=text
Source=light
Target=light
ID=40
Type=text
Source=Working the other way round, white light from the sun passing
through a falling raindrop is split into it's composite colours of the
spectrum forming a rainbow.
Target=Working the other way round, white light from the sun passing
through a falling raindrop is split into it's composite colours of the
spectrum forming a rainbow.
ID=41
Type=text
Source=Return to:
Target=Return to:
ID=42
Type=text
Source=How a television screen works
Target=How a television screen works
ID=43
Type=text
Source=Example web page ©2000
Target=Example web page ©2000
END
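
The structure of the export above (a global header, then one BEGIN FILE ... END block per page, each holding ID/Type/Source/Target entries) can be read with a few lines of Python. This is only a sketch based on the sample shown here, not on any documentation of the format. The names parse_export, ENTRY_KEYS, FILE_KEYS and SAMPLE are made up for the illustration. Lines that do not start with a known key are treated as continuations of the previous value and kept with their line break, since the sample does not show whether those breaks belong to the value.

```python
ENTRY_KEYS = ("Type", "Source", "Target")
FILE_KEYS = ("WordCount", "CharacterSet")

def parse_export(text):
    """Parse export text into a list of dicts, one per BEGIN FILE block."""
    files = []
    current = None               # dict of the file being read, or None
    holder, field = None, None   # where continuation lines are appended
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if line.startswith("BEGIN FILE "):
            current = {"path": value, "entries": []}
            files.append(current)
            holder, field = current, "path"
        elif line == "END":
            current, holder, field = None, None, None
        elif current is None:
            continue  # global header (ROOT=, PROJECT=, ...) or blank line
        elif sep and key == "ID":
            entry = {"ID": value}
            current["entries"].append(entry)
            holder, field = entry, "ID"
        elif sep and key in ENTRY_KEYS and current["entries"]:
            holder = current["entries"][-1]
            holder[key] = value
            field = key
        elif sep and key in FILE_KEYS:
            current[key] = value
            holder, field = current, key
        elif holder is not None:
            # Continuation of a value that spans several lines.
            holder[field] += "\n" + line
    return files

# A shortened, hypothetical sample in the same layout as the export above.
SAMPLE = """\
BEGIN FILE 1=example1.htm
WordCount=167
CharacterSet=iso-8859-1
ID=3
Type=text
Source=TELEVISION SCREENS
Target=ΟΘΟΝΕΣ
ΤΗΛΕΟΡΑΣΗΣ
END
"""

for page in parse_export(SAMPLE):
    for entry in page["entries"]:
        print(page["path"], entry["ID"], repr(entry["Target"]))
```

Keeping the values as plain strings (no conversion of WordCount to a number, no joining of wrapped lines) is deliberate: it makes it easy to write the file back out unchanged apart from the Target values.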
« Last Edit: 21 Feb, 2007, 22:45:40 by spiros »