Machine Translation Enhanced Computer Assisted Translation (MateCat)


spiros

  • Administrator
  • Hero Member
  • *****
    • Posts: 812079
    • Gender:Male
  • point d’amour
MateCat is still in alpha (not available to the public) but it looks most promising as it will provide methods for the automatic self-correction of Machine Translation (which is used in a translation memory environment) by means of implicit user feedback. This is indeed cutting edge.

Machine Translation Enhanced Computer Assisted Translation (MateCat)

MateCat pushes what is considered the new frontier of Computer Assisted Translation (CAT) technology, that is, how to effectively and ergonomically integrate Machine Translation (MT) within the human translation workflow.

While today MT is mainly trained with the objective of creating the most comprehensible output, in MateCat we target MT technology that will minimize the translator’s post-edit effort.

To this end, MateCat is developing an enhanced web-based CAT tool that will offer new MT capabilities, such as automatic adaptation to the translated content, online learning from user corrections, and automatic quality estimation.

The project builds on state-of-the-art MT and CAT technologies created by the project members, such as Moses, the most popular open source statistical MT toolkit, and MyMemory, the world’s largest Translation Memory (TM), built collaboratively via MT and human contributions.

Our ultimate goal is to create new CAT technology that will significantly enhance the productivity and user experience of professional translators. Hence, the progress of MateCat will be systematically assessed through field tests in which professional translators work on real translation projects and evaluate the utility and usability of our solutions.

State-of-the-art statistical MT employed in more or less interactive settings generally lacks dynamic adaptation capabilities that allow it to learn from the user’s feedback. On the other hand, a human translator using MT in a CAT tool would naturally want to see terminology and style used consistently, and in line with his/her own, throughout the text, and would expect that an error, once corrected, does not occur again in the following text segments. In addition, such adaptations should happen in real time.

On-line learning
MateCat will provide methods for the automatic self-correction of MT making use of the implicit feedback of the user. The segments of text that have already been post-edited by the user will be analysed and compared with the corresponding automatic translations by the MT in order to spot the errors together with their corrections and the portions accepted by the translator. The MT models will be modified accordingly by penalizing the former and reinforcing the latter, or, more drastically, by removing the source of errors. Although ad-hoc transformations could be similar to those for the project adaptation (see above), the goal here is to make them very precise and consistent with the actual translator. Through this on-line adaptation, which is performed in real-time and sentence by sentence, MT should automatically translate the following segments more and more consistently with respect to the previous ones from the point of view of the translator’s lexical and stylistic preferences.
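
To make the "compare the post-edited segment with the MT output" step a bit more concrete, here is a minimal sketch using Python's standard difflib. It is only an illustration of the idea, not MateCat's actual method: the function name compare_post_edit and the whitespace tokenisation are my own assumptions.

```python
from difflib import SequenceMatcher


def compare_post_edit(mt_output, post_edited):
    """Hypothetical helper: split an MT suggestion into the spans the
    translator kept and the spans the translator corrected."""
    mt_tokens = mt_output.split()        # naive whitespace tokenisation (assumption)
    pe_tokens = post_edited.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)

    accepted = []      # MT spans left untouched: candidates for reinforcement
    corrections = []   # (MT span, translator's replacement): candidates for penalising
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        mt_span = " ".join(mt_tokens[i1:i2])
        pe_span = " ".join(pe_tokens[j1:j2])
        if tag == "equal":
            accepted.append(mt_span)
        else:  # "replace", "delete" or "insert"
            corrections.append((mt_span, pe_span))
    return accepted, corrections


accepted, corrections = compare_post_edit(
    "the cat sat on the carpet",
    "the cat sat on the mat",
)
print(accepted)     # ['the cat sat on the']
print(corrections)  # [('carpet', 'mat')]
```

A real system would presumably align at the phrase level and feed these pairs back into the translation model; the sketch only shows where the "former" (errors) and "latter" (accepted portions) could come from.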

Context-aware translation
MateCat will also focus on providing suggestions by MT which are consistent with respect not only to the already edited segments but also to the whole document. This context information will be embedded in the statistical models and will enable better disambiguation, for instance, between lexical alternatives. The context-based models will combine information about recurring terms and expressions extracted during the document analysis with the corresponding chosen and confirmed translations as soon as they become available. In particular, translation constraints related to inter-sentence and intra-sentence anaphoric expressions, to syntactic concordances, and to lexical coherence will be taken into account by means of specific statistical models.
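
One very simple way to picture "recurring terms with their confirmed translations" is a per-document store that prefers whichever alternative the translator has already confirmed most often. The sketch below is a toy under that assumption; the class DocumentTermMemory and its methods are hypothetical, and the project itself talks about statistical models rather than a lookup table.

```python
from collections import Counter, defaultdict


class DocumentTermMemory:
    """Hypothetical document-level store of confirmed term translations."""

    def __init__(self):
        # source term -> Counter of target terms confirmed so far
        self._confirmed = defaultdict(Counter)

    def confirm(self, source_term, target_term):
        """Record that the translator confirmed this translation."""
        self._confirmed[source_term.lower()][target_term] += 1

    def choose(self, source_term, candidates):
        """Pick the candidate confirmed most often in this document.

        If nothing has been confirmed yet, the first candidate wins,
        because max() returns the first of equally ranked items.
        """
        seen = self._confirmed.get(source_term.lower(), Counter())
        return max(candidates, key=lambda candidate: seen[candidate])


memory = DocumentTermMemory()
memory.confirm("bank", "riva")
memory.confirm("bank", "riva")
memory.confirm("bank", "banca")

print(memory.choose("bank", ["banca", "riva"]))       # riva
print(memory.choose("plant", ["pianta", "impianto"])) # pianta
```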

Real-time processing
The core components of traditional MT systems, that is, the translation and the language models, are generally static: they never change after an initial training phase. This means that they are unsuitable for a dynamic environment like the one that MateCat is designing for translators. In order to model the dynamic changes described in the two previous tasks, MateCat will develop innovative data structures that can be rapidly and effectively updated as soon as a new translation is supplied by the user, and innovative, efficient algorithms for performing this adaptation in such a way that the whole process takes place in real time and is transparent to the translator. Moreover, efficiency will be improved by taking advantage of single-CPU multithreading, as well as distributed computing facilities running on private clusters or computer clouds.
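
To illustrate the difference between a static model and one that can be updated as soon as a new translation arrives, here is a small sketch of a count-based bigram language model whose probabilities change after each confirmed segment, with no retraining pass. The class name, the add-one smoothing and the whitespace tokenisation are my own assumptions; the data structures MateCat has in mind are surely more sophisticated.

```python
from collections import Counter


class IncrementalBigramModel:
    """Hypothetical bigram language model that is updated one sentence at a time."""

    def __init__(self):
        self.context_counts = Counter()  # how often each word appears as a left context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        self.vocab = set()

    def update(self, sentence):
        """Fold one confirmed segment into the counts (cost is linear in its length)."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        self.vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            self.context_counts[prev] += 1
            self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        """Add-one smoothed estimate of P(cur | prev) from the current counts."""
        vocab_size = max(len(self.vocab), 1)
        return (self.bigram_counts[(prev, cur)] + 1) / (self.context_counts[prev] + vocab_size)


model = IncrementalBigramModel()
model.update("click the button")
before = model.prob("the", "button")

# The translator confirms another segment; the model reflects it immediately.
model.update("click the button")
after = model.prob("the", "button")
print(before < after)  # True
```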
« Last Edit: 07 Oct, 2013, 21:46:57 by spiros »


spiros

And some more news for you: MateCat is now free for all.

As you may already know, we've been working hard to develop a free and open source alternative to Trados. We switched all of our production to MateCat this year and we are happy to announce that you can now use it for all your projects, not only for the ones we send you.

A few goodies:
— MateCat works on Mac and Linux, not only Windows, and does not require any installation.
— It’s completely free and includes free suggestions from Google Translate and Microsoft Translator, without any limitations.
— MateCat allows you to create Private TMs to archive all the work you do for your customers without sharing the TM with them. Simplicity of the cloud, protection of the desktop.
You can translate your projects here: Translate a file with Matecat

No secret about the plan. To keep MateCat free, we are offering other LSPs the opportunity to outsource to us languages that they do not cover. We hope you will like MateCat and help us promote it to your LSPs.


Translating seems pretty straightforward using these shortcuts:
Ctrl+Enter to translate and move to next TU.
Ctrl+Shift+Enter to translate and move to next untranslated TU.
ALT+CTRL+I to copy source
ALT+C to search the concordance

Double-click to insert glossary terms into your translation (no shortcut seems to exist yet!).
You can add a term from the Glossary panel to your translation simply by double-clicking on it. It will be added to the target segment right after your cursor.

Some info on the translating process:
Translation Process – Matecat

Manual
https://www.matecat.com/wp-content/uploads/2014/12/MateCat-User-Manual-and-Installation-Guide_v1.4.pdf

Help
Support Archive – Matecat

To select and enter one of the suggestions into the translation input field you can:
Use the shortcut ALT+CTRL+[n] on Windows or ALT+CMD+[n] on Mac OS X, where [n] is 1, 2 or 3 for the first, second or third suggestion respectively.
Double-click on the match you would like to select.
Using TM matches, MT suggestions and Glossary terms – Matecat

« Last Edit: 25 Mar, 2015, 20:27:54 by spiros »



spiros

With the new release of MateCat, you can now:

— Add your custom MT engines to MateCat when creating a project;
— Localize Android apps with MateCat;
— Get an objective measurement of the language quality for your translations;
— Import, export and delete translation memories.

As usual, you don't need to do anything to update the CAT. Just log into MateCat and use the new version.


Some questions I sent to the MateCat team months ago (no reply yet):

Hi,

I see no way to import a glossary (I have created a private TM).

Also,

1) You mention that CSV is supported, but which exact CSV flavour (i.e. which delimiters)? Do you have a sample file?
2) I see no easy way of entering a term while translating (e.g. autocomplete or a shortcut); one has to move to the Glossary tab manually to see the translation of the term.
3) Although I have added a personal TM, I still see "Add your personal TM" at the bottom of each segment.
4) I add a glossary term by clicking in the Glossary tab, entering the source and target, and pressing the strange icon on the right, but when I go back to that segment, the term is not displayed in the Glossary tab.
« Last Edit: 25 Mar, 2015, 20:30:04 by spiros »


spiros

Easily extract all the translatable contents from any file format into a convenient XLIFF file.
Translate it and use Filters again to get back a completely translated file with perfectly preserved formatting.
MateCat Filters
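
For anyone curious what the intermediate XLIFF looks like from a script's point of view, here is a minimal sketch that reads source/target pairs from a generic XLIFF 1.2 document with Python's standard library. The sample file content and the function read_units are made up for illustration; I am not assuming anything about the extra markup MateCat Filters itself may write, and this does not call any MateCat API.

```python
import xml.etree.ElementTree as ET

# Standard XLIFF 1.2 namespace
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hand-written sample, not actual MateCat Filters output
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="it" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <target>Ciao mondo</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source text, target text or None) for every trans-unit."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", namespaces=XLIFF_NS):
        source = unit.find("x:source", namespaces=XLIFF_NS)
        target = unit.find("x:target", namespaces=XLIFF_NS)
        units.append((
            unit.get("id"),
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else None,  # None = untranslated
        ))
    return units


for unit_id, source_text, target_text in read_units(SAMPLE):
    print(unit_id, source_text, target_text)
```

Units whose target comes back as None are the ones still waiting for a translation.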

Directly supported formats

Microsoft Office: DOCX, XLSX, PPTX
Open Office: ODT, OTT, ODS, OTS, ODP, OTP
Hypertext: HTML, XHTML
Localization: SDLXLIFF, XLIFF, PO, TTX
Desktop publishing: MIF, IDML, ICML, DITA
Interchange Formats: CSV, TSV, XML, DTD, JSON, YAML
Others: TXT, PROPERTIES, RESX, STRINGS, SRT, WIX

Formats supported using MateCAT Win Converter

MateCAT Win Converter transforms some file types into formats directly supported by MateCat Filters, using some external commercial dependencies. It uses Microsoft Office to convert legacy Office formats to the new Office Open XML, Nuance OCR SDK to convert images to DOCX, and CloudConvert to convert PDF to DOCX. See the dedicated repository for more info.

Microsoft Office: DOC, DOT, DOCM, DOTX, DOTM, XLS, XLT, XLSM, XLTX, XLTM, PPT, PPS, POT, PPTM, PPSX, PPSM, POTX, POTM
OCR: BMP, GIF, PNG, JPEG, TIFF, scanned PDFs
Others: regular PDFs, RTF

https://github.com/matecat/MateCat-Filters/wiki/Supported-file-formats



 
