Linearize a TMX on a translation unit basis
Find: <tu (the search string has to include all the white space preceding the <tu at the beginning, and one extra space after it; the trailing space keeps <tuv from matching)
Replace with: <tu (one extra space after it)
Find: \n \s+ (regular expressions on)
Replace with: nothing
The result is a list of one-liners, with each line holding a full TU.
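For anyone who would rather script it than use an editor, here is a minimal Python sketch of the same two search-and-replace steps. It is not part of the original workflow, and the function name linearize_tmx is made up for illustration. It assumes the file is pretty-printed with each <tu on its own indented line, that line endings are plain \n, and it reads the second pattern as \n\s+ (a newline followed by one or more whitespace characters), which is an interpretation of the pattern given above.

```python
import re


def linearize_tmx(text: str) -> str:
    """Join each translation unit onto a single line (hypothetical helper)."""
    # Step 1: strip the indentation in front of every "<tu ".
    # The trailing space in the pattern keeps "<tuv" from matching.
    text = re.sub(r"^[ \t]+<tu ", "<tu ", text, flags=re.MULTILINE)
    # Step 2: delete every newline that is followed by whitespace.
    # Indented lines get glued to the line before them; the "<tu " lines
    # now start at column 0, so each of them still begins a new line.
    # Assumption: "\n\s+" is the intended reading of the second pattern.
    return re.sub(r"\n\s+", "", text)


if __name__ == "__main__":
    sample = (
        "<body>\n"
        '    <tu tuid="1">\n'
        '      <tuv xml:lang="en">\n'
        "        <seg>Hello</seg>\n"
        "      </tuv>\n"
        "    </tu>\n"
        "</body>\n"
    )
    print(linearize_tmx(sample))
```

Note that any other indented line, such as an indented </body>, also gets glued to the line before it, exactly as it would in the editor.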
Why do I do that? Because most TM/TMX editors cannot handle big files. I use EmEditor instead, where, for example, I can search for invalid/corrupt characters, bookmark those TUs, and then batch delete them.