I should mention, I did the figures in the chapter on Message Perspective using only the Unicode character set. Microsoft Internet Explorer doesn't like Unicode. It goofs up the figures, but Mozilla and Firefox work fine. I'll eventually redo them as GIFs.
For Internet Explorer to work OK with Unicode, you have to make sure every font declaration in your CSS names a font with broad Unicode coverage (Arial Unicode MS, for example). Firefox can cope without this because it substitutes glyphs from other fonts for the missing characters.
Γεια σου Σπιρο,
Καλώς σε βρήκα. Τι κάνεις;
1. This HTML does not use a CSS.
2. The only explicit reference to the character set occurs in the HTML <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This tag is necessary, because IE does not use Unicode by default. Without it, Windows users must set the Browser encoding by hand, to UTF-8, to see the Greek, and Windows rules the world, so I have to conform.
3. Microsoft Times New Roman is a Unicode font. M-TNR supports all the Unicode characters I'm interested in, including the box-drawing characters. I swiped a copy from Windows and installed it on my Linux box at home just to make sure.
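To make point 2 concrete, here is a minimal Python sketch of what the charset declaration decides. The bytes on the wire are the same either way; only the decoding changes. The fallback code page used here (windows-1253, Microsoft Greek) is an assumption for illustration, not a claim about what any particular IE installation picks.

```python
# Sample Greek text from this page, encoded the way the file is saved.
greek = "Καλώς σε βρήκα"
raw = greek.encode("utf-8")

# With the charset declared, the browser decodes the bytes as UTF-8.
declared = raw.decode("utf-8")

# Without it, a browser may fall back to a legacy code page.
# cp1253 (windows-1253) is assumed here as that fallback; bytes that
# code page does not define are replaced with U+FFFD.
guessed = raw.decode("cp1253", errors="replace")

print(declared)  # the original Greek
print(guessed)   # mojibake: each UTF-8 byte read as a separate character
```

Each Greek letter takes two bytes in UTF-8, so the wrongly decoded text comes out roughly twice as long as the original.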
Actually, you can't call a font by its character set name. There is no such thing as a "unicode" font. Fonts support whatever characters they want, usually an international subset of Unicode. In fact, M-TNR supports all the European languages, including Cyrillic. But it does not support the Asian writing systems. You'd need a font designed especially for them.
Anyway, the same font is used by all character set encodings. All fonts, at least the SGV fonts developed by Adobe, of which Microsoft True-Type is a rip-off, map the character set to the glyph set internally.
There is a table inside the font for each character set supported. If the font does not contain a map for your character set, then the font cannot be used with that character set. But Unicode is supported by all the common (if not absolutely all) fonts. No, that's not true. I've seen speciality fonts which support only their own, small, special-purpose character sets.
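The character-to-glyph table can be pictured with a toy model. This is only a sketch: the map below is invented for illustration and is far simpler than a real font's table, but it shows why an unmapped character ends up as the "missing glyph" box (glyph index 0 by convention in TrueType fonts).

```python
# Hypothetical map for a small special-purpose font: code point -> glyph index.
# Glyph 0 is conventionally the "missing glyph" (.notdef).
NOTDEF = 0
toy_cmap = {
    ord("A"): 1,
    ord("α"): 2,
    ord("─"): 3,  # U+2500, a box-drawing character
}

def glyphs_for(text, cmap):
    """Return the glyph index for each character, or NOTDEF if unmapped."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]

def supports(text, cmap):
    """True when the font maps every character in the text."""
    return all(ord(ch) in cmap for ch in text)

print(glyphs_for("Aα─", toy_cmap))  # every character is mapped
print(glyphs_for("Aж", toy_cmap))   # the Cyrillic letter falls back to NOTDEF
```

A browser that does glyph substitution would, at the NOTDEF step, go looking in another font instead of drawing the box.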
Microsoft TNR is also a "Greek" font. In fact, Microsoft TNR even supports polytonic Greek (but not on Windows 98, nor on XP, apparently). There is no glyph substitution. The polygreek looks pretty as a picture on Linux. (I have fonts on my Linux which do not support polygreek, and you can see the glyph substitution.)
The only problem seems to be that Internet Explorer is not using the font properly.
How do I know?
Because Firefox on the Windows platform correctly (well, almost correctly) renders all the Greek characters, and the Unicode figures too. Both browsers, IE and Firefox, are using the same font, the Microsoft TNR, and the same platform, Windows XP.
To see the difference, compare the following URL, an entry from the Babiniotis dictionary, in both Firefox and Internet Explorer:
συνίσταμαι
You'll see one works better than the other. Mozilla works like a charm on my Linux machine at home. It's perfect.
Joe
PS
Microsoft is, like IBM was, and still is, a marketing company, not a technology company. Their engineering is mediocre, to say the least - limited to "reverse-engineering" - and their attention to detail even worse:
1) Windows still has an incomplete or defective implementation of Unicode, and in particular, UTF-8. They just refuse to cooperate.
2) Internet Explorer has a defective implementation of HTML:
- Can't embed <TABLE> into <A>
- Can't embed <UL> (or any list) into <TABLE>
- I wonder what IE would do if I put a <UL> in a <TABLE> in an <A>, cry?
When things break, you just change your page, because IE is the de-facto standard. That's what I'm going to do with those Unicode figures. I can create GIFs using the Gimp.
3) IE does not correctly use Microsoft's own fonts! (see above) Actually, I don't think they're Microsoft fonts. They've been developed by another company. Microsoft just distributes them.
4) Frontpage breaks working HTML.
I use Frontpage because it is the only "reasonable" utf-8 editor on a Windows platform (I don't consider Word to be "usable", at least not for HTML (see below)), and Notepad, although utf-8, doesn't understand Unix new-lines.
When I do a Search/Replace to finalize my URLs, Frontpage breaks the <ul> inside a <blockquote> near the bottom of the page:
Welcome to Modern Greek Verbs
It goofs up, but I haven't taken the time to figure out why.
[Hey, Maybe they should call it "Breakpage" instead of "Frontpage".]
So I do the search/replace using Notepad, but since Notepad doesn't recognize the Unix newline character (the single-byte ASCII line feed, which has the same value in UTF-8, ISO 8859-7, and Microsoft Greek), I can't see what I'm doing: the text is just a blob, one very long line.
[Notepad only understands the CR/LF combination. I haven't figured out how to shut that off. All my Linux editors can do both forms of line termination. Maybe I should use the Microsoft CR/LF with UTF-8.]
But at least the Notepad replacement works.
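One way around the blob problem is to convert the line endings before opening the file in Notepad. The sketch below works on raw bytes, which is safe here because the line feed is the single byte 0x0A in all three encodings mentioned above. It would not be safe for UTF-16. The function name `lf_to_crlf` is just an illustration, and the Python codec names `iso8859_7` and `cp1253` stand in for ISO 8859-7 and Microsoft Greek.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CR/LF, leaving existing CR/LF alone."""
    normalized = data.replace(b"\r\n", b"\n")  # avoid doubling an existing CR
    return normalized.replace(b"\n", b"\r\n")

text = "Γεια σου\nΤι κάνεις;\n"

# The newline is the same single byte in each of these encodings,
# so each encoded form contains exactly two 0x0A bytes.
newline_counts = {
    codec: text.encode(codec).count(b"\n")
    for codec in ("utf-8", "iso8859_7", "cp1253")
}
print(newline_counts)

converted = lf_to_crlf(text.encode("utf-8"))
print(converted.count(b"\r\n"))  # both line endings are now CR/LF
```

Running the function twice changes nothing the second time, so it is harmless on a file that already uses CR/LF.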
And Word?
Forget it. It's a dinosaur. Sure, it recognizes the Unix newline (Frontpage also understands the Unix newline), but Word makes you work on your HTML as if in a rich-text editor. Besides that, it mangles your page beyond recognition, so badly, in fact, that I've used it at times to disguise the identity of the author (me).
My guess is Frontpage is either incorrectly analysing the source HTML - which it should not be doing anyway - or, even worse, it may just be broken, from lack of attention.
Unicode was first "supported" in Windows NT, back in 1997. Windows 98 never had a Unicode editor. (W98 "Word Pad" did UTF-16 only, while Notepad on NT was utf-8. Figure that one out...)
Windows XP is better.
XP Notepad currently supports UTF-8, but still puts the BOM (Byte-Order-Mark) in the first three bytes of the text file (EF BB BF in UTF-8). It is harmless, but it was only ever necessary in UTF-16, to signal the big- or little-endian byte order of the 16-bit units to the recipient. But UTF-16 was never real Unicode anyway.
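If the BOM gets in the way, it is easy to detect and strip. A minimal Python sketch: the UTF-8 BOM is the three bytes EF BB BF, available as `codecs.BOM_UTF8`, and the standard `utf-8-sig` codec drops it on decoding. The helper name `strip_utf8_bom` and the sample bytes are made up for the example.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Simulate a file saved by an editor that prepends the BOM.
notepad_bytes = codecs.BOM_UTF8 + "Γεια".encode("utf-8")

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character.
with_bom = notepad_bytes.decode("utf-8")

# "utf-8-sig" strips it, as does the helper above.
without_bom = notepad_bytes.decode("utf-8-sig")
stripped = strip_utf8_bom(notepad_bytes).decode("utf-8")
```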
I guess you could call UTF-16 "Microsoft Unicode".
Nobody is perfect
Firefox has a bug too, but not in the Unicode or HTML. It concerns refreshing the page. It forgets where you were, and always restarts at the top of the page. This behavior essentially makes Firefox useless for web development.
For example, using Firefox visit:
Modality
Scroll down and press Refresh. You'll see what I mean.
Curiously enough - because they should be sharing the same code - Mozilla works fine, but it too is broken when you refresh a # (pound-sign) URL. For example, using Mozilla visit:
Causitive Verbs
Now, scroll down, then press Refresh. Same bug! Mozilla forgets where you were and puts you back at the start of the #causitive section.
Internet Explorer works correctly in both cases.
PPS
Just Some History
IBM stuck to EBCDIC until its dying day. [Well, it's not actually dead. The US Government still uses IBM mainframes to print millions of pay checks, twice a month.] You won't find EBCDIC on your list of browser Encodings though, unless maybe Mozilla. [Well, who would be crazy enough to develop a web page in EBCDIC anyway?]
For all you character set fanatics out there, both ASCII and EBCDIC map the decimal digits to the BCD (Binary Coded Decimal) range: the low four bits of each digit character are the digit's value. On an S/370 you could actually do arithmetic using the EBCDIC characters without converting them to numbers first. It had a BCD ALU (Arithmetic Logic Unit). That was a common way to add numbers in COBOL, which always dealt with money, decimal digits and fixed decimal point. (The S/370 also had a FP-ALU. Fortran needed it. And Fortran sent Apollo to the moon.)
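You can check the digit mapping from Python. This sketch assumes `cp037`, one common EBCDIC code page that ships with Python's standard codecs. Other EBCDIC code pages place the digits at the same positions, but only cp037 is checked here.

```python
digits = "0123456789"

ascii_bytes = digits.encode("ascii")   # 0x30 .. 0x39
ebcdic_bytes = digits.encode("cp037")  # 0xF0 .. 0xF9 in this EBCDIC code page

# Mask off the high nibble: what remains is the BCD value of the digit.
ascii_low = [b & 0x0F for b in ascii_bytes]
ebcdic_low = [b & 0x0F for b in ebcdic_bytes]

print([hex(b) for b in ascii_bytes])
print([hex(b) for b in ebcdic_bytes])
print(ascii_low == ebcdic_low == list(range(10)))
```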
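By comparison, the Intel x86 processors have had BCD addition from the start, but for packed BCD you have to pack the digits first, add them, then apply a decimal adjust afterwards. For unpacked digits, one per byte, the ASCII adjust instruction does the same job. (Motorola's 68000 family has a packed-BCD add instruction of its own.)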
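The pack, add, adjust sequence can be emulated in a few lines. This is a Python sketch of the idea, not a cycle-accurate model of any processor's adjust instruction. The function names are invented for the example.

```python
def pack(tens: int, ones: int) -> int:
    """Pack two decimal digits into one byte, high nibble first."""
    return (tens << 4) | ones

def bcd_add_byte(a: int, b: int, carry_in: int = 0):
    """Add two packed-BCD bytes; return (result_byte, carry_out)."""
    total = a + b + carry_in                 # ordinary binary addition
    if (a & 0x0F) + (b & 0x0F) + carry_in > 9:
        total += 0x06                        # adjust the low digit
    if total > 0x99:
        total += 0x60                        # adjust the high digit
    return total & 0xFF, 1 if total > 0xFF else 0

# 19 + 28 = 47, computed entirely on BCD bytes.
result, carry = bcd_add_byte(pack(1, 9), pack(2, 8))
print(hex(result), carry)
```

The plain binary sum of 0x19 and 0x28 is 0x41, which is not the decimal answer. The adjust step is what turns it into 0x47.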
Why would anyone want to do this?
Because money is always represented as a fixed (not floating) point number. Yes, integer addition would work, but since the data is captured as (ASCII or EBCDIC) characters first, you'd have to convert them to integers first, then back to characters. So why not just add the characters?
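Here is a rough Python sketch of "just adding the characters": two amounts held as ASCII digit strings (say, in cents) are added digit by digit, right to left, using only the low nibble of each character. The function name `add_digit_strings` is hypothetical, and a real COBOL runtime or BCD ALU would of course do this in hardware or generated code rather than in a loop like this.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative decimal numbers given as ASCII digit strings."""
    width = max(len(a), len(b))
    a, b = a.rjust(width, "0"), b.rjust(width, "0")  # align the columns
    out = []
    carry = 0
    for ca, cb in zip(reversed(a), reversed(b)):
        # The low nibble of an ASCII digit is its numeric value.
        s = (ord(ca) & 0x0F) + (ord(cb) & 0x0F) + carry
        carry, digit = divmod(s, 10)
        out.append(chr(0x30 | digit))                # back to an ASCII digit
    if carry:
        out.append("1")
    return "".join(reversed(out))

# Amounts in cents: 19.99 + 0.01
print(add_digit_strings("1999", "1"))
```

Because every step works in whole decimal digits, there is no rounding to worry about, which is the point of keeping money in fixed point.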