From: Andrew Ferrier <andrew junk new-destiny co uk>
To: dia-list gnome org
Subject: Re: UTF-8 on stdout?
Date: Sun, 7 Jul 2002 10:43:15 +0100 (BST)
On 2002-07-06 at 21:42 -0400, James K. Lowden wrote:
> PMJI. You might want to have a look at http://czyborra.com/
> Mr. Czyborra has a pretty good overview of what's what
> regarding encoding and character sets, and does a good job of
> distinguishing between fonts, glyphs, and characters. You
> may in particular want to look at:
>
> http://czyborra.com/unicode/terminals.html
This certainly seems a pretty good site. I'll have to take a
long look at it sometime but it's answered a few questions
already... thanks for the reference!
> What you bumped into was, as Lars said, a problem with xterm.
> If you push UTF-8 to stdout, it falls to the application
> whose job it is to convert encoded values into glyphs that
> your brain can interpret as characters (I'm skipping a few
> steps). The standard xterm is *not* going to expect UTF-8;
> it will instead interpret the bytestream as ASCII or Latin-1
> or whatever your locale settings indicate.
Yep. However, it does appear from czyborra's site that there is
an escape sequence to switch UTF-8-hacked 4.0 xterms into
UTF-8 mode. I'll investigate this and give it a try. I'm not
sure it's the kind of thing Dia should be outputting, though;
it's probably more of a user or system-wide setting.
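
For reference, a minimal sketch of what emitting such an escape
sequence could look like. It assumes the ISO 2022 "other coding
system" sequences (ESC % G to select UTF-8, ESC % @ to return to
the default), which is what UTF-8-capable xterms are documented to
honour. The helper name wrap_utf8 is hypothetical, and this is not
something Dia does.

```python
import sys

# Assumed sequences (ISO 2022 "designate other coding system"):
# ESC % G selects UTF-8, ESC % @ returns to the default coding system.
ENTER_UTF8 = b"\x1b%G"
LEAVE_UTF8 = b"\x1b%@"


def wrap_utf8(text: str) -> bytes:
    """Hypothetical helper: UTF-8 bytes bracketed by the mode switches."""
    return ENTER_UTF8 + text.encode("utf-8") + LEAVE_UTF8


if __name__ == "__main__":
    # Only send terminal control sequences to a terminal, never into a
    # pipe such as "dia --credits | sort".
    if sys.stdout.isatty():
        sys.stdout.buffer.write(wrap_utf8("Andr\u00e9s\n"))
        sys.stdout.buffer.flush()
```

The isatty() guard reflects the concern above: a program that
unconditionally prints terminal control bytes would corrupt any
output that is piped or redirected.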
> dia --credits |sort
>
> how is sort(1) supposed to know what's incoming? It doesn't
> guess; it assumes, and unless the answer is 7-bit ascii, it
> assumes wrong. Its only defense is, it's got a lot of good
> company.
Good point. In this case I'm not going to worry, because the
names are not in "surname, forename" order anyway (which is
conventional in most locales, I think), and there is
surrounding bumpf too. But in the more general case that is
very important, I guess.
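
To illustrate the general case, here is a small sketch of how a
plain byte-wise sort treats UTF-8 input. The names are made up,
and the accent-insensitive key is only a crude stand-in for real
locale collation (it uses Unicode NFD decomposition, not the
platform's locale tables).

```python
import unicodedata

# Made-up sample names: "Zoë", "Émile", "Andrew".
names = ["Zo\u00eb", "\u00c9mile", "Andrew"]

# What a byte-oriented sort sees: raw UTF-8 bytes compared by value.
# "É" encodes as 0xC3 0x89, which is greater than any ASCII letter,
# so "Émile" lands after "Zoë".
encoded = [n.encode("utf-8") for n in names]
byte_order = [b.decode("utf-8") for b in sorted(encoded)]

# Crude approximation of what a reader expects: decompose accented
# letters (NFD) so that "É" compares as "E" followed by a combining mark.
approx_order = sorted(names, key=lambda s: unicodedata.normalize("NFD", s))

if __name__ == "__main__":
    print("byte order:  ", byte_order)
    print("approx order:", approx_order)
```

The point is only that the two orders differ as soon as a name
starts with a non-ASCII letter; a tool that sorts bytes has no
way of knowing the input was UTF-8.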
> Interesting place. In particular, the -u8 option for xterm
> does exactly what Andrew wants. We should get Akira and Xing
> Wang to use their utf8 encodings for their names.
Yes, I guess so. I'll continue outputting in UTF-8 then: I'll
assume it's the responsibility of the user to sort out their
terminal if they want 'correct' output.
Cheers for all that, guys,
Andrew.
--
Andrew Ferrier
email: andrew.junk@new-destiny.co.uk
web: http://www.new-destiny.co.uk/andrew/