Subject: detecting non-convertibility of characters
Date: Tue, 29 Jan 2002 21:41:08 +0100
On Tue, Jan 29, 2002, at 05:20:41PM +0100, Cyrille Chepelov wrote:
> Hence my plan to find out how to detect that \xc2\xab doesn't translate into
> anything in the current locale encoding, and to use the ASCII fallback in that
> case. However, for the locales where \xc2\xab is displayable, and if we can
> reliably detect that it is indeed displayable, IMO we should use it rather
> than ASCII substitutes.
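One possible sketch of the policy in the quote above uses the C library's locale functions instead of iconv: once setlocale(LC_ALL, "") has been called, wcrtomb() returns (size_t)-1 (EILSEQ) when a character has no representation in the current locale's charset. This assumes wchar_t values are Unicode code points (__STDC_ISO_10646__, as on glibc); the helpers encode_in_locale() and pick_quote_marks() are hypothetical and not part of charconv.c.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Encode one character in the charset of the current locale (the caller is
   expected to have called setlocale(LC_ALL, "") beforehand).
   Assumes wchar_t values are Unicode code points (__STDC_ISO_10646__).
   `buf` must hold at least MB_LEN_MAX + 1 bytes. Returns false if the
   locale's charset has no representation for `wc`. */
static bool encode_in_locale(wchar_t wc, char *buf)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);           /* start in the initial shift state */
    size_t n = wcrtomb(buf, wc, &st);
    if (n == (size_t)-1)                 /* EILSEQ: not representable */
        return false;
    buf[n] = '\0';
    return true;
}

/* Hypothetical helper: store « and » in the locale encoding, or the ASCII
   fallbacks "<<" and ">>" if either guillemet cannot be represented. */
static void pick_quote_marks(char open_q[MB_LEN_MAX + 1],
                             char close_q[MB_LEN_MAX + 1])
{
    assert(open_q != NULL && close_q != NULL);
    if (!encode_in_locale((wchar_t)0x00AB, open_q) ||
        !encode_in_locale((wchar_t)0x00BB, close_q)) {
        strcpy(open_q, "<<");
        strcpy(close_q, ">>");
    }
}
```

Call setlocale(LC_ALL, "") once at startup before using it; otherwise the check reflects the "C" locale rather than the user's locale.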
OK, here are the results:
- test.c is basically a stripped-down version of charconv.c, hardcoded to
Latin-1 (it's encoded in UTF-8. I hope the test files you sent me weren't
swear words <grin/> They definitely looked Japanese in my Emacs 21.)
There are four strings: one Latin-1 string (expected to convert) and three
that are not expected to convert to Latin-1 (for various but obvious reasons).
- test.log is the output of the test, with stderr redirected to stdout (2>&1).
As you can see, unicode_iconv() simply bails out (and sets errno) when
the string is not convertible.
I'm thinking about adding a try_charconv_utf8_to_local8() function, taking
all the code from charconv_utf8_to_local8() up to (but not including) the test
on the result of unicode_iconv(), and letting it return NULL (silently!) if the
input string can't be converted to the local charset. This should let us detect
whether the « and » characters are convertible in the current encoding.
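A minimal sketch of such a silent "try" conversion, written against POSIX iconv() rather than unicode_iconv(), whose exact signature is not assumed here. The name try_utf8_to_charset() is hypothetical. It returns NULL both when iconv() fails (for example with EILSEQ, as in the log) and when iconv() reports a nonzero count of irreversible conversions, since POSIX allows some implementations to substitute a replacement character instead of failing.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper modeled on the proposed try_charconv_utf8_to_local8():
   convert a NUL-terminated UTF-8 string to `tocode` and return a malloc'd
   result, or NULL (without printing anything) if the conversion is not exact. */
static char *try_utf8_to_charset(const char *utf8, const char *tocode)
{
    assert(utf8 != NULL && tocode != NULL);

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;                     /* unknown charset name */

    size_t inleft = strlen(utf8);
    size_t outsize = inleft * 4 + 8;     /* generous for 8-bit and multibyte targets */
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inbuf = (char *)utf8;          /* POSIX iconv() takes char ** */
    char *outbuf = out;
    size_t outleft = outsize - 1;        /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    /* Flush any pending shift state (matters only for stateful charsets). */
    if (rc != (size_t)-1 &&
        iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t)-1)
        rc = (size_t)-1;
    int saved_errno = errno;             /* meaningful only if iconv() failed */
    iconv_close(cd);

    /* (size_t)-1: EILSEQ (unconvertible or invalid input), EINVAL or E2BIG.
       > 0: some characters were replaced irreversibly. Either way, not exact. */
    if (rc != 0) {
        free(out);
        errno = saved_errno;
        return NULL;
    }
    *outbuf = '\0';
    return out;
}
```

With the charset name printed as "Current charset" in the log (ISO-8859-1), converting "\xc2\xab" succeeds and yields the single byte 0xAB, whereas a Japanese string yields NULL without any warning.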
Problem: I see there's an alternate implementation of
charconv_utf8_to_local8() which basically delegates to glib 1.3. Is that
version silent when given "bad" input? Or is it safe to assume that either
HAVE_ICONV or HAVE_UNICODE will be defined even in the glib 1.3 case, so we
can use code derived from the older implementation of charconv_utf8_to_local8()?
Now that people are talking about C++0x, I'll probably write to Mr. Sutter so
that the Powers That Be (and Who Talk To The C Committee) seriously plan on
adding #mess, #beware, #horrible and #hell preprocessor directives.
-- Cyrille
--
Grumpf.
** WARNING **: unicode_iconv(u2l,...) failed, because 'Invalid or incomplete multibyte or wide character'
** WARNING **: unicode_iconv(u2l,...) failed, because 'Invalid or incomplete multibyte or wide character'
** WARNING **: unicode_iconv(u2l,...) failed, because 'Invalid or incomplete multibyte or wide character'
Current charset: ISO-8859-1
test0=pépère
test=
test2=
test3=Mélange hétérogène d'