On Mon, 28 May 2001, Cyrille Chepelov wrote:
> Normally, XML files already embed character set encoding information in the
> very first element (<?xml version="1.0" encoding="foo"?>)
That sounds like the best way to discriminate between old and new files.
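A minimal sketch of that check, assuming the first few bytes of the file
have already been read into a buffer. The function name
xml_declares_encoding is hypothetical (it is not in dia or libxml), and the
sketch ignores byte-order marks and UTF-16 files:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with an XML declaration
 * that carries an encoding attribute, 0 otherwise. Files without one
 * would be treated as "old" files saved in the local 8-bit charset. */
int xml_declares_encoding(const char *buf, size_t len)
{
    static const char key[] = "encoding";
    const size_t klen = sizeof key - 1;
    size_t end = 5;
    size_t i;

    assert(buf != NULL);

    /* The declaration must be the very first thing in the file. */
    if (len < 5 || memcmp(buf, "<?xml", 5) != 0)
        return 0;

    /* Find the "?>" that closes the declaration. */
    while (end + 1 < len && !(buf[end] == '?' && buf[end + 1] == '>'))
        end++;
    if (end + 1 >= len)
        return 0;               /* declaration never closed */

    /* Look for "encoding" only inside the declaration itself. */
    for (i = 5; i + klen <= end; i++)
        if (memcmp(buf + i, key, klen) == 0)
            return 1;
    return 0;
}
```

The search stops at the closing "?>", so an attribute named encoding
further down in the document does not count.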
>
> Is libxml1 8-bit only, and does libxml2 depend on gtk2? If not, we
libxml1 doesn't care about character encodings, so 8-bit characters pass
through without problems (you may run into trouble with some multibyte
charsets though).
Libxml2 doesn't rely on gtk2, but cares about encodings. We already have
conditional support for using libxml2, but it breaks on 8-bit chars. The
reason is that it assumes that the internal encoding used by the app is
UTF-8, so it occasionally mangles the second highest bit of some characters.
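To illustrate why 8-bit text trips up a parser that expects UTF-8, here is
a simplified structural check. It is only a sketch: the name
looks_like_utf8 is made up, it is not what libxml2 does internally, and it
does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every lead byte is followed by the right number of
 * continuation bytes (10xxxxxx), 0 otherwise. */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;

        if (c < 0x80)
            extra = 0;                  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                  /* 11110xxx */
        else
            return 0;                   /* stray continuation or bad lead */

        if (extra > len - i - 1)
            return 0;                   /* sequence runs past the end */

        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */

        i += extra + 1;
    }
    return 1;
}
```

For example, the Latin-1 string "caf\xE9" fails the check because 0xE9
looks like the start of a three-byte sequence, while the UTF-8 form
"caf\xC3\xA9" passes.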
> probably could move dia's internals to utf8, keeping a charconv_utf8_to_local8 call
> just before render_gdk.c/draw_string() (assuming #54628 is in fact fixed or
> someone understands how to really fix it).
That sounds about right. We should be able to keep charset conversion
just in the file load/save code and the gdk renderer (the conversion code
in the gdk part will go when we switch to gtk 2.0, and should be a no-op
on Windows already).
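As a rough sketch of what a conversion at those two boundaries could look
like with plain POSIX iconv: the helper name convert_charset is
hypothetical (it is not the lib/charconv.h interface), and the caller is
assumed to supply the charset names and a large enough output buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts inlen bytes of `in` from charset `from`
 * to charset `to`, writing into `out`. Returns the number of bytes
 * written, or -1 on any error (unknown charset, invalid input, or an
 * output buffer that is too small). Stateful encodings would need an
 * extra flushing iconv() call, which is left out here. */
long convert_charset(const char *from, const char *to,
                     const char *in, size_t inlen,
                     char *out, size_t outsize)
{
    iconv_t cd;
    char *src = (char *)in;     /* iconv only advances this pointer */
    char *dst = out;
    size_t inleft = inlen;
    size_t outleft = outsize;
    size_t rc;

    assert(from != NULL && to != NULL && in != NULL && out != NULL);

    cd = iconv_open(to, from);  /* note the order: target first */
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outsize - outleft);
}
```

On load this would be called with the local charset as `from` and "UTF-8"
as `to`; the gdk renderer would call it the other way round just before
drawing.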
>
> However, this will be no small task (basically requires to audit the whole
> code for (gchar *) arithmetic and moving that to the unicode_* functions,
> and define wrappers for these when !HAVE_UNICODE). I'm very motivated to
> tackle this, but I'd like 0.88.1 to not be the new 0.86. I think there have
> been enough problems removed in the CVS head relative to 0.88.1, that making
> a new release (either 0.88.3 or 0.89) before going utf8 actually makes sense.
If you want a new release, we can do one whenever you want. Probably
better to call it 0.89 rather than 0.88.3.
If we are going to have unicode as the default, I am inclined to make
libunicode a required library. The fewer conditionals, the easier it is to
test that a tarball will build correctly. What do others think about this?
>
> > Getting the locale's charset doesn't look that trivial. There is a
> > function that does this in HEAD glib (g_utf8_get_charset_internal in
> > gutf8.c). Once we have code to get the charset, it is just a matter of
> > adding the appropriate iconv calls in lib/dia_xml.c
>
> Well, you've seen that in fact, it's not that difficult <grin/>. Just use
> the interface from lib/charconv.h, and let that code worry about how it's
> going to be done (I'd really like charconv.c to use the native calls under
> Win32, but I don't remember their names. OTOH, if win32-gtk effectively has
> HAVE_UNICODE defined, it's not a problem. Hans ?)
We may as well use the libunicode calls unconditionally. That way, it
will be a simple sed job to convert over to the glib unicode calls found
in glib-2.0 (which will be in a required library for gtk2, so we may as
well use it :)
James.
--
Email: james@daa.com.au
WWW: http://www.daa.com.au/~james/