Hi all,
a few weeks ago I was advocating the fun side of keeping a bazillion
translations in the same file, but I've now come to a few
conclusions:
1) this won't scale past the half dozen languages we have in sheets.
Certainly not to all the languages we have in PO files.
2) We are currently limited to languages encodable in latin1. We
could decide to either ignore non-latin1 languages, or that sheets are to be
encoded in UTF-8. The first solution is obviously wrong (sorry Hans<grin/>),
and the second is also obviously wrong: over 99% of the text editors available
in the West don't handle UTF-8 out of the box (emacs21 is rumoured to have
that, and current solutions aren't easy and aren't out of the box anyway).
We would then rule out the putative Gaelic translator (whose language
requires a *subset* of ASCII, IIRC) who obviously doesn't need a UTF-8
capable editor. Besides, some non-latin1 countries aren't precisely
impressed with UTF-8 (Japan comes first to my mind; I'm under the impression
that UTF-8 isn't precisely popular in Russia either), so while we will have
to use UTF-8 [***] internally, because it's the Right Way(tm) to do it, having
individual language files encoded in the most usual encoding for that
language is a good thing to have.
3) Translators aren't always developers; there are language teams
who look just at .pot files, and won't look at anything else (not even a
petty README file). Asking them to handle XML like we've done looks like a
failure (see the *-translation-report files... not pretty.)
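To make point 2 concrete, here is a minimal Python sketch of the idea:
each language file stays in the encoding its translators usually use, and
everything is converted to UTF-8 for internal use. The table
LEGACY_ENCODINGS and both function names are hypothetical, and the
language-to-encoding choices are only illustrative assumptions, not a
decision anybody has made.

```python
# Hypothetical table: which encoding each language's file is stored in.
# These pairings are illustrative assumptions only.
LEGACY_ENCODINGS = {
    "fr": "latin-1",
    "ru": "koi8-r",
    "ja": "euc-jp",
}


def to_internal_utf8(raw, lang):
    """Decode bytes stored in the language's usual encoding and
    re-encode them as UTF-8 for internal use."""
    encoding = LEGACY_ENCODINGS.get(lang, "ascii")
    return raw.decode(encoding).encode("utf-8")


def from_internal_utf8(data, lang):
    """Convert internal UTF-8 bytes back to the language's usual encoding."""
    encoding = LEGACY_ENCODINGS.get(lang, "ascii")
    return data.decode("utf-8").encode(encoding)
```

A translator would keep editing, say, a KOI8-R file in a KOI8-R editor;
only the tools would ever see UTF-8.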
For all these reasons, and because I had a very boring Friday with a Python
editor but not my dia tree, I've begun to write a sheet-xgettext program.
What I want is to move sheet translations off the sheet files, and back into
the primary translation infrastructure (I'll try not to lose the
translations we already have :-) )
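The extraction step could look roughly like the sketch below. This is
not the actual sheet-xgettext program, only an illustration under
assumptions: it supposes that translatable strings live in <name> and
<description> elements, that existing translations are marked with an
xml:lang attribute, and the function names (extract_messages, po_escape,
to_pot) are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption: these are the elements that carry translatable text.
TRANSLATABLE_TAGS = ("name", "description")


def extract_messages(sheet_xml):
    """Return the untranslated strings of a sheet, in document order,
    without duplicates."""
    root = ET.fromstring(sheet_xml)
    messages = []
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        if tag not in TRANSLATABLE_TAGS:
            continue
        if XML_LANG in elem.attrib:
            continue  # already a translation, not a source string
        text = (elem.text or "").strip()
        if text and text not in messages:
            messages.append(text)
    return messages


def po_escape(text):
    """Escape a string for use inside a PO msgid."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_pot(messages):
    """Format the strings as PO template entries with empty msgstr."""
    return "".join('msgid "%s"\nmsgstr ""\n\n' % po_escape(m) for m in messages)
```

The elements skipped here because they carry xml:lang are exactly the
existing translations, so a second pass over them could seed the per-language
PO files instead of throwing that work away.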
Does that look good?
-- Cyrille
[***] Yes, I know that UTF-8 is strictly just a way of serializing
multi-byte characters as variable-length byte sequences, and that which
number means which character is defined by the underlying character set
(Unicode / ISO 10646). I'll assume for the moment that UTF-8 is just a
better way to store Unicode stuff.
--
Grumpf.