March 31, 2004 Edition

By Adam Skutt (mailto:askutt@wnec.edu), Stephan Windischmann (mailto:windi@arslinux.com), Amit Gurdasani (mailto:amit@arslinux.com), Joe Sweeney (mailto:joe@hopelost.net)

 

Introduction

We're back. Did you miss us? You shouldn't in the future, as we strongly believe we have the infrastructure in place to dish up fresh servings on a weekly basis.

This week, Linux.Ars looks at internationalization and localization of the Linux desktop, something at which the system shines, as well as Ghost for Unix, a portable hard drive imaging program. Additionally, everyone's favorite retailer is getting even deeper into the Linux game.

 

Intrusion on www.gnome.org

Several of the GNOME Project (http://www.gnome.org/) servers were compromised last week, leaving various services unavailable. All critical GNOME web sites and the main FTP archive are running again; only minor sites, such as art.gnome.org, still remain unavailable. As a result of this, the release of GNOME 2.6 was delayed until today (http://www.gnomedesktop.org/article.php?sid=1713&mode=thread&order=0&thold=1), even though no code has been compromised. The initial discovery of the intrusion is detailed here (http://mail.gnome.org/archives/gnome-announce-list/2004-March/msg00113.html). Updates about the intrusion can be found in this post (http://mail.gnome.org/archives/gnome-hackers/2004-March/msg00019.html) to the gnome-hackers mailing list.

 

Wal-Mart sells more PCs with Linux

The world's largest retailer, Wal-Mart (http://www.walmart.com/), has begun selling Microtel PCs bundled with Sun Microsystems' Java Desktop System (http://wwws.sun.com/software/javadesktopsystem/), Sun's Linux distribution. There are several models available, ranging from US$298 to US$698. The US$398 Microtel SYSWM8003 (http://www.walmart.com/catalog/product.gsp?cat=3951&dept=3944&product_id=2592735&path=0%3A3944%3A3951%3A41937%3A86796%3A132690) comes with an AMD Athlon XP 2400+ processor, 128MB of memory, a CD-ROM drive, a 40GB hard drive and Sun's StarOffice software suite, but no monitor. The US$698 SYSWM8006 (http://www.walmart.com/catalog/product.gsp?cat=3951&dept=3944&product_id=2592739&path=0%3A3944%3A3951%3A41937%3A86796%3A132690) has an Intel P4 processor, 256MB of memory, an 80GB hard drive and a CD-RW/DVD-ROM combination drive. It should be noted that these are not the only Linux PCs that Wal-Mart sells, as it also ships PCs with LindowsOS installed (http://www.walmart.com/catalog/product_listing.gsp?cat=96356&path=0%3A3944%3A3951%3A41937%3A96356) and Lycoris Desktop/LX installed (http://www.walmart.com/catalog/product_listing.gsp?cat=106560&path=0%3A3944%3A3951%3A41937%3A106560). Wal-Mart seems determined to be the lowest-cost PC retailer around, and if they can convince customers that not having Windows XP is no problem, they could be the ones spearheading the adoption of Linux on the desktop.

 

TTT: Tools, Tips and Tweaks

Internationalization and localization, or how to write badly in many languages

With software and hardware getting cheaper and easier to access, computing is becoming increasingly international in scope, and demand is growing for the ability to compute in non-English languages and non-Roman scripts. Over the past few years, releases from commercial operating system and productivity software vendors have gained support for input, display and printing that complies with national standards for scores of locales. Fortunately for us, Linux has excellent multilingual support.

Internationalization (i18n, for I–18 letters–N) and localization (l10n, for L–10 letters–N) are terms used to describe the typical efforts involved in getting a piece of software to speak different languages.

Internationalization refers to the ability of software to deal with input and output in various locales. Internationalized software provides an interface that can handle the characters of the language used in the user's locale, and items such as date and time formats, digit grouping, currency units, units of measurement and the like follow the conventions of that locale.

Localization is a related concept. It refers to the ability of software to provide a user interface in the language specified by the locale. Usually, this is accomplished by translating all the text that the software presents into the languages that the software supports, and depending on the locale, choosing the appropriate translation to present to the user.
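As a toy illustration of that last step (choosing a translation based on the locale), the shell sketch below picks a greeting from the language part of a locale name. The function name pick_greeting and the three messages are made up for this example; real applications would normally rely on a translation framework rather than a hard-coded table.

```shell
# Hypothetical helper: print a greeting in the language named by a locale
# string such as "es_US.UTF-8". Falls back to English for unknown languages.
pick_greeting() {
    locale_name=${1:-${LANG:-C}}
    # Keep only the language code: strip everything from the first "_" or ".".
    language=${locale_name%%[_.]*}
    case "$language" in
        es) echo "Hola" ;;
        fr) echo "Bonjour" ;;
        *)  echo "Hello" ;;
    esac
}

pick_greeting es_US.UTF-8
pick_greeting fr_FR
pick_greeting C
```

Called without an argument, the function falls back to the current value of LANG.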

A locale usually encompasses the specific dialect of a language used in a region (often a country), occasionally specifying the character set used for the script, which standardizes the representation of the alphabet, numerals, diacritic marks and symbols used in text written in the language.

Increasingly, the character set of choice is Unicode. An encoding maps the machine's numerical representation of a character to the character itself (not necessarily to the glyph displayed, since a glyph can result from combining letters, diacritics and the like). Some Unicode encodings are more popular than others, such as UTF-8 (a variable-length encoding that represents ASCII characters as the same single bytes; the first 256 Unicode code points match the ISO 8859-1 Latin 1 character set used for most Western European languages) and UCS-2 (a 16-bit encoding of a subset of Unicode used pervasively by Windows NT and derivatives). In Linux, the most popular Unicode encoding is UTF-8. Other encodings tend to be popular in certain locales: in the US and many Western European nations, ISO 8859-1 (Latin 1) and ISO 8859-15 (Latin 9) are popular; in Taiwan and China, the Big5 and GB2312 encodings are widely used; and in Japan, the EUC-JP and Shift-JIS encodings are frequently used. The reasons for using non-Unicode character sets are varied; for instance, the national encodings may be richer than the Unicode representation of the script, or the use of the character set may be deeply entrenched.
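To make the "variable-length" point concrete, here is a small shell sketch that counts how many bytes the same characters occupy in different encodings. The helper name byte_count is ours, and the characters are written as octal escapes so the result does not depend on your terminal's encoding.

```shell
# Count the bytes produced by a printf format string containing octal escapes.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

ascii_a=$(byte_count 'a')             # U+0061: one byte in both Latin 1 and UTF-8
latin1_ntilde=$(byte_count '\361')    # U+00F1 in ISO 8859-1: the single byte 0xF1
utf8_ntilde=$(byte_count '\303\261')  # U+00F1 in UTF-8: the two bytes 0xC3 0xB1
utf8_ka=$(byte_count '\340\244\225')  # U+0915 (Devanagari KA) in UTF-8: three bytes

echo "a=$ascii_a ntilde(Latin 1)=$latin1_ntilde ntilde(UTF-8)=$utf8_ntilde ka(UTF-8)=$utf8_ka"
```

The letter with a tilde costs one byte in Latin 1 but two in UTF-8, while plain ASCII costs one byte in both.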

In Linux, there is no standardized method for developers to internationalize or localize their applications; the method used depends on the user interface chosen, licensing, etc. of the software. For instance, frequently, GTK+ and GNOME applications use the GNU gettext library (LGPL-licensed), which is a convenient framework for incorporating and maintaining translations of the text used in the application into various languages, and the Pango library (LGPL-licensed) in order to lay out text in the Unicode character set. Applications using the Qt widget toolkit can use Qt's built-in means for dealing with translations, or (in the case of applications using the KDE framework) can use gettext. Applications such as MULE for XEmacs have their own mechanisms for internationalization and localization. Conversions between encodings can be accomplished by the use of the iconv library (LGPL-licensed).
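End users can try out encoding conversion from the shell as well. The sketch below assumes the iconv command-line tool (which usually ships alongside the library) is installed; the helper name utf8_to_latin1_hex is made up for this example.

```shell
# Convert UTF-8 text on stdin to ISO 8859-1 and print the resulting bytes in hex.
utf8_to_latin1_hex() {
    iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n'
}

# The input is the UTF-8 byte pair c3 b1 (n with tilde);
# after conversion a single Latin 1 byte remains.
latin1_hex=$(printf '\303\261' | utf8_to_latin1_hex)
echo "$latin1_hex"
```

Swapping the -f and -t arguments converts in the other direction.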

However, as far as end users are concerned, things are much simpler. On a system-wide scale, the locale can be set by fiddling with a number of environment variables in the configuration file of your favorite shell (e.g. /etc/bash.bashrc for bash, /etc/csh.cshrc for csh and tcsh, /etc/zshenv for zsh, /etc/profile for sh, ksh and pdksh, and so on) and in configuration files for various components, such as /etc/gdm/gdm.conf for the GNOME Display Manager (the graphical login on GNOME systems). On a per-user scale, these settings can be made in your shell's configuration file, e.g. .bashrc for bash, .cshrc for csh/tcsh, .zshrc for zsh, and so on. If you log in graphically, it might also help to set them in your .xsession (graphical login script) if you have one. There are a number of available knobs to turn.

Usually, setting just the LANG variable (for many applications) and the LANGUAGE variable (for software such as GNOME) is sufficient.

# My language is Spanish as written in the U.S., using the Unicode
# character set in the UTF-8 encoding.
LANG=es_US.UTF-8
LANGUAGE=es_US.UTF-8
export LANG LANGUAGE
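LANG is only the fallback. Under the POSIX rules, a per-category variable such as LC_MESSAGES or LC_TIME overrides LANG for that category, and LC_ALL overrides everything. The sketch below mimics that lookup order in plain shell. The function name effective_locale is ours, and the final fallback to "C" is an assumption about the default (it is what glibc uses when nothing is set).

```shell
# Print the locale that applies to one category (e.g. LC_MESSAGES),
# following the POSIX precedence: LC_ALL, then the category variable, then LANG.
# Empty variables are treated as unset.
effective_locale() {
    eval "category_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$category_value" ]; then
        echo "$category_value"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"   # assumed default when nothing is set
    fi
}

effective_locale LC_MESSAGES
```

On a live system, running the locale command with no arguments prints the values currently in effect for every category.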

The locales you intend to use must be generated first; to do this, you edit /etc/locale.gen and run the locale-gen utility. Here's a sample /etc/locale.gen:

en_US ISO-8859-1
en_US.UTF-8 UTF-8
es_US.UTF-8 UTF-8

Running locale-gen results in this output:

root@athena:~# /usr/sbin/locale-gen
Generating locales...
  en_US.ISO-8859-1... done
  en_US.UTF-8... done
  es_US.UTF-8... done
Generation complete.
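Distribution-supplied locale.gen files often list many locales commented out with "#", so it can be handy to see which entries are actually enabled before running locale-gen. The sketch below does this with awk on a scratch copy; the function name enabled_locales and the sample contents (including the commented-out en_GB line) are made up for illustration.

```shell
# Print the locale names enabled in a locale.gen-style file ($1),
# ignoring blank lines and lines commented out with "#".
enabled_locales() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Scratch copy of a sample file; the real one lives at /etc/locale.gen.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# en_GB.UTF-8 UTF-8
en_US ISO-8859-1
en_US.UTF-8 UTF-8
es_US.UTF-8 UTF-8
EOF

enabled_locales "$sample"
```

Each line of the file holds a locale name followed by its character set, separated by whitespace; the sketch prints only the first field.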

To configure the system for text input in a certain language using a particular keyboard layout, you can use XKB, the X Keyboard Extension, with XFree86. To do this, edit your XF86Config or XF86Config-4 file, usually found in /etc/X11 or /usr/X11R6/lib/X11. Alternatively, you can use the setxkbmap tool.

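Various XKB settings can be set, including XkbLayout (the keyboard layouts), XkbModel (the keyboard model) and XkbOptions (extra behavior such as layout switching), all of which appear in the configuration below.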

There are other settings; for more information, see the XFree86.org documentation (http://www.xfree86.org/current/XKB-Config.pdf) on XKB. The available choices for these and other settings can be found in the file /usr/X11R6/lib/X11/xkb/xfree86.lst.

The configuration looks like this:

Section "InputDevice"
    Identifier "Keyboard1"
    Driver "Keyboard"

    # We want the US keyboard layout with an optional Arabic
    # keyboard layout. (You can specify multiple layouts -- up
    # to four -- only with XFree86 4.3.0 or later.)
    Option "XkbLayout" "us,ar"

    # 104-key PC keyboard with the right-hand Windows Logo key
    # mapped to the Compose key to combine letters and accents.
    Option "XkbModel" "pc104compose"

    # We want to use the Alt-Shift key combination
    # to switch languages. We also want to swap the left-hand
    # Ctrl and Caps Lock keys.
    Option "XkbOptions" "grp:alt_shift_toggle+ctrl:caps_ac"
EndSection

You can try out the settings in the current session using the setxkbmap utility.

setxkbmap -layout us,ar -model pc104compose -option grp:alt_shift_toggle+ctrl:caps_ac

Users of languages where it isn't easy to use a keyboard layout for text entry (especially Chinese, Japanese and Korean) can frequently use input method editors using the XIM (X Input Method) API. There is a good HOWTO (http://www.suse.de/~mfabian/suse-cjk/) on this topic.

Of course, to view text in a particular language, you need a font that provides glyphs for that language in the character set of your choice. One of the most easily obtainable Unicode fonts with support for several scripts is the GNU Freefont collection (http://savannah.nongnu.org/download/freefont/). Another set of fonts, which carries most Latin glyphs as well as scripts such as Arabic, Hebrew and Cyrillic, is Microsoft's core fonts (http://corefonts.sourceforge.net/) for the web. Various web sites are dedicated to information about the fonts available (http://www.alanwood.net/unicode/fonts.html) for many languages in different encodings.

Putting all of this together, it is possible to have a desktop environment in one's native language (even if that language isn't English) by making a few settings. For instance, the following screenshot shows a recent GNOME 2.5 snapshot (mostly) in the Hindi language (locale hi_IN.UTF-8, XKB layout dev):

[Image: Hindi-gnome.png]

You might notice that the quality and extent of the translation varies from software to software and translator to translator. Localization is a painstaking procedure, and not all translations are alike in quality and availability.

If you'd like to participate in localization efforts for your language, several open-source software projects have internationalization and localization projects that could use your help. KDE (http://i18n.kde.org/), GNOME (http://developer.gnome.org/projects/gtp/) and OpenOffice.org (http://l10n.openoffice.org/) all have localization projects; there is also the Free Software Translation Project (http://www2.iro.umontreal.ca/~gnutra/po/HTML/).

Modifying or designing software to allow for internationalization is beyond the scope of this write-up. However, there are several (http://graal.ens-lyon.fr/~mquinson/l10n.html) good (http://handhelds.org/~zecke/apidocs/qt/unicode.html) resources (http://mail.gnome.org/mailman/listinfo/gtk-i18n-list) available.

A note: While Mozilla the web browser has excellent multilingual support, it tends to fall down a bit on displaying complex scripts such as Thai and several Indic scripts, such as Devanagari. There is a Bugzilla report filed (http://bugzilla.mozilla.org/show_bug.cgi?id=215219) and a patch in the works, with a patched binary (ftp://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.6/contrib/mozilla-i686-pc-linux-gnu-gtk2-pango.tar.gz) of Mozilla available for download. This patched binary works well for the complex scripts, but falls down on right-to-left support, which the regular unpatched Gecko engine gets right. We hope that these issues will be resolved soon.

 

GNOME tweaks


While many folks are content with the way their GNOME desktop works, others among us are tweakers at heart, and are never content with what's served to us. Still others find some components of the desktop unsatisfactory. The Metacity window manager, for example, is a bit anemic in features, especially compared to the likes of Sawfish and Enlightenment, which let the user do things such as match windows dynamically based on their X11 window class and set special properties. We've got a few GConf settings (set with the gconf-editor tool or the gconftool-2 tool) to help you out.

 

Cool App of the Week

Ghost for Unix: Portable Hard Drive Imaging for Unix

[Image: G4u-welcome.gif]

Ghost for Unix, or g4u (http://www.feyrer.de/g4u/), is a NetBSD-based bootable floppy or CD-ROM that allows you to clone your hard disks for backup or to do mirrored installations. The floppy and CD offer the same two functions. One uploads a compressed image of the local hard disk to an FTP server. The other retrieves that image via FTP, uncompresses it and writes it back to the disk. In both cases, the network configuration is obtained via DHCP.

Because the hard disk is stored as a compressed raw image, g4u works with any filesystem and any Linux distribution. Backups of entire local disks as well as individual partitions are supported. Since g4u reads the disk bit by bit, from the first bit to the last, the image includes the MBR, the partition table and the partitions themselves. g4u works with both IDE and SCSI drives of any size and geometry. The default compression is gzip (Deflate) at level 9, but the level can be set anywhere from 1 (fast, little compression) to 9 (slow, maximum compression). Requirements include an empty 1.44MB floppy disk or an empty CD, an FTP server with sufficient free space for the hard drive images and a DHCP server. You can download the floppy image here (http://www.feyrer.de/g4u/g4u-1.14.fs) and the CD-ROM image here (http://www.feyrer.de/g4u/g4u-1.14.iso).
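The underlying idea (read the raw device from start to finish, compress the stream, and later reverse the process) can be tried out with standard tools. The sketch below is not g4u's own code and leaves out the FTP transfer; it uses dd and gzip on an ordinary scratch file standing in for a disk, and the function names image_disk and restore_disk are made up for this example.

```shell
# Compress a raw image of "disk" $1 into file $2 at gzip level $3 (default 9).
image_disk() {
    dd if="$1" bs=64k 2>/dev/null | gzip -"${3:-9}" > "$2"
}

# Write a compressed image $1 back onto "disk" $2.
restore_disk() {
    gunzip -c "$1" | dd of="$2" bs=64k 2>/dev/null
}

# Demonstration on a scratch file standing in for a real disk device.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/disk.raw" bs=1024 count=256 2>/dev/null
printf 'pretend partition table' | dd of="$workdir/disk.raw" conv=notrunc 2>/dev/null

image_disk "$workdir/disk.raw" "$workdir/disk.img.gz"
restore_disk "$workdir/disk.img.gz" "$workdir/restored.raw"
cmp "$workdir/disk.raw" "$workdir/restored.raw" && echo "restored image matches"
```

Because the copy works on raw bytes rather than files, it neither knows nor cares which filesystem is on the disk. The flip side is that empty space is read and compressed too.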

 

/dev/random