Fun with internationalization, ISO 3166-1, ISO 639 and the CLDR

3:13pm 8th January 2008

So I came to a point in a software project that I am working on where it is time to consider how foreign users will be able to use the system. Yep, it's time for i18n and L10n. Initially, I thought that this would be as easy as simply setting up a translation matrix, and wrapping all static data in a function that performs a lookup on it and outputs the appropriate language.

This worked fine. I set up the matrix using a list of languages maintained in the database according to the ISO 639-1 standard. I wrote a simple yet effective tool for maintaining translation tables and began using online translation services to translate the simplest words and phrases, performing reverse translations on them to ensure they were not contextually mutated by the translator. Most words could be done this way, and in a very short period of time I was able to have substantial portions of the site in other languages, including languages written in non-Latin scripts such as Russian, Greek and Thai. All was well in the world.

Or so I thought. It all hit me at once. A single thought that brought all my cleverness crashing down, and threw up a whole new set of problems to overcome. It was like the opposite of an epiphany. Here's the situation: if I have a list of languages from ISO 639, how do I handle the fact that the same language may actually need two translations? E.g., how would I deal with translating the word "colour" for the American spelling? The ISO code 'en' does not allow for more than one English language. And then there's Portuguese; it has two regional variants, that spoken in Portugal and that spoken in Brazil. Finally, Chinese. ISO 639-1 specifies only a single code, 'zh', yet there are two written forms, traditional and simplified, which are practically different languages.

It was clear that my solution was just not sufficient. Back to the drawing board.

After pondering this problem for some time, I decided that if I was going to do i18n at all, I was going to do it properly. After all, I've been uncompromising in my support for and awareness of time zone handling. Additionally, I had been careful from the word go about using UTF-8 aware functionality in the code, the database and all ancillary systems. It'd be silly to compromise on proper i18n at this point. That meant using appropriate date formats, spelling variants, weekday/weekend definitions, everything. But how? I had nowhere near the resources to engage in the collation and maintenance of that information. Nonetheless, I was determined that it be done right.

I was given salvation by another IRCer who is also working with this at the moment. He told me about the Common Locale Data Repository, or CLDR for short. It is a large, freely available repository of locale data stored and distributed in a widely used format, an XML DTD called LDML. This DTD is used in many projects for the exchange of locale data, including Microsoft's .NET framework. A relatively new project, having only started in 2003, the CLDR is already by far the most comprehensive repository of locale data that is freely available. It is maintained by the Unicode Consortium, so one can be reasonably confident that things like format stability, backwards compatibility and data consistency are going to be given due attention.

Implementing full i18n/L10n in my project will be fairly involved, but not difficult. Locale identifiers are made up of ISO 639 language codes and ISO 3166-1 alpha-2 country codes, for both of which I already have authoritative sources. Mating them to form valid locales is a trivial job. Instead of being populated on one axis by language, my translation table is now populated by locale, with each locale having a default "fallback" language. That means very little change is required to the infrastructure already written in order to support language variants, from trivial one or two word changes through to wholly different languages with different character sets. Furthermore, it does not seem that CLDR data even needs to be integrated into the DB before being used. The XML file can be stored locally and queried directly, meaning that future versions of the file become a drop-in replacement, allowing virtually effortless expansion of locale awareness. The CLDR allows me to trivially do the following:

  • Translate basic data such as month names, day of the week names and names of countries into many languages.
  • Format dates and times according to local conventions.
  • Perform character repertoire tests to guarantee that the fonts used include all the characters necessary to render a language fully.
  • Translate time zone names into local languages. This works because the CLDR uses the same time zone list as most POSIX systems: zone.tab, which is part of the zoneinfo database.
  • Determine local currency and its symbol.
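
The locale-with-fallback lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the code running on this site: the `TRANSLATIONS` table, the function names `fallback_chain` and `translate`, the underscore-separated identifier form (`en_US`) and the choice of `en` as the final fallback are all assumptions made for the example.

```python
# Hypothetical in-memory translation table keyed by locale identifier.
# In a real system this would come from the database.
TRANSLATIONS = {
    "en": {"colour": "colour", "hello": "hello"},
    "en_US": {"colour": "color"},
    "pt": {"hello": "olá"},
    "pt_BR": {"hello": "oi"},
}

# Assumed site-wide fallback when neither the locale nor its
# language has a translation for the requested key.
DEFAULT_LOCALE = "en"


def fallback_chain(locale, default=DEFAULT_LOCALE):
    """Return the locales to try, most specific first."""
    # Split "pt_BR" into an ISO 639 language and an ISO 3166-1 territory.
    language, _, territory = locale.partition("_")
    language = language.lower()
    chain = []
    if territory:
        chain.append(f"{language}_{territory.upper()}")
    chain.append(language)
    if default not in chain:
        chain.append(default)
    return chain


def translate(key, locale):
    """Look up key for locale, walking the fallback chain."""
    for candidate in fallback_chain(locale):
        text = TRANSLATIONS.get(candidate, {}).get(key)
        if text is not None:
            return text
    # Nothing found anywhere: return the key itself so the page still renders.
    return key
```

With this arrangement only the entries that actually differ need to be stored for a regional variant. `en_US` overrides "colour" and inherits everything else from `en`, which is why so little of the existing infrastructure has to change.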

In order to serve a user's needs, I will need to get as many of the following details from them as possible, in this order:

  1. Locale: As a basic minimum, this will allow the representation of content in their local language and formatting of dates in their expected format.
  2. Location: As users may use locales for places that are not where they are located physically, a user may want to specify their country so that they can be shown appropriate location based settings and defaults.
  3. Time zone: This cannot be reliably inferred from the above two pieces of information, although it can be guessed with reasonable accuracy. The user will however need the option of changing it, should they wish.

In order to allow full localization, those three pieces of information need to be determined about the user. Much of it can be inferred by using things like IP based geolocation data providers such as IP2Location and MaxMind. The ultimate goal is for a user to just hit the web site and immediately see their language, dates in their local format and times converted to their local time zone. Only time will tell how close I can get to that ideal.
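
One way to combine inferred values with user choices is to let anything the user has set explicitly win over a guess, and let a guess win over the site default. The Python below is a minimal sketch under that assumption. The `Settings` class, the `resolve` function and `SITE_DEFAULTS` are names invented for this example, and the geolocation lookup itself is not shown; `guessed` simply stands in for whatever an IP based provider returned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """The three details needed for localization; None means unknown."""
    locale: Optional[str] = None
    country: Optional[str] = None
    time_zone: Optional[str] = None


# Assumed site-wide defaults, used only when nothing better is known.
SITE_DEFAULTS = Settings(locale="en", country=None, time_zone="UTC")


def resolve(explicit, guessed, defaults=SITE_DEFAULTS):
    """Pick each field from the first source that has a value for it.

    Order of precedence: the user's explicit choice, then the guess
    (e.g. from geolocation), then the site default.
    """
    def pick(name):
        for source in (explicit, guessed, defaults):
            value = getattr(source, name)
            if value is not None:
                return value
        return None

    return Settings(
        locale=pick("locale"),
        country=pick("country"),
        time_zone=pick("time_zone"),
    )
```

For example, a visitor guessed to be in Brazil who has explicitly chosen the Lisbon time zone would keep the guessed locale and country but get `Europe/Lisbon` as their time zone.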

On a side note, I would like to give special mention to PostgreSQL's support for time zone and date math. Being able to perform all date / time related functions at the database makes it trivial to implement zone aware time handling. PostgreSQL is, in my opinion, the RDBMS of choice for applications requiring non-trivial date and time handling or other i18n functionality.

So that's that. I now have all the tools necessary to ensure that this project will end up with a final product that is fully location aware, and allows users to select their locale and have all the relevant alterations to output tailored to their regional expectations. I intend that this project end up being a glowing example of i18n done right. If you have anything to add, or know of anything I may have overlooked, then I invite you to drop me a comment below.

The myth of "economic rationalism"

9:15pm 26th June 2006

Why is it that every time world hunger, poverty or other humanitarian problems are brought up, all solutions offered are couched in terms of "economic rationalism"? The fiction of economic rationalism is counter-productive at best, and abhorrent when applied to matters of conscience. The way to solve human hunger and poverty is not through "economic empowerment programs" funded by the IMF, the unregulated employment of third world labour by first world corporations or donations by public charities. Rather, the rejection of selfish utilitarianism and the re-discovery of compassion are far more likely to yield positive results in the area of humanitarian need. How about paying the third world fairly for the resources they provide? Pay a fair rate for copper mined in Chile, or a fair rate on natural gas from East Timor, or a fair rate on timber milled in Thailand, or a fair rate on labour provided in China. After decades of counter-productive activities, I think it is now clear that economic theory, financial restructuring and nebulous concepts of development are not the answer to any of the world's many ills.

The first world pays deflated prices extorted out of third world countries because desperation is easily exploited. If a man has to choose between being exploited and starving to death, he will choose exploitation. It doesn't mean the exploiter is helping him survive, although that may be the argument used to salve an easily silenced conscience. It simply means that the starving man has no option, and the exploiter is willing to make use of that knowledge to his or her own advantage. To make matters worse, the first world intervenes in the politics of lesser developed nations or overthrows legitimate governments to install corrupt client regimes that will sell their citizens' very soul for a few petty bribes, deliberately exacerbating the problem of already abusive conditions. Many would think this to be conspiracy theorists' ranting or alarmist, anti-establishment propaganda. Perhaps, but I would suggest that one look to examples where organized, governmental efforts are made to create conditions where big business can exploit the rights of the world's people. An example that people in the technology world would be familiar with is the draconian US law, the Digital Millennium Copyright Act (DMCA) and the corresponding government support of DRM, commonly accused of serving no purpose but the maintenance of the artificial monopoly that media production houses have on human creativity. Other examples include the history of Diego Garcia where a community was destroyed to provide a military base, the story of Britain's Opium Wars where a population was "pacified" and forcibly saturated with opium to create a market for a British trading company, the US mining of Nicaraguan ports, the overthrow of the elected Allende government in Chile and the hypocritical partaking in the Apartheid system. The list goes on so long that anyone who believes in the bona fide intentions of first world governments is either utterly misinformed or deliberately self-blinded to the truth.

The real answer to exploitation (and the "terrorist" reaction to it) is the rejection of greed as the motivation for human activity and its replacement with a sense of collective spirit. Markets, while they may be the natural order of things, are dangerous in the absence of communal consciousness. They can work for the good of society, but only if people think on a more mature level than "I want". This idea was put forward in the movie "A Beautiful Mind", where Russell Crowe's character comes up with a new economic theory when he and his friends are in a bar. According to his theory, members of a community need to be aware of the ramifications of their actions on the group, and take them into account when making decisions on how to go about achieving their personal goals. Acting with only their own interests in mind resulted in a negative result for all of them. People should be able to take into account social issues without a constant need for the government to tax them into a pattern of responsibility. There are cases where government intervention is required due to an issue's complexity or scope. Examples would be regulating the amount of fishing in an area or taxing the use of water from a river by local farmers. Such issues are beyond the judgement of individuals and need to be administered from a position of overarching information. Note also, these are not issues of personal morality or conscience. We cannot, and more importantly should not, rely on the government to apply community conscience in the form of taxes on cigarettes or legislation to prevent exploitation of workers. Paying workers a fraction of the real value of their work or deliberately causing others harm for profit should invoke Jiminy Cricket in short order. You know what happens when we let governments act as our consciences? They sell our collective soul, piece by piece. A little piece was sold on the market to the American organizations RIAA and MPAA, the title deed of which reads "DMCA". Another little piece was sold on the market to US defence contractors in a box on which was written "The PATRIOT Act". And then there are the unregulated international markets for insurance, financial services, the media, healthcare and education, markets that turn into feeding frenzies for corporations hungry to bite off chunks of our souls in the form of unreasonable insurance policies, exploitative mortgages, propaganda, disinformation, intellect-destroying "entertainment", socially asphyxiating security policies and the deprivation of medical care and education from all but the super rich. Community values are pieces of a society's soul, and they are being devoured wholesale by the government and corporate neo-noble plutocrats.

The myth that the market can solve moral problems by allowing consumers to "vote with their feet" and choose the most competitive and ethical options on the market is just that; a myth. An example of the way in which leaders arrogantly reject calls for community examination of market failure was in the answer Australian treasurer Peter Costello gave when asked about the possibility of government investigation into constantly rising fees in the Australian retail banking sector. He advised customers to just shop around when faced with unreasonable fees being charged by banks. This ridiculous stance was taken despite the glaringly obvious fact that consumers are unable to bank hop every few weeks or indeed, even every few years, as changing banks incurs massive expenditure of effort and energy. If banks are taking it in turns to hike rates by small amounts, then at any given point in time, consumers cannot reasonably change banks such that the benefit is worth the effort. This enables banks to raise fees, evaluating how much they can raise them by before they exceed the tolerance of their customers. Other banks see this, and raise their rates to match, or further if their marketing department tells them that the increase will not result in significant customer losses. Forget the fact that Australians already pay among the highest bank fees in the world as attested by foreign bank operators. Another example is the early market for broadband Internet access in Australia where customers faced heavy rewiring costs when changing from one provider to another. Consequently, the incumbent local carrier, Telstra, exploited the fact that they were "first off the block" with cable Internet, squeezing their existing customer base long after other companies had arrived with competing products. These are examples where market forces exploit the "hostage audience" phenomenon that occurs when a consumer product's nature places barriers against customers' exercise of choice. I am unaware of any acknowledgement of such "reverse price wars" in traditional economic theory, as it would undermine the principles of fundamentalist marketism currently dominating Western business, politics and economics.

Many would politely dismiss these ideas as idealist, unrealistic or utopian, or, impolitely, deride them as communist. I reject this, and provide examples where ethics and community goals can be achieved and selfish impulses resisted within the ideology of market rationale. The change that is required is not a shift in paradigm from market mentality to some unworkable central administrative system or collectivist authority. Nor is it the complete degeneration into far-left wing anarchy. As with everything in this world, a balance needs to be struck. Market forces can operate effectively for society, provided society is made up of individuals not only concerned with self-gratification, but also social-gratification. To use economic terms, market agents have to act not only to maximize their own utility, but to maximize the total utility of the market as a whole.

The open source software movement is a community of developers who build products as a community. After much trial and error, successful business models have emerged around products like Linux, Apache, MySQL, PostgreSQL and PHP. They were all developed by people who were able to think in terms of community, community progress and meritocracy, and who were motivated by the unselfish desire to see humanity as a whole progress as a result of their efforts. Businesses often directly contribute money and staff time into developing these community projects. Examples include IBM, which restructured its entire operations to give a major focus to open source. This has proved to be an incredible success, despite the fact that IBM's contributions benefit their own competitors. IBM is now among the most skilled and profitable providers of Linux administration support and deployment consultation services.

Opponents of open source, such as Darl McBride and Mohit Joshi, are fighting a bitter war against it, labelling it communist, viral and damaging to innovation. They do so not only because it threatens proprietary software, but because it represents a fundamental shift in thinking from "I am the centre of my universe" to "The community is my universe". This way of thinking does not promote rampant consumerist behaviour or unfettered monopolist marketism, and as such is bad for corporate profits.

Other examples of self-regulated community conscience are to be seen in Ray Anderson's efforts with his company Interface, which required a huge leap of market-defying faith before dividends were paid. And paid they were, for Interface is now being rewarded by the market for its initial boldness. Unfortunately, community minded people are still in the tiny minority and generally labelled charlatans or hippies. Men like Ray Anderson are virtually non-existent in the business sector where profits this quarter are all that matter. Visionaries with sights on a better place for humanity, the arrival at which requires sacrifices on the bottom line this fiscal year, are unwelcome and derided as communist.

Pop culture convinces people that the only morality is satisfying the self. There is no reason that markets can't be self-regulated by people with conscience. I agree that it is unrealistic, but only at this point in Western history, because society has been conditioned by the panem et circenses of McDonald's, reality TV, credit cards and 34 brands of shampoo, all of which are elements of pop culture, acting in concert with the aim of convincing us that the only goal in life is self-gratification and consumption. It's no wonder that nobody thinks about the welfare of others; there isn't a game show that rewards altruism or a tabloid about people like Fred Hollows. Society as a whole has been saturated by depravity, and it is for this reason that the assertion that markets solve the ethical problem, by creating a mechanism where consumers can vote with their feet for those market operators who engage in ethical practices, is false. Society doesn't know, and has been conditioned not to care, about the ethical transgressions of corporations. Nike is still popular despite exposure of its sweatshops, Pfizer products are still among the most prescribed drugs despite its heinous transgressions in Nigeria and teenagers still take up smoking at record rates.

The answer to the poverty, hunger and war resulting from the gross inequalities between populations is not some socio-economic model of living standards, the issuing of "development loans" or even organized charities like World Vision and Oxfam, laudable though they may be. The answer is the re-discovery of social conscience. The answer is to recognize and resist the destructive elements of modern society such as spirit-crushing beauty magazines and depraved reality game shows. The answer is to re-introduce morality into society, reject the selfish consumerist values prescribed by pop culture and re-realize that no man is an island. Helping one is helping all and in effect, helping ourselves. Only when we consider social gains to be intrinsically beneficial to ourselves, can we begin to cure social ills.

nVidia chipsets and Linux

9:16am 12th March 2006

I am at a loss as to why nVidia refuses to publish the specifications for its nForce 2200 Pro chipset. I can understand the need to keep the drivers for its graphics cards as closed-source, binary-only distributions, but its decision to take the same path with its chipsets leaves me quite mystified.

Motherboard chipsets, in order to be transparently available to the user, need to be integrated into the operating system. Distributing binaries makes the installation of I/O controllers, RAID cards and other onboard devices a pain for users who just want to have a system up and running as quickly as possible.

Furthermore, nVidia's chipsets are the best chipsets for the AMD64 platform, which is as popular with Linux servers as it is among gamers. If nVidia wants in on this market in a meaningful way, drivers will need to be incorporated into the Linux kernel, something that cannot happen unless usable specs are given to the maintainers of the relevant Linux modules. Jeff Garzik, maintainer of the kernel module libATA, has been quoted as saying that "Unfortunately, Nvidia is the only SATA hardware vendor that chooses not to give me any hardware information".

As it stands, having installed Linux on the new server, I am unable to use the nvRAID functionality of the board, which presumably would allow me to use the hot swap bays properly and allow for automatic volume rebuilds in the case of disk failure. Instead, I am using Linux's md system to provide software RAID functionality. It has proven to be a very high performance and reliable solution indeed, but I still feel that using even the partial RAID functionality provided by the nForce 2200 chipset would be preferable.

New mrnaz.com server!

8:27pm 3rd February 2006

Well, the new server has arrived. The much anticipated final home of mrnaz.com has been delivered and is waiting for installation. It contains:

I have put pictures of it up in the gallery. I am now installing the OS (Debian Linux) on it, and once all the packages are installed and tested I will schedule data center entry time and then migrate the live data across just before it is due for deployment. Wish me luck!

Ineptitude

6:31am 3rd November 2005

Bloody hell. I have just spent the morning configuring the medical software at my parents' new medical clinic. After setting up MS Windows 2003 Server with SQL Server 2000, configuring all the workstations with the appropriate permissions and settings and getting everything to what I thought was ready for the install technician to do her job, I found that things were not going to be smooth.

First off, PractiX, the ridiculously unpolished software package for managing medical centers, had requirements of the network environment that were so specific that it is hard to imagine integrating the software into any existing infrastructure that was not deployed specifically for it. E.g., it can ONLY use the 192.168.1.* subnet, all users need to be in the workgroup "practix" and it requires MS SQL Server's Query Analyzer to be installed. What use that can have in a production deployment I don't know.

Secondly, the installer technician had no idea what she was doing. She called me in claiming that the server's "ODBC settings weren't allowing the application to access the SQL server" and that the database files needed to be in the default location on the server. So I drove there to find that in actuality, the problem was that she had not configured PractiX with the required server details. I mean what the HELL?! If I wasn't familiar with the PractiX package, how would that have been fixed? She had no idea, she just mindlessly blamed the installation of Windows and SQL Server. Then, this morning, she didn't know how to set up the printers, which again was a setting in the application she was sent out to install.

I really, really hate inept technicians. She did admit to not knowing much about "the computer side of things". Well then what the hell is she doing working with the computer side of things? As far as I can tell, all she *can* do is put a CD into the drive and click OK. Grr! This is a very frustrated Naz signing out.

Star Wars!

2:43am 19th May 2005

WHOA!

OK, deep breaths, relax. Last night, I saw "Star Wars: Revenge of the Sith". And MAN is it awesome! Given the mediocrity of "The Phantom Menace", and the fact that "Attack of the Clones" was still not up to the originals, I wasn't expecting to be blown away. But I was. The lightsaber duels were incredible, the character development was well done, even Anakin, played by that Hayden Christensen idiot, was well portrayed. I won't go into too much detail here, as I'm sure many people still haven't seen it, but rest assured, nobody will be disappointed. Except perhaps chicks looking for a soppy love story.

When we got there, as predicted, all the Star Wars fanatics were dressed up. There were Jedi, a few stormtroopers and even a Wookiee, albeit a little shorter than the 8 feet tall they usually are. One kid had a very authentic lightsaber, full sized, made out of metal and transparent plastic, and with very, very impressive lighting effects.

I was sitting next to Darth Vader, although instead of being menacing, he was just annoying. I mean really, Darth Vader isn't supposed to have ADD, is he? The Dark Lord of the Sith loses a bit of his Dark Side aura when he's hopping up and down on his seat yelling "Hurry up!" to the guy selling the choc tops. Not only that, throughout the movie he was oohing and aahing every time a lightsaber lit up, and at the point in the movie when Darth Vader's helmet was put on and the characteristic breathing started I thought he was going to have an orgasm. If only I could use the force to make him shut the hell up!

All in all it was a great night. The atmosphere at these midnight premieres is unbeatable. Every person in the audience goes quiet when the lights dim and you can feel the ripple that goes through the crowd when 400 spines tingle in unison as the Star Wars theme music thunders out of the speakers after 3 years of waiting in anticipation. 400 people waiting 3 years for that moment. That's a collective 1.2 millennia of waiting, and you could feel it in the air. It was like that for "Phantom Menace" and "Attack of the Clones" as well. Now that the series is complete, I don't know if I will ever get to experience a moment like that again.