Jul 4, 09



Arabization - It's Harder than just Right to Left


One of the most exciting announcements at the Global Voices workshop at Harvard Law School a week ago was the new Arabic weblogging tool developed by iUpload and funded by Spirit of America, a non-profit group dedicated to increasing goodwill between Americans and Iraqis through relief work in Iraq. Launching the tool were Mohammed and Omar of Iraq the Model, prominent pro-invasion Iraqi bloggers.

The launch was not without controversy. Spirit of America CEO Jim Hake raised some eyebrows when he declared that the Arabic-language weblogging tool would provide free weblog hosting for "friends of democracy". Pressed for a definition of "friends of democracy", Hake indicated that the site would be willing to host any blog that did not advocate violence or terrorism. As Rebecca MacKinnon noted in her report on the meeting on PersonalDemocracy.com: "...many attendees of the Global Voices workshop voiced skepticism at any attempts by an organization to determine who has the right to a free blog and who doesn’t."

Controversy aside, the Spirit of America tool is one of several important efforts to increase the amount of Arabic-language content online and make the internet more accessible to Arabic speakers. The process of adapting tools so that Arabic speakers can use them, often called "Arabization", is harder than it looks.

While written Arabic is a lingua franca of sorts, spoken Arabic varies from country to country, enough that Egyptians have a hard time understanding Jordanians, who in turn have a hard time understanding Syrians, and so on. The matter gets even more complicated once you consider the challenge of creating technical terminology in a classical language. Classical Arabic has no term for "hard drive", for instance, so how does the language adapt to allow conversation on technical topics?

Alaa Abd El Fatah, a brilliant Arabization geek and a member of EGLUG, the Cairo-based Egyptian Linux Users' Group, is helping to translate a series of introductions to open source software being developed by EGLUG partners. The translations use the colloquial Arabic spoken in Egypt rather than the classical Arabic understood throughout the region. That makes them more readable and accessible to the Egyptians he's trying to convert to the open source cause, but harder for people outside the country to understand. Even formal technical writing varies by country: Raed Neshiewat, a software developer in Amman, Jordan, described reading an article in a Syrian computing journal and finding it very confusing:

"They were using a term to mean 'the case of the computer'. The term probably translates into English as 'chassis'. But in Jordan, we use that term to mean 'the body of the car'. So I was trying to figure out why this guy was trying to put a hard drive in the body of his car."

It gets more complicated: Libya and Syria have resisted loanwords from English, French or any other European language in scientific discourse. So they've created their own Arabic-derived terminologies for chemistry, physics and computer science, which also need to be harmonized with usage in other Arabic-speaking nations.

One approach to solving the language problem is to agree on a common source of terms: Raed suggests that PC World, published in Dubai, is becoming the "stylebook" for Arabic technical discussions. Alaa, as an open source geek, is more interested in a grassroots approach. He participates in a project called Arab Eyes, which is Arabizing large sets of open source programs and maintains a wiki glossary of Arabic computing terms, trying to get the F/OSS communities throughout the region to converge on a single set of technical terms.

The process of Arabizing technical terms happens very quickly. Amina Khairy, a reporter for Al-Hayat who recently wrote an article on Egypt's emerging blogging scene, reports that there's already a verb in Egyptian Arabic that means "to blog": "bal'waga". But most of Egypt's bloggers write in English, perhaps because many of them are expats living in Cairo, but possibly also because they're looking to reach a global audience. (She mentions that a number of Ethiopian immigrants, working as nannies for wealthy families, are also blogging in a combination of Arabic and another language, probably Amharic.)

A common vocabulary is not the only linguistic problem Arab developers face when localizing software. Alaa points out that many of the current open source geeks are almost completely bilingual (as he is), and they often write technical documents in a mix of Arabic and English. While many open source developers are smart enough to realize that Arabic is written right to left instead of left to right, very few are smart enough to support bidirectional text: text fields that can run left to right or right to left, depending on what text is being entered.
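Supporting a bidirectional field starts with deciding which way it should run. Here is a minimal Python sketch, assuming the common heuristic from the Unicode Bidirectional Algorithm (UAX #9, rules P2 and P3): the first "strong" character in the text decides the direction. The function name base_direction and its default are our own illustration, not part of any CMS or toolkit mentioned above, and the sketch ignores explicit embedding and isolate controls.

```python
import unicodedata

# Strong bidi classes in UAX #9: "L" is left-to-right,
# "R" and "AL" (Arabic Letter) are right-to-left.
RTL_CLASSES = {"R", "AL"}


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess a field's base direction from its first strong character.

    Hypothetical helper: a simplified version of UAX #9 rules P2/P3
    that does not skip embedded or isolated runs.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL_CLASSES:
            return "rtl"
    # No strong character (digits, punctuation, whitespace, or empty input).
    return default


# Usage: mixed Arabic/English text takes the direction of whatever comes first.
print(base_direction("Linux \u0644\u064A\u0646\u0643\u0633"))  # ltr
print(base_direction("\u0644\u064A\u0646\u0643\u0633 Linux"))  # rtl
print(base_direction("2004"))                                   # ltr (default)
```

Modern HTML can apply much the same rule itself via dir="auto", but full bidi support also means the display side has to reorder mixed runs correctly, which this sketch does not attempt.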

The biggest unsolved problem may be search. Most content management systems that have been localized into Arabic have search functionality that is either badly compromised or fails entirely. The reason is that Arabic has several diacritic marks that modify alphabetic characters, as well as different forms for a character depending on where it appears within a word. Effective search needs to strip diacritics and match any of the variants of a character, not just the specific character/diacritic sequence. MySQL and Postgres are smart enough to do this for European languages... but not for Arabic. So any CMS built on an open source database tends to have poor search support, or none at all.
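To make the search point concrete, here is a minimal Python sketch of a normalization step a search index could apply to both stored text and queries. The folding rules are an assumption: a common "light" normalization (bare alef for the hamza and madda variants, heh for teh marbuta, yeh for alef maqsura, with harakat and tatweel removed), similar in spirit to Apache Lucene's Arabic analyzer, not a standard. normalize_arabic is a hypothetical helper name. Unicode text normally stores base letters and leaves positional shapes to the renderer, so the NFKC step only matters when presentation-form codepoints slip into stored text.

```python
import unicodedata

# Harakat: fathatan (U+064B) through sukun (U+0652), plus superscript alef.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}
TATWEEL = "\u0640"  # purely typographic stretching character

# Letter variants folded to one search key (an assumed, common choice).
FOLD = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0649": "\u064A",  # alef maqsura          -> yeh
})


def normalize_arabic(text: str) -> str:
    """Reduce Arabic text to a search key (sketch, not a standard)."""
    # NFKC maps isolated/initial/medial/final presentation forms back to
    # ordinary letters, and composes alef + combining hamza into one letter.
    text = unicodedata.normalize("NFKC", text)
    # Drop vowel marks and tatweel so vocalized and bare spellings match.
    text = "".join(ch for ch in text if ch not in DIACRITICS and ch != TATWEEL)
    # Fold the remaining letter variants.
    return text.translate(FOLD)


# Usage: a vocalized query still matches unvocalized stored text.
query = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, with fatha marks
stored = "\u0643\u062A\u0628"                   # same word, no diacritics
print(normalize_arabic(query) == normalize_arabic(stored))  # True
```

A CMS could store the normalized text in an extra indexed column and compare it against the normalized query, sidestepping the database's own collation rules. The trade-off is that folding erases real distinctions (teh marbuta and heh are different letters), which is usually acceptable for search but not for display.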

The good news: people like Alaa are on the case, reporting bugs, patching software and trying to ensure that everyone in the Arabic-speaking world will be able to use critical pieces of open source software. And projects like Spirit of America demonstrate that when bloggers put their money where their mouths are, professional developers are willing to build new tools to help bring Arabic speakers from Yemen to Mauritania online.


Comments

Fascinating.

Posted by: praktike on December 21, 2004 3:21 PM

I would like to note that there is not much of a problem using existing blogging services like blogger.com for Arabic, as you can see at http://gharbeia.blogspot.com/, which uses both Blogger and Blogspot. However, the user will need to know CSS to switch the layout direction to right to left. Blogger.com already defaults to Unicode, which is a good thing.

However, the English interface will not encourage people who know no English to create an account and post.

In other words, and in addition to your post: it is the interfaces to those services that count, and translating them properly so that as many Arabic-speaking bloggers as possible can use them.

Posted by: 304678 on December 22, 2004 3:41 AM

Great post. Thanks, Ethan!

Posted by: Alex Steffen on December 22, 2004 10:49 AM

