Why is Arabic scoring low in the digital space?

 

1. Problem introduction

For multilingual computer users whose basket of languages includes Arabic, over a decade of studies and surveys indicates that current keyboard layouts and text‐input mechanisms are impractical, unintelligent, and insensitive to the science and culture inherent in the Arabic alphabet when mapping Arabic characters to keyboard keys. The available mechanisms make it difficult for native Arabic speakers, and for users of the Arabic alphabet in general, to use Arabic-script languages in daily digital interaction without constantly feeling awkward doing so. This article attempts to "partially" address the question from a "technical" viewpoint: Why is Arabic scoring low in the digital space?[1] Observations and rudimentary solutions are also presented and discussed in some detail.


 

2. Text‐editing mechanisms

The importance of text‐editing mechanisms, by means of an appropriate keyboard layout (KBL), can be neither downplayed nor overemphasized. Well‐studied text‐editing methods built on intelligently designed KBL's and input mechanisms are a first step to winning the loyalty and confidence of computer users from all sections of society. They are the entry point to gaining the trust of every computer user regardless of background, profession or age, especially young users, whose early impressions are shaped by the tools they interact with. Currently available software products and solutions that serve Arabic-alphabet languages and their right-to-left (RTL) direction are gaining only limited popularity, because there are simply no credible alternatives available to the consumer market thus far.

3. Arabic on the Internet

As the Internet continues to shape the modern knowledge landscape, it is only fair to acknowledge that, for the first time in history, natural languages are constantly and competitively ranked by their presence in cyberspace and compared against each other for accuracy, reach, ease of learning and usability. Let’s take Wikipedia.org as one example. Whereas Wikipedia has over 3,607,000 documents in English, it has just over 145,000 in Arabic. That is only about 4% of the number of English documents, even though:

  • Arabic is the native language of over 350 million people [2]
  • The Arabic alphabet is used to write 14+ languages, indicating a potential interest of over a billion people [3]
  • Arabic is ranked as the fifth most widely spoken language [4]
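The 4% figure follows directly from the two article counts quoted above. The short Python check below reproduces it. Its only inputs are those counts from the text, and because both are lower bounds ("over"), the results are approximate.

```python
# Article counts as quoted in the text (Wikipedia, April 2011). Both are
# lower bounds ("over"), so the results below are approximate.
english_articles = 3_607_000
arabic_articles = 145_000

# Arabic article count as a percentage of the English count.
share = 100 * arabic_articles / english_articles
# Number of English articles for every Arabic article.
ratio = english_articles / arabic_articles

print(f"Arabic as a share of English: {share:.1f}%")       # ... 4.0%
print(f"English articles per Arabic article: {ratio:.1f}")  # ... 24.9
```

Put differently, English had roughly 25 articles for every Arabic one at the time.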

It is further alarming to note that some languages ranked far below Arabic have over 100,000 Wikipedia articles, with a constantly rising contribution growth rate. Arabic interests several times as many people as it has native speakers. Even so, both its number of articles and its contribution growth rate fall far short of what the language merits, according to Wikipedia sources.[5, 6] Figure 1 shows a snapshot of Wikipedia's main page.



Figure 1: Wikipedia main page snapshot showing the top languages by number of articles (retrieved on April 12, 2011). Although Arabic is the fifth most widely spoken language, it is ranked 25th in number of articles according to Wikipedia sources. [5, 6]

The above is one example of how poorly the Arabic language fares in contributions to literature on the Internet. A chief factor behind such low contribution is the method of input embodied by the available KBL's and by the associated mechanisms that enable digital devices to accept, process, and output text in Arabic and in RTL direction. This shortcoming has discouraged millions of Arabic‐alphabet users from using digital devices in their own native language. That, in turn, explains the low quality and availability of academic literature, software products, solutions, and services geared towards the Arabic‐alphabet digital market.

4. Knowledge transfer, or knowledge futile?

Many solutions have been created to remedy the alienation that users of the Arabic alphabet feel toward the main available KBL, ALBYSaSh [7], shown in Figure 2. But a question arises: Why? Why are so many variants of the available Arabic KBL being created? Why do some variants summon keys far from the reach and convenience of touch-typists' fingers to produce Arabic characters? Why do some variants engage numeric and symbol keys to stand in for Arabic characters? Why do some variants resort to multiple modes (e.g., Shift, Option…) to produce Arabic characters? These variant layouts, with all their good intentions, have actually posed numerous obstacles and setbacks.[8, 9]

The culprit.

Figure 2: A typical ALBYSaSh keyboard layout, so named after the arrangement of the six letters seen through the yellow rectangle.
 

In fairness to all the Arabic KBL variants invented so far, they share one common and attractive principle: it doesn't matter which KBL is used; what matters is that all knowledge acquired while typing in one KBL is transferred to the other languages typists need to type in. This flattens the learning curve and increases typing throughput. Such advantages are enjoyed by most non-English KBL's based on QWERTY, the most widely used KBL [10]. When a QWERTY user needs to switch between English, French, Italian, Portuguese, Dutch, Danish, Czech, or Estonian, to name a few, all the typist needs to do is start typing, which maximizes knowledge transfer. However, when Arabic is added to the basket of languages to type in, even though:

  • There is more than 73% phonetic correlation between the Arabic and Latin alphabets. For example, s = س, b = ب, c = ص, r = ر, f = ف, and so on [11]
  • There are very strong shape, phonetic, and combined shape-and-phonetic relationships among Arabic characters that should be put to work in producing the 48 characters shown in Figure 3,

such relationships have so far gone unexploited. For example, there are only about 16 unique shapes among the Arabic letters, while the rest are dependent forms that can easily be remembered from them. Take these six sample letter blocks (read RTL, master shape on the right):  ا  أ  إ  آ  ء ،  س  ش ، د  ذ ،  ص  ض ، ع  غ ،  ي  ى  ئ. They contain only six unique master letters, and the other 10 letters depend on them. Even tashkeel characters form their own blocks, as in  َ   ً    and   ُ   ٌ   . A complete study of such relationships, and of effective means of exploiting them, is available [9, 11]. When such relationships are not put to use, the knowledge acquired while typing in Latin-based languages is unfortunately futile, and typing in Arabic requires learning a new KBL from scratch!
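The master/dependent grouping described above can be made concrete. The minimal Python sketch below assumes only the six sample blocks listed in the text; the names BLOCKS and MASTER_OF are illustrative. It stores each block master-first and derives every dependent letter's master from that ordering, which confirms the count of six masters and ten dependents.

```python
# The six sample blocks from the text, stored master-first. In logical
# (storage) order the master comes first; it is displayed right-most in RTL.
BLOCKS = [
    ["ا", "أ", "إ", "آ", "ء"],
    ["س", "ش"],
    ["د", "ذ"],
    ["ص", "ض"],
    ["ع", "غ"],
    ["ي", "ى", "ئ"],
]

# Map every dependent letter back to the master shape it is remembered from.
MASTER_OF = {letter: block[0] for block in BLOCKS for letter in block[1:]}

masters = [block[0] for block in BLOCKS]
dependents = list(MASTER_OF)

print(len(masters), "master letters")        # 6 master letters
print(len(dependents), "dependent letters")  # 10 dependent letters
```

A layout that exploits this structure would only need the typist to memorize the six master positions. The dependents can then be reached through a rule tied to their master.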


Figure 3: The 48 characters that should be standard on Arabic keyboard layouts. Included are two non-standard Arabic-derived letters (پ  and  ڤ), for accurate transcription/transliteration from Latin to Arabic, and the standard tashkeel characters. A problem: how can 48 characters be mapped intuitively to 26-letter keyboards?
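The caption's question has a simple capacity answer. Twenty-six letter keys, each used with and without Shift, give 52 positions, which is enough for 48 characters without touching numeric or symbol keys. The Python sketch below checks that arithmetic. It then shows one hypothetical way to fill the positions, using the phonetic pairs (s = س, c = ص) and shape pairs (س/ش, ص/ض) from Section 4. HYPOTHETICAL_LAYOUT and char_for are illustrative assumptions, not Intellark's actual mapping, and the sketch assumes a single Shift layer.

```python
import string
from typing import Optional

# Capacity check: 26 letter keys, each used unshifted and with Shift.
LETTER_KEYS = string.ascii_lowercase
slots = len(LETTER_KEYS) * 2
print(slots, "positions for 48 characters:", slots >= 48)  # 52 positions for 48 characters: True

# HYPOTHETICAL assignment for illustration only, NOT Intellark's layout:
# a master letter sits on the phonetically matching Latin key, and its
# dependent shape (if any) sits on Shift of the same key.
HYPOTHETICAL_LAYOUT = {
    # key: (unshifted, shifted)
    "s": ("س", "ش"),
    "c": ("ص", "ض"),
    "d": ("د", "ذ"),
    "b": ("ب", None),
    "r": ("ر", None),
    "f": ("ف", None),
}

def char_for(key: str, shift: bool = False) -> Optional[str]:
    """Return the Arabic character produced by `key`, or None if unassigned."""
    unshifted, shifted = HYPOTHETICAL_LAYOUT.get(key, (None, None))
    return shifted if shift else unshifted

print(char_for("s"), char_for("s", shift=True))  # س ش
```

The design choice illustrated here keeps each shape family under one finger. A typist who already knows where s is on QWERTY would find both س and ش there.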

5. Give me Arabic, but in English please!

Another ailment that has spoiled the taste of typing in Arabic in many modern approaches is the mixing and matching of letters, symbols and numbers. Every printing key, be it alphabetic, numeric or symbolic, becomes a candidate for producing one of the 36 Arabic letters. We are disappointed when we learn from history how some nations switched from the Arabic alphabet to Latin-based ones. But what are native users of Arabic characters doing when most of their communication takes place in Latin-derived languages, or at best in non-Arabic characters? And when the need arises to inject Arabic words or characters within our Latin text, there are all those numbers, symbols and LTR letters to stand in for them. That's not all: specialized software now graciously accepts our Latin input and does the flipping to Arabic letters on our behalf. Figure 4 shows some touching Arabic stanzas conveyed in Latin letters, numbers and symbols, the approach typically used nowadays for "chatting" [12]. Don't these send a subliminal message that Arabic characters, and their direction, are out of date? The impracticality of available KBL's is one thing. Seeking solutions that do to the taste of typing in Arabic what fast food does to a hungry person is just as damaging. [13]


Figure 4: "Arabish" in action. This poem by Hafith Ibrahim is composed of 23 stanzas in which Arabic cries out to its speakers. It is entitled "The Arabic language is mourning its fortune". [14] The English in the image is a transcription of some of the stanzas.
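The "flipping" software mentioned above essentially runs a transliteration table over the Latin input. The deliberately naive Python sketch below assumes a handful of common chat-alphabet conventions: 3 for ع, 7 for ح, 2 for ء, 5 for خ, plus the digraphs sh and kh. These conventions vary by region. It also applies a crude rule that treats every Latin vowel as unwritten. The table and the function name arabish_to_arabic are illustrative and do not describe any particular product.

```python
# A deliberately naive "Arabish" -> Arabic converter, for illustration only.
# The table holds a few common chat-alphabet conventions; real conventions
# vary by region, and real tools need context rules and dictionaries.
TABLE = {
    "sh": "ش", "kh": "خ",                      # digraphs, matched first
    "2": "ء", "3": "ع", "5": "خ", "7": "ح",    # digits standing in for letters
    "b": "ب", "d": "د", "f": "ف", "h": "ه", "k": "ك", "l": "ل",
    "m": "م", "n": "ن", "r": "ر", "s": "س", "t": "ت",
}
VOWELS = set("aeiou")  # crude guess: treat every Latin vowel as unwritten

def arabish_to_arabic(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in TABLE:   # prefer a two-letter match
            out.append(TABLE[pair])
            i += 2
            continue
        ch = text[i]
        if ch in TABLE:
            out.append(TABLE[ch])
        elif ch not in VOWELS:
            out.append(ch)                     # pass anything else through
        i += 1
    return "".join(out)

print(arabish_to_arabic("3arab"))    # عرب
print(arabish_to_arabic("mar7aba"))  # مرحب (the intended word is مرحبا)
```

The second example shows why such conversion is guesswork. The vowel rule drops the final alif that مرحبا needs, so real tools must resolve ambiguity the typist never had to think about in Arabic itself.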

If we don’t start turning the wheel in the reverse direction, towards practical and intelligent solutions that are sensitive to the science and culture inherent in the Arabic alphabet, who will? And what legacy of Arabic are we leaving to the next generation?

6. A second keyboard layout to learn... a keyboard layout one too many!

Keyboarding can be a tedious learning process [15]. Once time and energy are invested in learning a particular KBL, it becomes very hard to invest more time and energy in learning new layouts and to bear the frustrations that accompany such learning. We simply stick to the KBL we first learned, which in most cases is QWERTY. The main problem stems from the fact that our fingers form a long-lasting association between keys and letters: 'a' is assigned to the left pinky on the home row, and 'n' to the right index finger on the bottom row. This is known as muscle memory. A touch typist, for example, doesn't think about where 'd' is; the left middle finger spontaneously finds it [16]. Figure 5 shows a typical keyboard with color coding for the ideal key-to-finger mapping. So, what happens when we suddenly need to retrain our muscle memory on the locations of ا (for 'a'), ن (for 'n') and د (for 'd')? The outcome is usually negative and time-consuming.
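The retraining cost described above can be made concrete. For each letter, compare the finger that types it on QWERTY with the finger that types its Arabic counterpart on ALBYSaSh. The minimal Python sketch below assumes the standard touch-typing finger chart and the common Windows "Arabic (101)" key positions: ا on H, ن on K, د on ]. The names FINGER, ALBYSASH_KEY and PAIRS are illustrative.

```python
# Standard touch-typing finger chart for the QWERTY letter keys and the
# punctuation keys to their right.
FINGER = {}
for finger, keys in {
    "left pinky": "qaz",
    "left ring": "wsx",
    "left middle": "edc",
    "left index": "rfvtgb",
    "right index": "yhnujm",
    "right middle": "ik,",
    "right ring": "ol.",
    "right pinky": "p;/[]'",
}.items():
    for key in keys:
        FINGER[key] = finger

# Physical keys carrying these letters on ALBYSaSh, assuming the common
# Windows "Arabic (101)" arrangement: ا on H, ن on K, د on ].
ALBYSASH_KEY = {"ا": "h", "ن": "k", "د": "]"}

# Phonetic pairs from the text: Latin letter -> Arabic counterpart.
PAIRS = {"a": "ا", "n": "ن", "d": "د"}

for latin, arabic in PAIRS.items():
    before = FINGER[latin]
    after = FINGER[ALBYSASH_KEY[arabic]]
    print(f"{latin}: {before:12} -> {arabic}: {after}")
```

In this example none of the three letters stays under the same finger. 'a' moves from the left pinky to the right index finger, 'n' from the right index to the right middle finger, and 'd' from the left middle finger to the right pinky. This is the muscle-memory conflict described above.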


Figure 5: The correct finger positions for touch typists, shown with matching colors, to maximize typing speed. Fingers are trained to type characters along their own designated paths.

Furthermore, if we know how to express ourselves digitally in one language, we will do so, even if that language is not our main one. At the same time, we unconsciously let the other languages slip out of consideration for typing, even if one of them happens to be our main language… a very unfortunate fate due to the lack of knowledge transfer, exploitation and reuse.

7. Towards knowledge transfer, exploitation and reuse -- towards Intellark

All of the above factors were put into the mix in the making of Intellark [8, 9]. Intellark is a new Arabic KBL built to:

  • maximize knowledge transfer among all languages of interest, be they Latin- or Arabic-derived languages
  • respect and exploit the science and scripting methodology inherent in the Arabic alphabet [11]
  • respect the frequency distribution of letters within the same block when deciding how accessible each character should be [17]
  • respect the touch typist’s needs in using and reusing only the conveniently located letter keys for typing
  • respect the typist’s eyes and perceptive faculties during typing by abstaining from the use of non-alphabetic keys
  • integrate an intuitive solution to cater for the tashkeel characters when needed, and
  • enable typing in RTL direction, using Arabic characters -- just the way Arabic should be typed
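The frequency rule in the list above can be expressed as a ranking step. Within each block of related letters, the letter that occurs most often should get the easiest position. The minimal Python sketch below assumes frequencies are simply counted from whatever corpus is supplied. The function name rank_within_blocks and the toy sample are illustrative and not real frequency data; actual figures are given in the study cited as [17].

```python
from collections import Counter

# Blocks of related letters, master first (from the shape relationships
# described in Section 4).
BLOCKS = [["س", "ش"], ["د", "ذ"], ["ص", "ض"], ["ع", "غ"]]

def rank_within_blocks(corpus: str) -> list[list[str]]:
    """Order each block's letters from most to least frequent in `corpus`.

    Under the rule sketched here, the first letter of each ranked block would
    receive the most accessible position. sorted() is stable, so letters with
    equal counts keep their master-first order.
    """
    counts = Counter(corpus)
    return [sorted(block, key=lambda ch: counts[ch], reverse=True) for block in BLOCKS]

# A tiny made-up sample, NOT real frequency data.
print(rank_within_blocks("شمس شجر شارع سلام"))
```

In this toy sample ش happens to outnumber س, so the sketch would give ش the easier position. A real corpus may well rank them the other way, which is exactly why the decision should rest on measured frequencies.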

There is one precondition that Intellark requires to be satisfied when using it to type in Arabic: Think in Arabic, as one should, not in Latin [18]. Figure 6 shows Intellark imposed on a QWERTY KBL.


Figure 6: Intellark imposed on a QWERTY keyboard layout. Details about the color coding used in this figure are available in [8, 11].

8. Conclusion

In conclusion, this article addresses one aspect of why Arabic is scoring low in the digital space from a "technical" viewpoint: the available KBL's. Future work should continue to investigate and highlight all that is needed to normalize the use of Arabic characters in the digital space. Some outstanding issues are listed next, but further analysis and solutions are needed.

  • Separation of the editing environment's orientation from that of its text components. For example, if the environment of a Facebook account is LTR-oriented, a user should still be able to right-align their text.
  • Appropriate treatment of mixed RTL characters and symbols. Try adding or editing a reference in an Arabic article on Wikipedia and observe how symbols become garbled among the characters, leading to a lot of confusion and discouragement.
  • Prettier font families that are digitally sharp and beautifully anti-aliased
  • Font styles that adhere more closely to the nature of Arabic script, as shown in Figure 7.
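Part of the garbling in the second bullet comes from how Unicode classifies characters. Arabic letters are strong right-to-left (AL) and Latin letters are strong left-to-right (L). Brackets and most punctuation are neutral (ON) and take their direction from their surroundings. The Python sketch below uses the standard unicodedata module to show these classes. It also picks a text's base direction from its first strong character, loosely following rules P2 and P3 of the Unicode Bidirectional Algorithm; isolates and explicit embeddings are ignored. Deciding direction per text component rather than per page is also what the first bullet asks for, and HTML's dir="auto" relies on a similar first-strong heuristic. The helper names are illustrative.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strong character, loosely
    following rules P2/P3 of the Unicode Bidirectional Algorithm.
    Isolates and explicit embeddings are ignored in this sketch."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):
            return "rtl"
        if cls == "L":
            return "ltr"
    return "neutral"  # no strong character at all

# A reference label mixing brackets, digits and Arabic letters.
print(bidi_classes("[5] عربي"))
print(base_direction("[5] عربي"))          # rtl
print(base_direction("(2011) Wikipedia"))  # ltr
```

The brackets and digits in "[5]" carry no strong direction of their own. Where they end up on screen therefore depends on the surrounding text and on the paragraph direction, and that is how a reference edited inside an LTR editor can appear scrambled in an RTL article.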

Figure 7: Geometrically, the slant of the italic font style used for English, shown in the top line of the image, should be reversed for Arabic. The Arabic-character version of "italic" should have its vertical strokes pointing north-west, not north-east.
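The reversal the caption asks for amounts to flipping the sign of the shear used to slant glyphs. The minimal Python sketch below assumes a simple horizontal shear, x' = x + y * tan(angle), and an illustrative slant angle of 12 degrees; real fonts choose their own angle. It shows that a positive angle leans a vertical stroke north-east, as in Latin italics, while a negative angle leans it north-west. The function name slant is illustrative.

```python
import math

def slant(x: float, y: float, degrees: float) -> tuple[float, float]:
    """Apply a horizontal shear: x' = x + y * tan(angle), y' = y.

    A positive angle moves the top of a vertical stroke to the right
    (north-east lean); a negative angle moves it to the left (north-west).
    """
    return x + y * math.tan(math.radians(degrees)), y

# Top of a vertical stroke of unit height standing at x = 0.
ANGLE = 12  # illustrative slant angle in degrees (an assumption)
latin_top = slant(0.0, 1.0, ANGLE)    # leans north-east
arabic_top = slant(0.0, 1.0, -ANGLE)  # leans north-west
print(latin_top, arabic_top)
```

Only the sign changes. The same shear machinery that produces Latin italics would produce the reversed Arabic slant the figure proposes.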
 

For a cultural and social perspective on the subject, the reader is referred to the elaborate work of Mohammad Asfour [19]. His treatment of the positive and negative effects of translation on the Arabic language complements the fundamental issues discussed in this article. Other perspectives should, and will, be addressed in future work.

 

Why do you think Arabic is scoring low in the digital space? Become part of turning the wheel in the positive direction for Arabic by commenting on this and the other articles. Your comments are highly anticipated and appreciated in the comment box following this article. If you have a deeper study on this or related topics, please consider submitting it for publication on the Intellaren Articles page.

Please note: This article expresses the views of the author as based on his own research, studies and observations.

9. References

[1] By Digital Space we mean all hardware/software tools and networks that save, process and present information digitally. As a result, handheld devices and cyberspace are among the immediate members of the digital space.

[2] Wikipedia on List of Arab countries by population.

[3] Retrieved in December 2009 from Wikipedia on Arabic alphabet: "The Arabic alphabet was first used to write texts in Arabic, most notably the Quran, the holy book of Islam. With the spread of Islam, it came to be used to write many other languages, even outside of the Semitic family to which Arabic belongs. Examples of non‐Semitic languages written with the Arabic alphabet include Persian, Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi (in Pakistan), Kashmiri, Sindhi (in India and Pakistan), Uyghur (in China), Kazakh (in China), Kyrgyz (in China), Azerbaijani (in Iran), Kurdish (in Iraq and Iran) and the language of the former Ottoman Empire. In order to accommodate the needs of these other languages, new letters and other symbols were added to the original alphabet".

[4] Wikipedia on List of languages by number of native speakers.

[5] Wikipedia on number of articles per language. Arabic is shown ranked 25th (retrieved on April 10, 2011).

[6] Jimmy Wales, creator of Wikipedia, in a video on Al Jazeera English. Fast-forward to minute 7:56 of Part II to hear Jimmy talk about Arabic article contributions on Wikipedia.

[7] Influenced by the way QWERTY is named (see [10] below), the main Arabic keyboard layout is called البيسش (a, l, b, y, s, sh: read ALBYSaSh). The name follows the arrangement of the middle-row key labels, starting with the letter alif and reading leftwards. See Wikipedia on keyboard layouts for more on the subject.

[8] Wikipedia on Intellark presents a quick and light introduction.

[9] Introduction on Intellark: a short video in Arabic or in English, detailing common problems inherent in the design of Arabic keyboard layouts.

[10] Wikipedia on QWERTY keyboard layout. The name QWERTY simply spells out the six letters at the top left of the layout.

[11] Phonetic correlation between English letters and Arabic characters, and among Arabic characters themselves is addressed in the Intellark Tutorial.

[12] Wikipedia on Arabic chat alphabet.

[13] Fast food… good in the mouth, but how about in the rest of the body? See this experiment by a person who restricted his diet to fast food for just one month.

[14] Contemplate the poem of Hafith Ibrahim entitled "The Arabic language is mourning its fortune" (اللغة العربية تنعى حظها). Other translations would replace the word fortune with mischance, destiny or... fate.

[15] Although commercial, this article on "Is touch keyboarding impossible to learn?" provides some insight on the subject.

[16] Wikipedia on touch typing and muscle memory.

[17] Intellaren Articles: "A Study of Arabic letter frequency analysis". A shorter version is available on Wikipedia on Arabic letter frequency.

[18] Intellark comes with a tutorial that provides nine interactive exercises; among the most important is Exercise 3 entitled: Think in Arabic, not in English!

[19] Wikipedia on Mohammad Asfour. His article on the Effect of translation on the Arabic language ( تأثير الترجمة على اللغة العربية ) should be of high interest to those concerned with the subject matter of this article. The last section entitled: "هل ثمة ما يمكن عمله؟" is simply a must-read.