Intellibe: An Arabic-to-Latin text transcriber

This article is also available in Arabic [ Id: ar00005 ], accessible through the articles page.


1. Introduction

Intellibe (Intellaren's Arabic-to-Latin text transcriber) is a purely computational transcriber. It transcribes Arabic input into letters of the Latin alphabet that, when read aloud, give the closest match to the original pronunciation. For example, the Latin spelling that comes closest to the sound of the name  محمد  is "muhammad" or "mohammad", not "mhmd" or "mohamed".

This version of Intellibe uses the English alphabet to represent the Latin-derived languages. Intellibe's user interface (IUI) has one input compartment and three output compartments: each user input generates three outputs, each displayed in its own compartment, as described next.

  • Input: an Intellark-enabled editor compartment for typing in Arabic. Because the editor is Intellark-enabled, any of the letters below can be typed, transcribed and displayed in the three output compartments. You may learn about Intellark or take the one-hour Intellark layout tutorial, which teaches a new and intuitive way of typing in Arabic. You may also disable or enable Intellark (by typing Alt-L) to type with your preferred Arabic keyboard layout.

    ا أ إ آ ء ب پ ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ؤ ي ى ئ

    َ    ً    ُ    ٌ    ِ    ٍ      ْ    ّ

  • Output 1: Plain English compartment. This compartment displays the resulting transcription in pure English letters. Although the result may seem uncommon to an English-reading eye (e.g., transcribing رامي as "raamee" instead of "rami"), it does map to English letters that capture as close as possible the phonetic sound of what's being transcribed.
     
  • Output 2: DIN 31635' compartment. This compartment shows the resulting transcription in DIN 31635'. Students and educators of the Arabic language in particular should find this map very convenient for conveying in English sounds that plain English letters cannot approximate. In this compartment, رامي is transcribed to "rāmī". DIN 31635' is explained in Section 6.
     
  • Output 3: DIN 31635' in plain English compartment. This compartment scans the DIN 31635' output above and converts each character that is not a plain English letter into the most similar of the 26 English letters, thereby producing spellings that look familiar to an English-reading eye. For example, "rāmī muḥammad" is converted into "rami muhammad".
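The conversion performed by Output 3 can be sketched as a character-by-character substitution. The Python below is a minimal sketch, not Intellibe's code: the mapping table and the name `din_to_plain` are hypothetical, inferred from the DIN 31635' and "DIN 31635' in plain English" columns of Table 3 (ʿ is dropped, a word-initial ʾ is dropped, and any other ʾ becomes an apostrophe).

```python
import unicodedata

HAMZA = "\u02be"  # ʾ
AYN = "\u02bf"    # ʿ

# Hypothetical mapping, inferred from the examples in Table 3.
DIN_TO_PLAIN = {
    "\u0101": "a",   # ā
    "\u012b": "i",   # ī
    "\u016b": "u",   # ū
    "\u1e25": "h",   # ḥ
    "\u1e63": "s",   # ṣ
    "\u1e0d": "d",   # ḍ
    "\u1e6d": "t",   # ṭ
    "\u1e33": "kh",  # ḳ
    "\u1e0f": "zh",  # ḏ
    "\u0161": "sh",  # š
    "\u0121": "gh",  # ġ
    AYN: "",         # ʿ is dropped
}


def din_to_plain(din: str) -> str:
    """Replace every non-plain-English DIN 31635' character with plain letters."""
    # NFC turns e.g. "h" + combining dot below into the single character ḥ.
    text = unicodedata.normalize("NFC", din)
    out = []
    for i, ch in enumerate(text):
        if ch == HAMZA:
            # A word-initial ʾ is dropped; elsewhere it becomes an apostrophe.
            if i > 0 and text[i - 1].isalpha():
                out.append("'")
        else:
            out.append(DIN_TO_PLAIN.get(ch, ch))
    return "".join(out)


print(din_to_plain("rāmī muḥammad"))  # rami muhammad
```

Characters outside the table pass through unchanged, so plain letters, spaces, hyphens and punctuation survive as they are.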

An online, lightweight version of Intellibe is available for you to experiment with.


2. Applications of Intellibe

Following is a list of applications where Intellibe's Arabic-to-Latin transcription would be of particular benefit.

  • Person name transcription for use in official documents such as passports, credit cards, school and university ID cards...
  • Address, city, channel or company name transcription for accurate and standardized data exchange, logging and retrieval
  • Institution name transcription for governmental, educational or commercial institutions, including schools, universities, airports, banks and so on
  • Transcription for educational purposes, for students of the Arabic language. Transcribing Arabic characters into English counterparts is no substitute for learning or teaching the sounds of Arabic letters and words, but Intellibe can speed up the learning of Arabic pronunciation for students unfamiliar with the "real" sounds of Arabic speech. See this link for an example where transliteration/transcription may be needed for educational purposes. Intellibe can easily and robustly produce accurate, standardized results in several output flavors.

Section 11 of this document mentions how you may order your own customized copy of Intellibe to cater for your needs.


3. Transcription versus transliteration, and the way of Intellibe

Before proceeding in this section, let's define precisely what is meant by transcription and transliteration in order to justify Intellibe's choice of one over the other.

  • Transcription: A representation of speech sounds in phonetic symbols.
     
  • Transliteration: A representation of letters or words in the corresponding characters of another alphabet.

That is, whereas transcription is concerned with representing text of one language with characters or symbols of another so that it can be vocalized correctly, transliteration is concerned simply with representing each character of one language by a matching character of another. For example, consider the Arabic sentence أكـلـت جـزرة , which means "I ate a carrot". Transliterating this eight-letter sentence gives the two strings aklt jzrt, which, although they map each Arabic letter to an English counterpart, do not provide enough material for correct vocalization. Transcribing, on the other hand, takes the liberty of adding and removing characters as needed so that a reader unfamiliar with Arabic can vocalize the phrase correctly; the Arabic sentence may, for example, be transcribed as akaltu jazarah.

Another example: consider the word الـنـار , which means "the fire". Letter-for-letter transliteration gives alnar, whereas transcription gives annaar, which is how the word is vocalized in Arabic.
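A naive letter-for-letter transliterator makes the contrast concrete: it reproduces aklt jzrt and alnar, because a one-to-one letter map has no way to supply the short vowels, which are not written, or to know that the lām of ال is silent before ن. This Python sketch uses a small, hypothetical subset of letters and the hypothetical function name `transliterate`.

```python
# Hypothetical one-to-one map covering only the letters used in the examples.
LETTER_MAP = {
    "ا": "a", "أ": "a", "ت": "t", "ة": "t", "ج": "j",
    "ر": "r", "ز": "z", "ك": "k", "ل": "l", "ن": "n",
}
TATWEEL = "\u0640"  # the stretching stroke ـ is decorative and carries no sound


def transliterate(arabic: str) -> str:
    """Map each Arabic letter to exactly one Latin letter; never add vowels."""
    return "".join(LETTER_MAP.get(ch, ch) for ch in arabic if ch != TATWEEL)


print(transliterate("أكـلـت جـزرة"))  # aklt jzrt
print(transliterate("الـنـار"))       # alnar
```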

Intellibe, Intellaren's Arabic-to-Latin text transcriber, takes the more difficult, yet more useful and appealing approach: transcription. That is, it adds and removes letters as it sees fit to guide the reader to the correct vocalization of Arabic text for practical and official purposes.


4. Transcription shortcomings without a computational parser

To highlight the problem, consider the way transcription of common personal details has been conducted so far, especially for official documents such as passports, certificates, electronic cards or more generally, the filling out of forms that require many details about the applicant. Transcription of client details from Arabic to English is typically accomplished by one of the following three methods:

  • By the applicant themselves, where they are required to fill in their own personal details. This leads to many inconsistencies and variations in transcribing even the simplest person or school name.
  • By the representative agent, who may or may not be familiar with the intricacies of the foreign language into which the details must be transcribed. Here, transcription quality and standards are at the mercy of the processing agent, and numerous varying transcriptions of the same noun or verb are to be expected.
  • By means of fetching transcriptions from a database repository or user-made dictionaries that are constantly being filled and maintained by organization personnel. Transcription here lacks standardization from one organization to the other, and quality and standard of transcription are still subject to user input as mentioned in the previous point.

Therefore, none of the aforementioned means of transcription offers a unifying, non-subjective transcription process. Some real, recurring examples of inconsistent transcription follow.

Given the names سفيان , رامي , عبد العزيز , يوسف , محسن , عبد الستار , عبد القادر and محمد , consider the variants in use for each single Arabic name. You may verify the presence of each variant shown in the table below by searching for it with any search service provider, such as Google or Intellaren.

  Name   Possible transcriptions/transliterations used
  محمد   mohamad, mohamed, mohammad, mohammed, muhamad, muhamed, muhammad, muhammed...
  عبد القادر   abdulkader, abdelkader, abdelkadir, abdul qadir, abd alkader, abd-ulkadir, 'abd'alkader...
  عبد الستار   abdulsattar, abd alsattar, abdussattar, abdelsatar, abdalsattaar...
  محسن   mohsen, mohsin, muhsin, muhsen, moohsen, mouhsen...
  يوسف   yoosuf, yousuf, yossef, youssif, yusuf, yusaf, yoosef, youssuf...
  عبد العزيز   abdelaziz, abdalaziz, abduaziz, abdol-aziz, abdol aziz, abdulazeez, abdulaziez, abdul'zeez, abd-alazeiz...
  رامي   rami, ramy, raami, raamy, ramey, ramee...
  سفيان   sofian, sofyan, sofyaan, soufiane, soufian...

In fact, and especially where:

  • names are prefixed with عبد
  • names are prefixed with the article ال , or more generally
  • names are muḍāf and muḍāf-ʾilayh ( المُضاف والمُضاف إلَيه , i.e., possessee and possessor, as in Adam's orange, or Kelowna's river), and may be transcribed as a single compound word in English (e.g., أبو بكر  or  شيخ الأرض )

the number of transcription results grows quite large. There are other problems where non-standardized transcription/transliteration leads to ambiguous outputs; namely:

  • Long vowels are transcribed as short vowels in English, so that nouns such as ملك , ملاك and مالَك may all be transcribed into one English string, for example malak.
  • It is unclear whether the kasra tashkeel character (diacritic) should be transcribed into an e or an i in English, as with the names خالِد and ناهِد (e.g., khaled vs. khalid, and nahed vs. nahid).
  • Data retrieval is not as straightforward as it should be. For example, whereas searching for employees with the first name عبد العزيز in an Arabic-populated database table is straightforward, searching for the same name once transcribed/transliterated in an English- or French-populated table requires many searches or some string preprocessing to hit a range of possibilities (e.g., look for all names that contain the letters "abdzz" in that order), thereby complicating a process that should have been simple and effective.
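The preprocessing mentioned in the last point (matching names that contain the letters a, b, d, z, z in order) can be sketched as a subsequence test. The helper name `contains_in_order` is hypothetical. Note that the test also accepts abdulrazzaq (عبد الرزاق), a different name, which illustrates why such workarounds are a poor substitute for a standardized transcription.

```python
def contains_in_order(name: str, letters: str) -> bool:
    """Return True if every character of `letters` occurs in `name`, in order."""
    remaining = iter(name.lower())
    # `ch in remaining` consumes the iterator up to the match, so each
    # letter can only be found after the previous one.
    return all(ch in remaining for ch in letters)


variants = ["abdelaziz", "abdalaziz", "abduaziz", "abdol-aziz",
            "abdulazeez", "abd-alazeiz", "abdelkader", "abdulrazzaq"]
print([v for v in variants if contains_in_order(v, "abdzz")])
```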

The following examples illustrate the aforementioned problems more clearly.

 

  Name   Possible transcriptions/transliterations used
  ملك   malak, malek, malik...
  ملاك   malak, malaak...
  الجمل   algamal, aljamal, al gamal, al-jamal...
  جمال   gamal, jamal, gamaal, jamaal...
  عبد المالك   ... malek, malik, maalik, maalek, maleek...
  عبد الملك   ... malek, malik...
  خالد   khaled, khalid, khaalid, kaled...
  ناهد   nahed, nahid, naahed, naheed...
  إيناس   enas, eanas, inas, einas, enaas, einaas, eanaas, eenas...

The examples in the above table are by no means exhaustive of the problems encountered in transcription tasks for official or non-official purposes.

In light of the aforementioned mechanisms and their disadvantages, Intellibe's objective is to standardize the accurate transcription of Arabic text into Latin-derived languages. Intellibe introduces solutions to all of the above problems, as described below. The current Intellibe version produces English-flavored transcription (as opposed to French or Spanish, for example).

We stress again that Intellibe is a purely computational transcriber, meaning that its real-time output depends solely on the user's input of Arabic letters and tashkeel characters (diacritics), processed character by character. No database, dictionary or word repository is associated with Intellibe, only a set of pre-programmed rules that are described later in this document. You can verify this by typing some Arabic input with tashkeel into the input compartment of the IUI.


5. Towards an optimal solution: tashkeel (diacritics) to the rescue

To the Arabic speaker, dialects aside, there is typically one way to pronounce each of the Arabic names mentioned earlier. For the non-Arabic speaker, ambiguities in pronouncing those names, or Arabic text in general, are to be expected, since no explicit tashkeel characters are present to guide how the text should be vocalized. To convey the correct pronunciation (and, as a result, the correct transcription) to non-Arabic speakers or to students of the Arabic language, tashkeel must be used. Diacritics in Arabic essentially act as short vowels do in English. The table below gives an initial set of transcription rules; we then rewrite the list of names above, this time with both the diacritics and the most probable English transcription according to these rules.

  Tashkeel   Shape   Transcribed into...
  fatḥa   َ    a
  ḍamma   ُ    u or o
  kasra   ِ    i (e is another weaker possibility)
  sukūn   ْ    not transcribed into any vowel sound (silent sound)
  šadda   ّ    geminates (i.e., duplicates) the letter it sits on
  Vowel ʾalif   ا   aa
  Vowel wāw   و   ou or oo
  Vowel yāʾ   ي   ee
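The rules in this table are mechanical enough to code directly. The Python below is a minimal sketch under stated assumptions, not Intellibe's implementation: the consonant table is a partial, hypothetical subset; ḍamma is spelled u (the default used in Table 3); a bare šadda receives a fatḥa, as described in Section 7; and ʾalif-lām, hamza, tanwīn and tāʾ marbūṭa are not handled. و and ي are read as long vowels only when they carry no diacritic, follow a consonant whose vowel is absent or matches, and are not followed by ا.

```python
ALIF, WAW, YA = "\u0627", "\u0648", "\u064a"
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
SUKUN, SHADDA = "\u0652", "\u0651"

# Short vowels from the table above (damma spelled "u").
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i", SUKUN: ""}
# Long-vowel letter -> (spelling, short vowel it lengthens).
LONG_VOWELS = {ALIF: ("aa", FATHA), WAW: ("ou", DAMMA), YA: ("ee", KASRA)}
# Hypothetical partial consonant map; و and ي appear here for their consonant use.
CONSONANTS = {
    "ب": "b", "ت": "t", "ج": "j", "ح": "h", "خ": "kh", "د": "d",
    "ر": "r", "ز": "z", "س": "s", "ش": "sh", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", WAW: "w", YA: "y",
}


def clusters(word):
    """Split a word into (letter, set of diacritics) pairs."""
    parts = []
    for ch in word:
        if ch in SHORT_VOWELS or ch == SHADDA:
            if parts:  # a mark with no letter before it is ignored
                parts[-1][1].add(ch)
        else:
            parts.append((ch, set()))
    return parts


def transcribe_word(word):
    parts = clusters(word)
    out = []
    prev_consonant, prev_mark = False, None
    for i, (letter, marks) in enumerate(parts):
        nxt = parts[i + 1][0] if i + 1 < len(parts) else None
        mark = next((m for m in marks if m in SHORT_VOWELS), None)
        is_long = (
            letter in LONG_VOWELS and not marks and prev_consonant
            and prev_mark in (None, LONG_VOWELS[letter][1])
            and (letter == ALIF or nxt != ALIF)  # e.g. ي in سُفيان stays y
        )
        if is_long:
            if prev_mark is not None:
                out.pop()  # the long vowel replaces the matching short vowel
            out.append(LONG_VOWELS[letter][0])
            prev_consonant, prev_mark = False, None
            continue
        if letter == ALIF:  # word-initial alif: rough approximation
            out.append("a")
            prev_consonant, prev_mark = False, None
            continue
        sound = CONSONANTS.get(letter, letter)
        if SHADDA in marks:
            sound *= 2            # gemination duplicates the consonant
            mark = mark or FATHA  # a bare shadda gets a fatha
        out.append(sound)
        if mark is not None:
            out.append(SHORT_VOWELS[mark])
        prev_consonant, prev_mark = True, mark
    return "".join(out)


def transcribe(text):
    return " ".join(transcribe_word(w) for w in text.split())


print(transcribe("رامي"))      # raamee
print(transcribe("مُحَمَّد"))  # muhammad
```

Because each output depends only on the letters and diacritics typed, the same input always yields the same spelling; the sketch carries no dictionary of names.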

Applying the above set of rules results in the following transcriptions. Extra rules are mentioned locally next to the transcribed name in this table.

 

  Name   Accurate transcription
  مُحَمَّد   muhammad
  عَبدُ القادِر   abdulqaadir (notice the use of the lām-qamariyya in this name)
  عَبدُ السَتار   abdussattaar (notice the use of the lām-šamsiyya in this name)
  مُحسِن   mohsin
  يوسُف   yousof
  عَبدُ العَزيز   abdulazeez
  رامي   raamee
  سُفيان   sofyaan
  مَلَك   malak
  مَلاك   malaak
  الجَمَل   aljamal
  جَمال   jamaal
  عَبدُ المالِك   abdulmaalik
  عَبدُ الملِك   abdulmalik
  خالِد   khaalid
  ناهِد   naahid
  إيناس   eanaas (and not einaas or inas for example, so that the initial vocal sound is as found in each, eagle, ear, easy or eat; more on this particular transcription below)

All the above results are generated by Intellibe for the plain English compartment described in Section 1. Thus, with just a simple set of constraining rules, many of the factors that lead to multiple transcriptions are eliminated. We hope you understand the difference between lām-qamariyya and lām-šamsiyya; if not, let us know and we will publish a brief educational tutorial on them.
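In short, before the 14 "sun" letters (lām-šamsiyya) the lām of ال is silent and the following consonant is doubled, while before the remaining "moon" letters (lām-qamariyya) the lām is pronounced. The Python sketch below applies this to an already transcribed word. The function name `with_article`, its `after_vowel` flag (modelling how the article's own a disappears after a vowel-final word such as abdu) and the plain-English spellings of the sun letters are assumptions made for illustration.

```python
# The 14 sun letters; every other letter is a moon letter.
# The plain-English spellings are approximations chosen for this sketch.
SUN_LETTER_SOUNDS = {
    "ت": "t", "ث": "th", "د": "d", "ذ": "zh", "ر": "r", "ز": "z", "س": "s",
    "ش": "sh", "ص": "s", "ض": "d", "ط": "t", "ظ": "dh", "ل": "l", "ن": "n",
}


def with_article(first_letter: str, rest: str, after_vowel: bool = False) -> str:
    """Attach the article ال to an already transcribed word.

    first_letter: the Arabic letter right after ال
    rest: the plain-English transcription of the word without ال
    after_vowel: True when the article follows a vowel-final word such as
                 "abdu", so the article's own "a" is not pronounced
    """
    if first_letter in SUN_LETTER_SOUNDS:
        # lām-šamsiyya: the lām is silent and the next consonant is doubled
        article = SUN_LETTER_SOUNDS[first_letter]
    else:
        # lām-qamariyya: the lām is pronounced
        article = "l"
    return ("" if after_vowel else "a") + article + rest


print(with_article("ن", "naar"))                                # annaar
print("abdu" + with_article("ق", "qaadir", after_vowel=True))   # abdulqaadir
print("abdu" + with_article("س", "sattaar", after_vowel=True))  # abdussattaar
```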


6. DIN 31635 and DIN 31635': the quest for more vocalization accuracy

Intellibe does its best to generate the string of English characters that most closely matches the sound of the Arabic input. Unfortunately, not all Arabic letters have a one-to-one mapping to English letters, so many Arabic letters must be approximated by the closest-sounding character or combination of characters available in the target language. To highlight the problem, the following table shows how transcribing the listed letters leads to ambiguous or multiple versions of the same word.

  Letter   Potential transcriptions/transliterations
  ث   th (as in thank and thrill, not as in then or those)
  ح   h (as a result, the names حامد and هامد are both transcribed to haamid)
  ذ   th (as in then or those, not as in thank or thrill)
  ص   s (or c when followed by an e, an i or a y); as a result, the verbs سار and صار are both transcribed to saara
  ض   d or dh (as a result, the verbs دلّ and ضلّ  would both be transcribed to dalla)
  ط   t (as a result, طالوت is transcribed as taaloot, and the names طارق and تارك become the nearly indistinguishable taariq and taarik)
  ظ   dh, z, zh or th (as a result, numerous versions of names containing ظ are produced)
  ع   a, e, i, o, u, depending on the tashkeel used with it, but the inaccuracy in approximating the phonetic sound remains

To remedy these shortcomings inherent in the transcription process, Intellibe resorts to DIN 31635, a standard for the transliteration of the Arabic alphabet adopted in 1982. Students and educators of the Arabic language in particular should find this map very convenient for conveying in English sounds that would otherwise be impossible to reproduce correctly. For convenience, the DIN 31635 map is reproduced below in Table 1.

 

Table 1: DIN 31635 map
  Letter   DIN 31635
  ا   ʾ/ā
  ب   b
  ت   t
  ث   ṯ
  ج   ǧ
  ح   ḥ
  خ   ḫ
  د   d
  ذ   ḏ
  ر   r
  ز   z
  س   s
  ش   š
  ص   ṣ
  ض   ḍ
  ط   ṭ
  ظ   ẓ
  ع   ʿ
  غ   ġ
  ف   f
  ق   q
  ك   k
  ل   l
  م   m
  ن   n
  ه   h
  و   ū/w
  ي   ī/y

To maximize knowledge transfer and the use of the 26 English letters, the following rules and adjustments are adopted to complement DIN 31635 for transcription with Intellibe.

  Letter   Transcription rules and adjustments
  پ-p   Letter is added for more accurate transliteration purposes
  ڤ-v   Letter is added for more accurate transliteration purposes
  ج   Letter is transcribed into j instead of ǧ to maximize reuse of plain English letters
  خ   Letter is transcribed into ḳ instead of ḫ since k is the first letter in the typical representing digraph kh
  ع   Letter is transcribed into the symbol ʿ followed by a, o or i, depending on whether the diacritic used to guide its pronunciation is a fatḥa, ḍamma or kasra, respectively

 

The above alterations are incorporated into a slightly modified standard, henceforth called DIN 31635', which is shown in Table 2.

 

Table 2: DIN 31635' map
  Letter   DIN 31635'
  ا   ʾ/ā/a
  ب   b
  پ   p
  ت   t
  ث   ṯ
  ج   j
  ح   ḥ
  خ   ḳ
  د   d
  ذ   ḏ
  ر   r
  ز   z
  س   s
  ش   š
  ص   ṣ
  ض   ḍ
  ط   ṭ
  ظ   ḓ
  ع   ʿ
  غ   ġ
  ف   f
  ڤ   v
  ق   q
  ك   k
  ل   l
  م   m
  ن   n
  ه   h
  و   ū/w
  ي   ī/y

 

We highly recommend adopting this standard for transcription or transliteration, especially for educational purposes; some of its advantages are listed next.

  • Transcribed words with geminated letters are clearer to read and shorter. Consider, for example, the plain transcription of names like عبد الظاهر or عبد الشافي , whose second part is prefixed with the article ال with a silent lām (لام شمسية). In the plain English alphabet, these names would be transcribed to abdudhdhaahir and abdushshaafee, whereas the DIN standard gives ʿabduḓḓāhir and ʿabduššāfī.
     
  • Long vowels need only one letter each, again giving shorter strings. For example, whereas transcribing قَيّوم , بَصير and وَهّاب in plain English results in qayyoum, baseer and wahhaab, the DIN standard gives qayyūm, baṣīr and wahhāb.
     
  • The letters ث, ذ, ز and ظ , or the letters ت and ط , are easily distinguished in transcription, since each has its own representing letter.


7. Rules and examples of how to use Intellibe

Intellibe does its best to generate an English string that, when vocalized, closely resembles the Arabic input, based on the input letters and diacritics. Table 3 provides numerous examples of user input and Intellibe output; the following rules apply.

  The character/string to be transcribed   Symbols   Rule or action taken...
  fatḥa (فَتحة )   َ    Transcribed into the vowel a.
  ḍamma (ضَمّة )   ُ    Transcribed into the vowel u (vowel o is an alternative provided by IUI)
  kasra (كَسرة )   ِ    Transcribed into the vowel i (vowel e is not provided as an alternative; see justification in Section 9)
  sukūn (سُكون )   ْ    Not transcribed into any vowel, so it may be omitted without compromising the output, as in عـبـدُ instead of عـبْـدُ . It should be used only to override defaults embedded in Intellibe or in Arabic in general. For example, whereas قَوي transcribes into qawī, قَويْ generates qawy
  šadda (شَدّة )   ّ    Geminates (i.e., duplicates) the letter it sits on. Intellibe will insert a fatḥa over the šadda for you, but this insertion may be explicitly overridden by typing another diacritic, as explained in the row above
  Tashkeel (diacritics) in general       Generally, only minimal tashkeel is needed. For example, there is no need to put a kasra before the long-vowel yāʾ, or a fatḥa before a tāʾ marbūṭa; Intellibe will fill those in for you
  lām-qamariyya and lām-šamsiyya   ال   There is no need to explicitly provide tashkeel characters on letters preceded by lām-qamariyya or lām-šamsiyya ( اللام القَمَرِيّة واللام الشَمسِيّة ); Intellibe already knows how to accurately transcribe words prefixed with the article ʾalif-lām.
  The dot   .   Inserting a dot (without space) between two words turns them into muḍāf and muḍāf-ʾilayh ( المُضاف والمُضاف إلَيه , similar to possessee and possessor, respectively) and produces one combining string. If the muḍāf ends with no diacritic, Intellibe will fill in a ḍamma on your behalf (thereby exploiting the rules of Arabic grammar); if you don't need the inserted ḍamma, you may explicitly put a sukūn or a fatḥa  or a kasra instead
  Names starting with the common prefix عبد       For names starting with the common prefix عبد , you may either insert a dot as in عبد.القادِر , or simply concatenate the compound words with no space in-between as in عبدالقادِر , both inputs will generate the desired result ʿabdulqādir
  The hyphen   -   Inserting a hyphen (without surrounding spaces) between words hyphenates them while maintaining the lām-qamariyya and lām-šamsiyya rules of transcription, should the muḍāf-ʾilayh begin with ʾalif-lām. For example, the name نور-الـهُـدى is transcribed as nur-alhuda; that said, a better transcription would be nurulhuda, which may be obtained by writing the name as نور.الـهُـدى (see the previous row for how the dot is interpreted)
  The substring تش       The substring تش is transcribed into ch. For example, the word تشيلّي is transcribed to chīllī instead of tshīllī
  The substring كس       The substring كس is transcribed into x. For example, the word مَكسيكي is transcribed to maxīkī instead of maksīkī
  Punctuation symbols       Punctuation symbols are correctly displayed and even direction-flipped when applicable
  Articles of reference or conjunction       Some special articles are dealt with as expected, so it is sufficient to write بِالقَرية,  التي,  الذي or فالوَلَد to produce the desired transcription
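Two of the rules above, the dot and the تش / كس substrings, act on the input before any letter is spelled. The Python below is a hypothetical sketch: `join_idafa` applies the dot rule with the default ḍamma, and `scan` performs a greedy longest-match lookup so that a two-letter entry wins over its single letters. Its map is a tiny illustrative subset that, unlike Intellibe, always spells ي as ee.

```python
FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
SUKUN, SHADDA = "\u0652", "\u0651"
DIACRITICS = {FATHA, DAMMA, KASRA, SUKUN, SHADDA}


def join_idafa(text: str) -> str:
    """Dot rule: 'X.Y' becomes one word; a bare X receives a damma."""
    first, *rest = text.split(".")
    joined = first
    for word in rest:
        if joined and joined[-1] not in DIACRITICS:
            joined += DAMMA  # default grammatical ending of the mudaf
        joined += word
    return joined


# Hypothetical partial map; the two-letter keys implement the تش and كس rows.
SOUND_MAP = {
    "تش": "ch", "كس": "x",
    "ت": "t", "ش": "sh", "ك": "k", "س": "s", "م": "m", "ل": "l",
    "ي": "ee", FATHA: "a",
}


def scan(arabic: str) -> str:
    """Greedy longest match: try a two-character key before a single character."""
    out, i = [], 0
    while i < len(arabic):
        pair = arabic[i:i + 2]
        if len(pair) == 2 and pair in SOUND_MAP:
            out.append(SOUND_MAP[pair])
            i += 2
        else:
            out.append(SOUND_MAP.get(arabic[i], arabic[i]))
            i += 1
    return "".join(out)


print(scan("مَكسيكي"))  # maxeekee
print(scan("تشيلي"))    # cheelee
```

An explicit sukūn, fatḥa or kasra at the end of the muḍāf is left alone by `join_idafa`, mirroring the override described in the dot row.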

Table 3 below provides numerous examples of input and output. In the table, the following five preferences are assumed (these preferences can be set through the selection buttons available on Intellibe's user interface (IUI)).

  Preference on IUI   Option set in the examples
  ḍamma   Transcribed into u (as opposed to o)
  wāw-madd   Transcribed into ou (as opposed to oo)
  tāʾ-marbūṭa   When without tashkeel, it is ignored (the other option is to transcribe it into an h)
  fatḥa   A fatḥa is put on the first ḥarf (letter) by default, so typing برد and بَرد gives the same result
  Apostrophes   Apostrophes are kept by default, so inputting لؤلؤ gives lu'lu'; removing the apostrophe gives lulu
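These preferences only change how a few sounds are spelled, so they can be modelled as a lookup applied when the output string is assembled. In this Python sketch the `Preferences` class and the symbolic token names (FATHA, DAMMA, WAW_MADD, HAMZA, TA_MARBUTA) are hypothetical stand-ins for whatever intermediate form Intellibe actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Preferences:
    damma: str = "u"               # or "o"
    waw_madd: str = "ou"           # or "oo"
    ta_marbuta_as_h: bool = False  # True: a bare tāʾ marbūṭa becomes "h"
    keep_apostrophes: bool = True  # False: drop the hamza apostrophe


def spell(tokens, prefs: Optional[Preferences] = None) -> str:
    """Join tokens, spelling the preference-dependent symbols per `prefs`."""
    prefs = prefs or Preferences()
    table = {
        "FATHA": "a",
        "DAMMA": prefs.damma,
        "WAW_MADD": prefs.waw_madd,
        "HAMZA": "'" if prefs.keep_apostrophes else "",
        "TA_MARBUTA": "h" if prefs.ta_marbuta_as_h else "",
    }
    # Any other token (a consonant spelling) passes through unchanged.
    return "".join(table.get(t, t) for t in tokens)


lulu = ["l", "DAMMA", "HAMZA", "l", "DAMMA", "HAMZA"]   # لؤلؤ
print(spell(lulu))                                       # lu'lu'
print(spell(lulu, Preferences(keep_apostrophes=False)))  # lulu
```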
 
 
Table 3: Examples of user input and Intellibe outputs (Output 1: plain English; Output 2: DIN 31635'; Output 3: DIN 31635' in plain English).

  1   Input: السَلامُ علَيكُم ورَحمَةُ اللهِ وبَرَكاتُه، أهلاً ومَرحَباً بِكُم؛ دعونا نبدَأ
      Output 1: assalaamu 'alaykum warahmatu Allahi wabarakaatuh, 'ahlan wamarhaban bikum; da'ounaa nabda'
      Output 2: assalāmu ʿalaykum waraḥmatu Allahi wabarakātuh, ʾahlan wamarḥaban bikum; daʿūnā nabdaʾ
      Output 3: assalamu alaykum warahmatu Allahi wabarakatuh, ahlan wamarhaban bikum; dauna nabda'

  2   Input: مُحَمَّد محمود حمدي أحمَد احمَد
      Output 1: muhammad mhmoud hmdee 'ahmad ahmad
      Output 2: muḥammad mḥmūd ḥmdī ʾaḥmad aḥmad
      Output 3: muhammad mhmud hmdi ahmad ahmad

  3   Input: يُوسُف رامي هاني شوقي بني باني
      Output 1: yousuf raamee haanee shawqee banee baanee
      Output 2: yūsuf rāmī hānī šawqī banī bānī
      Output 3: yusuf rami hani shawqi bani bani

  4   Input: نور.الهُدى عبد.القادِر عبدالسَتّار
      Output 1: nourulhudaa 'abdulqaadir 'abdussattaar
      Output 2: nūrulhudā ʿabdulqādir ʿabdussattār
      Output 3: nurulhuda abdulqadir abdussattar

  5   Input: نُهى شيخ-الأرض ابوبَكر ابو-بَكر ابو.القاسِم
      Output 1: nuhaa shaykh-ala'rd aboubakr abou-bakr aboulqaasim
      Output 2: nuhā šayḳ-alaʾrḍ abūbakr abū-bakr abūlqāsim
      Output 3: nuha shaykh-ala'rd abubakr abu-bakr abulqasim

  6   Input: اللَّهُ لَا إِلَٰهَ إِلَّا هُوَ الْحَيُّ الْقَيُّومُ لَا تَأْخُذُهُ سِنَةٌ وَلَا نَوْمٌ لَهُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ
      Output 1: Allahu laa 'ilaaha 'illaa huwa alhayyu alqayyoumu laa ta'khuzhuhu sinatun walaa nawmun lahu maa fee assamaawaati wamaa fee al'ardi
      Output 2: Allahu lā ʾilāha ʾillā huwa alḥayyu alqayyūmu lā taʾḳuḏuhu sinatun walā nawmun lahu mā fī assamāwāti wamā fī alʾarḍi
      Output 3: Allahu la ilaha illa huwa alhayyu alqayyumu la ta'khuzhuhu sinatun wala nawmun lahu ma fi assamawati wama fi al'ardi

  7   Input: - وَالْعَصْرِ - إِنَّ الْإِنْسَانَ لَفِي خُسْرٍ - إِلَّا الَّذِينَ آمَنُوا وَعَمِلُوا الصَّالِحَاتِ وَتَوَاصَوْا بِالْحَقِّ وَتَوَاصَوْا بِالصَّبْرِ -
      Output 1: - waal'asri - 'inna al'insaana lafee khusrin - 'illaa allazheena 'aamanou wa'amilou assaalihaati watawaasaw bilhaqqi watawaasaw bissabri -
      Output 2: - wālʿaṣri - ʾinna alʾinsāna lafī ḳusrin - ʾillā allaḏīna ʾāmanū waʿamilū aṣṣāliḥāti watawāṣaw bilḥaqqi watawāṣaw biṣṣabri -
      Output 3: - walasri - inna al'insana lafi khusrin - illa allazhina amanu waamilu assalihati watawasaw bilhaqqi watawasaw bissabri -

  8   Input: ملَك ملاك مُلّاك مُلك مِلك مُلِّكَ مُلِكَ
      Output 1: malak malaak mullaak mulk milk mullika mulika
      Output 2: malak malāk mullāk mulk milk mullika mulika
      Output 3: malak malak mullak mulk milk mullika mulika

  9   Input: الإسكَندَرِيّةُ مدينةٌ مِصرِيّةٌ تقَعُ على البحرِ الأبيَضِ المُتَوَسِّط، ألَيسَ كذَلِك؟
      Output 1: al'iskandariyyatu madeenatun misriyyatun taqa'u 'alaa albahri al'abyadi almutawassit, 'alaysa kazhalik?
      Output 2: alʾiskandariyyatu madīnatun miṣriyyatun taqaʿu ʿalā albaḥri alʾabyaḍi almutawassiṭ, ʾalaysa kaḏalik?
      Output 3: al'iskandariyyatu madinatun misriyyatun taqau ala albahri al'abyadi almutawassit, alaysa kazhalik?

  10   Input: تْشيلّي مِن أكَلاتِ المكسيكِ الحارّة
      Output 1: cheellee min 'akalaati almaxeeki alhaarra
      Output 2: chīllī min ʾakalāti almaxīki alḥārra
      Output 3: chilli min akalati almaxiki alharra

  11   Input: سامِر سمير سمَر سمار سِمْسِمْ سِمسِم سماسِم سِمسِمِيّة
      Output 1: saamir sameer samar samaar simsim simsim samaasim simsimiyya
      Output 2: sāmir samīr samar samār simsim simsim samāsim simsimiyya
      Output 3: samir samir samar samar simsim simsim samasim simsimiyya

  12   Input: إيناس أنَس إيمان أمين أيمَن أيسَر مُؤمِن الخُراساني
      Output 1: eanaas 'anas eamaan 'ameen 'ayman 'aysar mu'min alkhuraasaanee
      Output 2: eanās ʾanas eamān ʾamīn ʾayman ʾaysar muʾmin alḳurāsānī
      Output 3: eanas anas eaman amin ayman aysar mu'min alkhurasani

  13   Input: عبد.الله عبدالعَزيز عبد.القُدّوس عبد.الشافي عبد.القادِر عبد.الرحمٰن عبد.الرحيم عبد-السلام عبد.المُهَيمِن
      Output 1: 'abdullah 'abdul'azeez 'abdulquddous 'abdushshaafee 'abdulqaadir 'abdurrahmaan 'abdurraheem 'abd-assalaam 'abdulmuhaymin
      Output 2: ʿabdullah ʿabdulʿazīz ʿabdulquddūs ʿabduššāfī ʿabdulqādir ʿabdurraḥmān ʿabdurraḥīm ʿabd-assalām ʿabdulmuhaymin
      Output 3: abdullah abdulaziz abdulquddus abdushshafi abdulqadir abdurrahman abdurrahim abd-assalam abdulmuhaymin

  14   Input: مُحَمَّدٌ ابنُ عبد.الله الطنجِيِّ، عُرِفَ بِابنِ بطّوطة، مِن مرّاكِش، رحّالةٌ ومُؤَرِّخٌ وقاضي وفَقيهٌ مغرِبِيّ
      Output 1: muhammadun abnu 'abdullah attanjiyyi, 'urifa bibni battouta, min murraakish, rahhaalatun wamu'arrikhun waqaadee wafaqeehun maghribiyy
      Output 2: muḥammadun abnu ʿabdullah aṭṭanjiyyi, ʿurifa bibni baṭṭūṭa, min murrākiš, raḥḥālatun wamuʾarriḳun waqāḍī wafaqīhun maġribiyy
      Output 3: muhammadun abnu abdullah attanjiyyi, orifa bibni battuta, min murrakish, rahhalatun wamu'arrikhun waqadi wafaqihun maghribiyy

  15   Input: وَتَرَى الْمَلَائِكَةَ حَافِّينَ مِنْ حَوْلِ الْعَرْشِ يُسَبِّحُونَ بِحَمْدِ رَبِّهِمْ وَقُضِيَ بَيْنَهُمْ بِالْحَقِّ وَقِيلَ الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
      Output 1: wataraa almalaa'ikata haaffeena min hawli al'arshi yusabbihouna bihamdi rabbihim waqudiya baynahum bilhaqqi waqeela alhamdu lillahi rabbi al'aalameena
      Output 2: watarā almalāʾikata ḥāffīna min ḥawli alʿarši yusabbiḥūna biḥamdi rabbihim waquḍiya baynahum bilḥaqqi waqīla alḥamdu lillahi rabbi alʿālamīna
      Output 3: watara almala'ikata haffina min hawli alarshi yusabbihuna bihamdi rabbihim waqudiya baynahum bilhaqqi waqila alhamdu lillahi rabbi alalamina


8. Evidence of the accuracy of Intellibe

By stating that Intellibe is a "computational transcriber", we mean that Intellibe does not depend on any pre-filled dictionaries or databases, or on preprocessed strings; rather, Intellibe parses and transcribes its input as a function of the letters and diacritics used to write the Arabic text. In fact, the use of diacritics (taškīl, or ḥarakāt) is essential to a meaningful and accurate transcription. When an Arabic string is fed to the Intellibe parser, the Arabic letters are mapped to sound-equivalent English letters, while the diacritics encountered are transformed into the appropriate combination of English vowels to guide correct pronunciation (see the Intellark tutorial page for a comprehensive study of the equivalency map).

The following two examples show Internet search results for two common Arabic names, and how Intellibe, relying solely on its computational algorithms, arrives at the most commonly used transcription of each. Searches and data collection were conducted on December 20, 2010 using google.com. Counts of exact occurrences of a search string were obtained by enclosing it in double quotes, as in "rami" or "yousuf".

 

Example 1: Transcribing the name رامي

Using Intellibe, the name رامي is first transcribed in Output 1 of the IUI (the plain English compartment) as "raamee", which when pronounced sounds closest to the way the name is vocalized in Arabic; yet it is not the spelling a رامي would typically favor when giving their name in English. In DIN 31635' in Output 2, رامي is transcribed to rāmī, which also matches the phonetic sound of رامي in Arabic and is appropriate for educational purposes; for use in official documents or digital data exchange, however, its non-plain-English letters need to be converted. Finally, the third phase of Intellibe scans rāmī character by character and replaces each non-English letter with its closest match among the 26 English letters, turning rāmī into rami, which is the most common English transcription of رامي. Table 4 below lists existing variants of رامي in English, together with the number of hits each variant returns in a simple Google search.

 

Table 4: Statistics on the variants into which رامي is transcribed in English.
  #   Sorted by number of matches (variant, # of matches)   Sorted by name (variant, # of matches)
  1   rami 12,200,000   raame 102,000
  2   ramy 7,130,000   raamee 56,500
  3   ramee 555,000   raamey 7,260
  4   raami 192,000   raami 192,000
  5   raame 102,000   raamy 15,500
  6   raamee 56,500   ramee 555,000
  7   ramiy 20,800   rami 12,200,000
  8   raamy 15,500   ramiy 20,800
  9   raamey 7,260   ramy 7,130,000

 

Example 2: Transcribing the name يوسف

Feeding يوسُف to Intellibe gives the following results: i) "yousuf" in Output 1, ii) "yūsuf" in DIN 31635' in Output 2, and iii) "yusuf" when the DIN output is converted into plain English in Output 3. Table 5 shows the statistics collected from a Google search on the different variants of يوسُف.

 

Table 5: Statistics of the different ways يوسُف is transcribed into English.
  #   Sorted by number of matches (variant, # of matches)   Sorted by name (variant, # of matches)
  1   yusuf 34,800,000   yoosef 97,200
  2   yosef 9,870,000   yoosof 38,600
  3   yousef 7,880,000   yoosuf 246,000
  4   yousuf 4,310,000   yosef 9,870,000
  5   yusif 3,680,000   yosif 493,000
  6   youcef 1,590,000   yosof 23,900
  7   yousif 493,000   yossef 369,000
  8   yosif 493,000   yossif 179,000
  9   yussuf 450,000   yossuf 32,900
  10   yussof 400,000   yosuf 74,400
  11   yossef 369,000   youcef 1,590,000
  12   yoosuf 246,000   yousef 7,880,000
  13   yussef 215,000   yousif 493,000
  14   yossif 179,000   youssaf 7,730
  15   yoosef 97,200   yousuf 4,310,000
  16   yosuf 74,400   yucef 17,900
  17   yussif 72,100   yusif 3,680,000
  18   yoosof 38,600   yussef 215,000
  19   yossuf 32,900   yussif 72,100
  20   yosof 23,900   yussof 400,000
  21   yucef 17,900   yussuf 450,000
  22   youssaf 7,730   yusuf 34,800,000

 

The statistics in the above two representative examples suggest that:

  • The majority of people go for the "correct" way their name ought to be represented in English
  • People like their names to be spelled with the fewest characters that represent them
  • People like their names represented by common-looking spellings; the substring aa, although phonetically more accurate for transcribing an Arabic ʾalif-madd ( ألِف-مَدّ ), is not usually favored
  • Each single name has numerous variants that should have been mapped to a single representative spelling

Although Intellibe is a computational transcriber (i.e., it is not pre-loaded with databases of transcribed words), it produces natural-looking and accurate transcriptions while working under the aforementioned constraints; compare its output with the first row of each table.
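These observations can be checked against Table 5 with a few lines of Python. The sketch below uses only the hit counts reported in Table 5 (they are not re-queried here) to reproduce the table's two sort orders, compute the share of the most common spelling, and compare the average spelling length with and without weighting by hit count.

```python
# Google hit counts for the variants of يوسُف, copied from Table 5.
HITS = {
    "yusuf": 34_800_000, "yosef": 9_870_000, "yousef": 7_880_000,
    "yousuf": 4_310_000, "yusif": 3_680_000, "youcef": 1_590_000,
    "yousif": 493_000, "yosif": 493_000, "yussuf": 450_000,
    "yussof": 400_000, "yossef": 369_000, "yoosuf": 246_000,
    "yussef": 215_000, "yossif": 179_000, "yoosef": 97_200,
    "yosuf": 74_400, "yussif": 72_100, "yoosof": 38_600,
    "yossuf": 32_900, "yosof": 23_900, "yucef": 17_900,
    "youssaf": 7_730,
}

by_count = sorted(HITS, key=HITS.get, reverse=True)  # "# of matches" sort
by_name = sorted(HITS)                               # name-based sort
total = sum(HITS.values())

top_share = HITS[by_count[0]] / total
mean_len = sum(len(name) for name in HITS) / len(HITS)
weighted_len = sum(len(name) * n for name, n in HITS.items()) / total

print(f"most common: {by_count[0]} ({top_share:.1%} of all hits)")
print(f"mean length {mean_len:.2f}, hit-weighted mean length {weighted_len:.2f}")
```

With these counts, "yusuf" alone accounts for just over half of all hits, and the hit-weighted mean length is lower than the plain mean, which is consistent with the first two observations.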


9. Common transcription inaccuracies that Intellibe avoids

This section presents common Arabic-to-English transcription inaccuracies, and how Intellibe is programmed to avoid them.

Numerous transcription choices, although they may seem appropriate at the time of transcription, lead to Arabic names, or text in general, being mispronounced when the corresponding English string is vocalized. An effort to list these common inaccuracies follows.

  • Inaccuracy 1. Transcription of ʾalif-madd into a short vowel like that produced by the fatḥa. For example, the verbs كَتَبَ and كاتَبَ , and the names مَلَك and مَلاك are typically transcribed identically, giving kataba for both verbs of the first pair and malak for both names of the second pair. An accurate transcription adds an extra "a" wherever a long vowel is used, giving kaataba and malaak where appropriate. Intellibe produces accurate results in the Plain-English and DIN 31635' compartments, but drops the extra "a" in the third compartment (DIN in Plain English) in favor of common-looking English strings.
     
  • Inaccuracy 2. Transcription of kasra to "e" instead of "i". In words that follow the pattern of the Arabic word فاعِل ( fāʿil ), such as سالم , صابر , والد and تامر , the kasra diacritic is typically transcribed as the vowel e, resulting in the following transcriptions: salem, saber, waled and tamer. But when these transcriptions are pronounced, they usually rhyme with words such as taker, baker or paled. A more accurate transcription turns the kasra into an i, resulting in these better-matching transcriptions: salim, sabir, walid and tamir.
     
  • Inaccuracy 3. Transcription of Arabic words that begin with إي using the letters e, i, or ei. For example, names like إيناس and إيمان are typically transcribed as enas / eman, inas / iman, or einas / eiman; however, beginning with e, i, or ei usually leads to the names being pronounced as in end, echo, eddy, edit, else, epic, ever, enact, or as in ibis, icon, idea, idle, idol, iron, or as in eigenvector, eight, Einstein and either. To capture the sound of this Arabic prefix accurately, empirical observations steer us toward the prefix "ea", which yields the intended initial sound, as found in the words each, eager, eagle, ear, ease, east, eat and eave.
     
  • Inaccuracy 4. Have you faced other inaccuracies that should be mentioned and avoided? Please provide your invaluable comments below, or contact us regarding your concerns. We will be happy to review them and include them here to enhance the quality of this document.
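The three rules above can be illustrated with a deliberately tiny, hypothetical Python transcriber for fully vocalized words. It is a sketch under assumptions, not Intellibe's implementation: it knows only a handful of letters, treats every ا as the long vowel "aa" (Inaccuracy 1), renders kasra as "i" (Inaccuracy 2) and a word-initial إي as "ea" (Inaccuracy 3), and it mimics the Plain-English compartment, so long vowels stay doubled.

```python
# Hypothetical toy transcriber illustrating Inaccuracies 1-3.
# It is NOT Intellibe's algorithm; the letter table is a small assumed subset.
FATHA, DAMMA, KASRA, SUKUN = "\u064e", "\u064f", "\u0650", "\u0652"
ALIF = "ا"

CONSONANTS = {
    "ب": "b", "ت": "t", "د": "d", "ر": "r", "س": "s", "ص": "s",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "و": "w", "ي": "y",
}
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i", SUKUN: ""}  # Rule 2: kasra -> "i"


def transcribe_word(word: str) -> str:
    out = []
    i = 0
    # Rule 3: a word-initial إي is written "ea" (as in "eager"), not e/i/ei.
    if word.startswith("إي"):
        out.append("ea")
        i = 2
    while i < len(word):
        ch = word[i]
        if ch == FATHA and word[i + 1:i + 2] == ALIF:
            pass  # the following alif already carries the "a" sound
        elif ch == ALIF:
            out.append("aa")  # Rule 1: long vowel -> "aa", not a single "a"
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        else:
            raise ValueError(f"letter not in this toy table: {ch!r}")
        i += 1
    return "".join(out)


for w in ["كَتَبَ", "كاتَبَ", "مَلَك", "مَلاك", "سالِم", "إيمان"]:
    print(w, "->", transcribe_word(w))
```

A real transcriber also needs shadda, tanwīn, hamza seats and the final-ي rules that produce forms such as "raamee"; they are left out here to keep the three rules visible.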


10. Common transcription variations that Intellibe standardizes

Transcribing any single name clearly gives rise to many variations. The intention here is to begin offering standards that, if adopted, bring uniform and optimal transcription of Arabic names and data (or text in general) into English a step closer. An effort to list such standards follows.

  • Transcription of  عبد-prefixed (abd-) names. Among the most common Arabic masculine names are those prefixed with عبد (written عَبدُ when decorated with tashkeel characters, i.e., diacritics). However, each عبد-prefixed name has numerous transcription variants, which add confusion when the name is vocalized or handled by non-Arabic speakers. For example, all of the following prefixes are common in transcribed عبد-prefixed names:

              abd, abdo, abdu, abda, abde, abdi, abdl, abdol,
              abdool, abdul, abdoul, abdal, abdel, abdil, abd-,
              abdu-, abdul-, abd-al-, abd al a, 'abd 'u, 'abd'al...

    In fact, any عبد-prefixed name may easily have dozens of variants. The following context-free grammar (CFG) generates only a subset of the variants that can be found on the Internet; the expression, and the numerous variants it produces, therefore stand as evidence of the lack of a standard for transcribing Arabic into, at the very least, Latin-derived languages.
     

    عبد-prefixed names   =   [S1]  ( abd  [ <S2> ] )   [ <E>  <S2> ]   ( <N99> )
    S1   =   '  |  λ
    S2   =   ⃞  |  -  |  '  |  λ
    E   =   <V>1,2  [  L  ]
    V   =   a  |  e  |  i  |  o  |  u
    L   =   l  |  <LShL>  |  <LShL2> , (note that the first option is the letter l, not the number 1)
    LShL   =   any of the lām-šamsiyya letters:  t  |  th  |  d  |  dh  |  r  |  z  |  s  |  sh  |  n  
            (in Arabic, these are the letters ت  ث  د  ذ  ر  ز  س  ش  ص  ض  ط  ظ  ن )
    LShL2   =   any LShL letter geminated, with an optional separator in between:
            t[S3]t  |  th[S3]th  |  d[S3]d  |  dh[S3]dh  |  r[S3]r  |  z[S3]z  |  s[S3]s  |  sh[S3]sh  |  n[S3]n
    S3   =    ⃞  |  -  |  λ
    N99   =   Any of the 99 names of Allah stripped of the prefix عبد ال

     

    A legend for the above definitions follows.

  Symbol   Description
  E   An expression that generates one or two vowels, optionally followed by l or a single or geminated lām-šamsiyya consonant
  N99   Any of the 99 names of Allah stripped of the prefix عبد ال
  S1, S2, S3   Special characters that evaluate to a space ( ⃞ ), a hyphen ( - ), an apostrophe ( ' ), or lambda, denoting an empty character ( λ ); each definition above lists which of these it allows
  V   A vowel letter
  ()   Contained sub-expression evaluation is required
  []   Contained sub-expression evaluation is optional
  |   This vertical bar separates the options to be chosen from (for example, V above must evaluate to exactly one of the values it lists)
  <...>   Sub-expressions enclosed in these symbols are to be evaluated to concrete values
  1,2   The exponent 1,2 implies that the base expression is selected once or twice

 

  • Possessor and possessee names. Although a name made of a possessor-possessee pair (muḍāf and muḍāf-ʾilayh) is understood in Arabic to refer to a single person within a hierarchy of names (e.g., the first, middle and last names of a person refer to three different persons), such names are often misplaced in the name hierarchy when transcribed into English, which leads, for example, to a person's full name being misinterpreted in official documents. Some examples are highlighted next.

    • The following names should be treated as single units: أبو بكر , أبو ظاهر , أبو القاسم , حسيب الله . Yet if each is transcribed into English as two strings separated by a space, as in ʾabū bakr, ʾabū ḓāhir, ʾabū alqāsim, and ḥasīb Allah, the second part of each pair is incorrectly interpreted as a middle name, a father's name or a last name. When Intellibe is used for standardization, a dot is inserted between the two parts of each pair (as in أبو.بكر, see Section 7) to produce a single name, the way each name in this class should be transcribed. In DIN 31635', these names transcribe into ʾabūbakr, ʾabūḓāhir, ʾabūlqāsim, and ḥasībullah; in DIN 31635' in plain English, they transcribe into abubakr, abudhahir, abulqasim, and hasibullah.
       
    • Like the names in the previous point, each name in this point refers to a single person; however, it would be too long if written as a single word, yet it should not keep an intermediate space either, to prevent the two-name interpretation described in the previous point. Consider شَيخُ الأَرض , فاطِمةُ الزَهراء and زَهرةُ العُلا. Feeding these names to Intellibe gives šayḳu alʾarḍ, fāṭimatu azzahrāʾ, and zahratu alʿulā.
       
    • Inserting a dot between the elements of each pair gives: šayḳulaʾarḍ, fāṭimatuzzahrāʾ, and zahratulʿulā.
       
    • Inserting a hyphen in between gives: šayḳu-alʾarḍ, fāṭimatu-azzahrāʾ, and zahratu-alʿulā.
       
    • Either of the latter two solutions prevents the name from being misplaced in the ordering of a person's names.
       
  • Have you come across other variations that could be standardized? Please provide your invaluable comments below, or contact us regarding your thoughts. We will be happy to review them and include them to enhance the quality of this document.
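The context-free grammar for عبد-prefixed names given earlier in this section is small enough to enumerate exhaustively. The Python sketch below expands every rule except <N99> into its set of prefix strings. Reading ⃞ as a plain space and the exponent 1,2 on <V> as "one or two vowels" are our interpretations of the notation, and the helper name abd_prefixes is hypothetical.

```python
from itertools import product

S1 = ["", "'"]               # S1 = ' | λ
S2 = ["", " ", "-", "'"]     # S2 = space | - | ' | λ
S3 = ["", " ", "-"]          # S3 = space | - | λ
V = ["a", "e", "i", "o", "u"]
LSHL = ["t", "th", "d", "dh", "r", "z", "s", "sh", "n"]  # lām-šamsiyya letters


def abd_prefixes() -> set[str]:
    """Return every distinct string the grammar yields before <N99>."""
    vowels = V + ["".join(p) for p in product(V, repeat=2)]   # <V> once or twice
    lshl2 = [c + s + c for c in LSHL for s in S3]              # LShL2: geminated
    l_opts = ["l"] + LSHL + lshl2                              # L
    e_opts = [v + l for v in vowels for l in [""] + l_opts]    # E = <V>1,2 [L]
    tails = [""] + [e + s for e in e_opts for s in S2]         # [ <E> <S2> ]
    return {s1 + "abd" + s2 + t for s1 in S1 for s2 in S2 for t in tails}


prefixes = abd_prefixes()
print(len(prefixes))
for example in ["abdul", "abdoul", "abd-al-", "'abd'al", "abdurr", "abdl"]:
    print(f"{example!r:10} generated: {example in prefixes}")
```

Under this reading the grammar yields 36,488 distinct prefixes, namely 2 × 4 × (1 + 1140 × 4), where 1140 = 30 vowel sequences × 38 options for [L], before any of the 99 names is appended. Most of the spellings listed at the start of this section are among them; abdl is not, because E always begins with at least one vowel.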


11. Order your fully custom-made Intellibe for your own needs

The online version of Intellibe is limited; it transcribes a maximum of 165 characters. For a fuller, flash-fast and unlimited version of Intellibe tailored to your organization's professional or academic needs, please do not hesitate to contact us to build your own customized version. Section 2 of this document mentions some applications where Intellibe could prove to be an indispensable tool for you or your organization.
