Intellaren

Intellibe: An Arabic to Latin text transcriber

This article exists in other translations [ Id: ar00005 عربي ], also accessible through the articles page.


1. Introduction

2. Applications of Intellibe

3. Transcription x Transliteration, and the way of Intellibe

4. Transcription shortcomings without a computational parser

5. Towards an optimal solution: tashkeel (diacritics) to the rescue

6. DIN 31635 and DIN 31635': the quest for more vocalization accuracy

7. Rules and examples of how to use Intellibe

8. Evidence of the accuracy of Intellibe

9. Common transcription inaccuracies that Intellibe avoids

10. Common transcription variations that Intellibe standardizes

11. Order your fully custom-made Intellibe for your own needs

1. Introduction

Intellibe (Intellaren's Arabic-to-Latin text transcriber) is a pure computational transcriber. It transcribes Arabic input into Latin-based alphabet letters that provide the closest matching phonetic sound when vocalized. For example, when transcribing the name محمد , the Latin characters that would provide the closest phonetic sound when vocalizing it are those as arranged in "muhammad" or "mohammad", and not "mhmd" or "mohamed" for example.

This version of Intellibe uses the English alphabet as the representative of Latin-derived alphabets. Intellibe's user interface (IUI) generates three outputs for each user input, each displayed in its own compartment; they are described next.

Input: Intellark-enabled editor compartment for typing in Arabic. Because Intellibe's editor is Intellark-enabled, any of the letters below can be typed, transcribed and displayed in the three output compartments. You may learn about Intellark or take the one-hour Intellark-layout-tutorial that teaches all about the new and intuitive way of typing in Arabic. You may of course disable or enable Intellark (by typing Alt-L) to type using your preferred Arabic keyboard layout.
ا أ إ آ ء ب پ ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ؤ ي ى ئ

َ ً ُ ٌ ِ ٍ ْ ّ
Output 1: Plain English compartment. This compartment displays the resulting transcription in pure English letters. Although the result may seem uncommon to an English-reading eye (e.g., transcribing رامي as "raamee" instead of "rami"), it does map to English letters that capture as closely as possible the phonetic sound of what is being transcribed.
Output 2: DIN 31635' compartment. This compartment shows the resulting transcription in DIN 31635'. Students and educators of the Arabic language in particular should find this map extremely convenient for conveying in English what would otherwise be impossible to approximate vocally. In this compartment, رامي is transcribed to "rāmī". DIN 31635' is explained below.
Output 3: DIN 31635' in plain English compartment. This compartment scans over the above DIN 31635' output, converting each non-plain-English character into its most-resembling equivalent from the 26 available English letters, thereby producing spellings that look familiar to an English-reading eye. For example, "rāmī muḥammad" would be converted into "rami muhammad".
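The conversion performed for Output 3 can be illustrated with a short Python sketch. It is not Intellibe's code: the function name `din_to_plain`, the digraph table and the handling of ʿ and ʾ are our own assumptions, inferred from the first column of Table 3 later in this article.

```python
import unicodedata

AYN = "\u02BF"    # ʿ, the symbol used for the letter ʿayn
HAMZA = "\u02BE"  # ʾ, the symbol used for the hamza

# Two-letter replacements observed in the first column of Table 3.
# This table is an assumption, not a published Intellibe rule set.
DIGRAPHS = {
    "\u0161": "sh",  # š
    "\u1E33": "kh",  # ḳ
    "\u1E0F": "zh",  # ḏ
    "\u0121": "gh",  # ġ
}


def din_to_plain(text: str) -> str:
    """Reduce a DIN 31635' string to plain English letters."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch == AYN:
            continue  # no plain-letter counterpart, so it is dropped
        if ch == HAMZA:
            # Kept as an apostrophe only when it follows a letter.
            if out and out[-1].isalpha():
                out.append("'")
            continue
        if ch in DIGRAPHS:
            out.append(DIGRAPHS[ch])
            continue
        # Split off combining marks (macron, dot below, ...) and drop them.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if unicodedata.category(c) != "Mn"))
    return "".join(out)


print(din_to_plain("rāmī muḥammad"))
```

With the example from Output 3, the sketch prints rami muhammad. Letters that are not in the digraph table simply lose their diacritic, which is the "most-resembling letter" behaviour described above.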

An online and light version of Intellibe for you to experiment with is available.


2. Applications of Intellibe

Following is a list of applications where Intellibe would prove to be of extreme benefit for the purposes of Arabic-to-Latin transcription.

Person name transcription for use in official documents such as passports, credit cards, school and university ID cards...
Address, city, channel or company name transcription for accurate and standardized data exchange, logging and retrieval
General institution name transcription such as governmental, educational, or commercial places and that includes school and university names, airport and bank names and so on
Transcription for educational purposes for students of the Arabic language. Transcription of Arabic characters into English counterparts is no alternative to learning or teaching the vocal sounds of Arabic letters or words. Still, Intellibe can speed up the learning process of Arabic vocalization for students not familiar with the "real" vocal sounds produced in Arabic speech. See this link for an example where transliteration/transcription may be needed for educational purposes. Intellibe may easily and robustly be used to produce strikingly accurate, standardized results with several output flavors.

Section 11 of this document mentions how you may order your own customized copy of Intellibe to cater for your needs.


3. Transcription x Transliteration, and the way of Intellibe

Before proceeding in this section, let's define precisely what is meant by transcription and transliteration in order to justify Intellibe's choice of one over the other.

Transcription: A representation of speech sounds in phonetic symbols.
Transliteration: A representation of letters or words in the corresponding characters of another alphabet.

That is, whereas transcription is concerned with representing text in one language by a set of characters or symbols in another language so that it is vocalized correctly, transliteration is concerned simply with representing the characters of one language by matching characters in a different one. For example, consider the Arabic sentence أكـلـت جـزرة , which means "I ate a carrot" in English. Transliterating the eight-letter sentence gives this eight-letter pair: aklt jzrt, which, although it accurately maps each Arabic letter to an English counterpart, is insufficient material for correct vocalization. Transcribing, on the other hand, takes the liberty of adding and removing characters as needed to allow a reader unfamiliar with Arabic to vocalize the phrase correctly. Transcribing the Arabic sentence gives akaltu jazarah for example.

Another example. Consider the word الـنـار , which means "the fire". Letter for letter transliterating gives alnar, whereas transcription gives annaar -- the way it is vocalized in Arabic.

Intellibe, Intellaren's Arabic-to-Latin text transcriber, takes the more difficult, yet more useful and appealing approach -- transcription; that is, it takes liberty to add and remove letters as it sees fit to guide to correct vocalization of Arabic text for practical and official purposes.
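To make the contrast concrete, here is a minimal Python sketch of pure letter-for-letter transliteration. The map `LETTER_MAP` and the function `transliterate` are hypothetical and cover only the letters of the two examples in this section.

```python
# Letter-for-letter map covering only the letters used in the two examples.
LETTER_MAP = {
    "ا": "a", "أ": "a", "ك": "k", "ل": "l", "ت": "t",
    "ج": "j", "ز": "z", "ر": "r", "ة": "t", "ن": "n",
}
TATWEEL = "\u0640"  # decorative elongation stroke, carries no sound


def transliterate(text: str) -> str:
    """Map each Arabic letter to one Latin letter; nothing added or removed."""
    return "".join(LETTER_MAP.get(ch, ch) for ch in text if ch != TATWEEL)


print(transliterate("أكلت جزرة"))
print(transliterate("النار"))
```

The sketch prints aklt jzrt and alnar. No letter map alone can reach akaltu jazarah or annaar, because the short vowels and the silent lām are not present in the written letters; that extra knowledge is what transcription has to supply.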


4. Transcription shortcomings without a computational parser

To highlight the problem, consider the way transcription of common personal details has been conducted so far, especially for official documents such as passports, certificates, electronic cards or more generally, the filling out of forms that require many details about the applicant. Transcription of client details from Arabic to English is typically accomplished by one of the following three methods:

By the applicant themselves, where they are required to fill in their own personal details. This leads to many inconsistencies and variations in transcribing even the simplest person or school name.
By the representative agent, who may or may not be versed in the intricacies of the foreign language into which the details have to be transcribed. Here, transcription quality and standard are at the mercy of the processing agent, and numerous varying transcriptions of the same noun or verb are expected.
By means of fetching transcriptions from a database repository or user-made dictionaries that are constantly being filled and maintained by organization personnel. Transcription here lacks standardization from one organization to the other, and quality and standard of transcription are still subject to user input as mentioned in the previous point.

Therefore, all of the aforementioned means of transcription lack a unifying, non-subjective transcription process. Some real and typical recurring examples of inconsistent transcription follow.

Given these names: سفيان , رامي , عبد العزيز , يوسف , محسن , عبد الستار , عبد القادر , محمد , consider the following versions available for each single Arabian name. You may verify the presence of each name variant shown in the table below by searching for it using any search service provider such as Google or Intellaren.

	Name		Possible transcriptions/transliterations used
	محمد		mohamad, mohamed, mohammad, mohammed, muhamad, muhamed, muhammad, muhammed...
	عبد القادر		abdulkader, abdelkader, abdelkadir, abdul qadir, abd alkader, abd-ulkadir, 'abd'alkader...
	عبد الستار		abdulsattar, abd alsattar, abdussattar, abdelsatar, abdalsattaar...
	محسن		mohsen, mohsin, muhsin, muhsen, moohsen, mouhsen...
	يوسف		yoosuf, yousuf, yossef, youssif, yusuf, yusaf, yoosef, youssuf...
	عبد العزيز		abdelaziz, abdalaziz, abduaziz, abdol-aziz, abdol aziz, abdulazeez, abdulaziez, abdul'zeez, abd-alazeiz...
	رامي		rami, ramy, raami, raamy, ramey, ramee...
	سفيان		sofian, sofyan, sofyaan, soufiane, soufian...

In fact, and especially where:

names are prefixed with عبد
names are prefixed with the article ال , or more generally
names are muḍāf and muḍāf-ʾilayh ( المُضاف والمُضاف إلَيه , i.e., possessee and possessor, as in Adam's orange, or Kelowna's river), and may be transcribed as a single compound word in English (e.g., أبو بكر or شيخ الأرض )

the number of transcription results grows quite large. There are other problems where non-standardized transcription/transliteration leads to ambiguous outputs; namely:

The problem of transcribing long vowels as short vowels in English for example, thereby rendering nouns such as ملك , ملاك and مالَك to be all transcribed into one English string: malak for example.
The case of whether to transcribe the kasra tashkeel character (diacritic) into an e or an i in English, as portrayed by the names خالِد and ناهِد (e.g., khaled x khalid, and nahed x nahid).
The process of data retrieval is not as straightforward as it is intended to be. For example, whereas searching for employees with the first name عبد العزيز in an Arabic-populated database table is straightforward, searching for the same name when transcribed/transliterated in an English- or French-populated database table requires many searches or some string preprocessing operations to hit a range of possibilities (e.g., look for all names that contain the letters "abdzz" in order), thereby complicating a process that should have been simple and effective.
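The "abdzz" lookup mentioned in the last point amounts to a subsequence test. A minimal Python sketch follows; the function name is ours, and the extra names "mohammad" and "abdulrazzaq" are added only for illustration.

```python
def is_subsequence(pattern: str, name: str) -> bool:
    """True if the letters of `pattern` occur in `name` in the same order."""
    remaining = iter(name)
    # `in` advances the iterator, so each letter is searched after the last hit.
    return all(letter in remaining for letter in pattern)


variants = ["abdelaziz", "abdulazeez", "abdul'zeez", "abd-alazeiz",
            "mohammad", "abdulrazzaq"]
matches = [v for v in variants if is_subsequence("abdzz", v)]
print(matches)
```

The four variants of عبد العزيز are found and "mohammad" is rejected, but "abdulrazzaq", a different name, is matched as well. Such preprocessing therefore trades missed records for false hits, which is the complication a single standardized transcription avoids.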

The following examples illustrate the aforementioned problems more clearly.

	Name		Possible transcriptions/transliterations used
	ملك		malak, malek, malik, malak...
	ملاك		malak, malaak...
	الجمل		algamal, aljamal, al gamal, al-jamal...
	جمال		gamal, jamal, gamaal, jamaal...
	عبد المالك		... malek, malik, maalik, maalek, maleek...
	عبد الملك		... malek, malik...
	خالد		khaled, khalid, khaalid, kaled...
	ناهد		nahed, nahid, naahed, naheed...
	إيناس		enas, eanas, inas, einas, enaas, einaas, eanas, eanaas, eenas...

The examples in the above tables are by no means exhaustive of the problems encountered when engaging in transcription tasks for official or non-official purposes.

In light of the aforementioned mechanisms and the disadvantages associated with them, Intellibe's objective is to standardize the process of transcribing Arabic text into Latin-script languages accurately. Intellibe introduces solutions to all of the above problems, as described below. The current Intellibe version produces English-flavored transcription (as opposed to French or Spanish for example).

We stress again that Intellibe is a pure computational transcriber, meaning that its real-time output depends solely on user input of Arabic letters and tashkeel characters (diacritics), character by character. There is no database, dictionary or word repository associated with Intellibe, only some pre-programmed rules that are described later in this document. You may verify this statement by simply typing some Arabic input with tashkeel inside the input compartment of the IUI.


5. Towards an optimal solution: tashkeel (diacritics) to the rescue

To the Arabic speaker, dialects aside, there is typically one way of pronouncing any of the Arabic names mentioned earlier. For the non-Arabic speaker, ambiguities in pronouncing those names, or Arabic text in general, are expected since there are no explicit tashkeel characters present to guide how text should be vocalized. To convey the correct phonetic sound (and as a result, the correct transcription) to non-Arabic speakers or to students of the Arabic language, tashkeel must be utilized. Diacritics in Arabic essentially act as short vowels do in English. We now rewrite the above list of names, only this time with both the diacritics and the most probable transcription to English according to the following initial set of rules.

Tashkeel	Shape	Transcribed into...
fatḥa	َ	a
ḍamma	ُ	u or o
kasra	ِ	i (e is another weaker possibility)
sukūn	ْ	not transcribed into any vowel sound (silent sound)
šadda	ّ	geminates (i.e., duplicates) the letter it sits on
Vowel ʾalif	ا	aa
Vowel wāw	و	ou or oo
Vowel yāʾ	ي	ee

Applying the above set of rules results in the following transcriptions. Extra rules are mentioned locally next to the transcribed name in this table.

	Name		Accurate transcription
	مُحَمَّد		muhammad
	عَبدُ القادِر		abdulqaadir (notice the use of the lām-qamariyya in this name)
	عَبدُ السَتار		abdussattaar (notice the use of the lām-šamsiyya in this name)
	مُحسِن		mohsin
	يوسُف		yousof
	عَبدُ العَزيز		abdulazeez
	رامي		raamee
	سُفيان		sofyaan
	مَلَك		malak
	مَلاك		malaak
	الجَمَل		aljamal
	جَمال		jamaal
	عَبدُ المالِك		abdulmaalik
	عَبدُ الملِك		abdulmalik
	خالِد		khaalid
	ناهِد		naahid
	إيناس		eanaas (and not einaas or inas for example, so that the initial vocal sound is as found in each, eagle, ear, easy or eat; more on this particular transcription below)

All the above results are generated using Intellibe for the plain-English compartment as mentioned in Section 1 above. Therefore, with just a simple set of constraining rules, many factors that lead to multiple transcriptions are eliminated. We hope you understand the difference between lām-qamariyya and lām-šamsiyya; if not, let us know and we would publish a brief educational tutorial on them.
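As an illustration of how far the initial rule set alone can go, the following Python sketch applies the tashkeel table of this section to a few of the names above. It is our own simplified reading of the rules, not Intellibe's parser: the consonant map is hypothetical and limited to the letters of the examples, ḍamma is fixed to u, and every ʾalif is treated as a long vowel.

```python
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SUKUN, SHADDA = "\u0652", "\u0651"
ALIF = "\u0627"

# Short vowels from the rules table; damma is fixed to "u" in this sketch.
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i", SUKUN: ""}
MARKS = set(SHORT_VOWELS) | {SHADDA}

# Hypothetical consonant map, limited to the letters in the examples.
CONSONANTS = {"م": "m", "ح": "h", "د": "d", "ل": "l", "ك": "k", "خ": "kh"}


def transcribe(word: str) -> str:
    """Turn one diacritized Arabic word into plain English letters."""
    out = []
    i = 0
    while i < len(word):
        letter = word[i]
        i += 1
        # Gather the tashkeel marks that sit on this letter.
        marks = []
        while i < len(word) and word[i] in MARKS:
            marks.append(word[i])
            i += 1
        if letter == ALIF:
            # Vowel alif is written "aa"; absorb a preceding fatha if typed.
            if out and out[-1] == "a":
                out[-1] = "aa"
            else:
                out.append("aa")
            continue
        latin = CONSONANTS[letter]
        if SHADDA in marks:
            latin *= 2  # gemination: duplicate the letter under the shadda
        out.append(latin)
        out.extend(SHORT_VOWELS[m] for m in marks if m != SHADDA)
    return "".join(out)


muhammad = "م" + DAMMA + "ح" + FATHA + "م" + SHADDA + FATHA + "د"
khalid = "خ" + ALIF + "ل" + KASRA + "د"
print(transcribe(muhammad), transcribe(khalid))
```

The two words come out as muhammad and khaalid, matching the table. Collecting the marks per letter before emitting anything is what lets the šadda double the consonant regardless of whether it was typed before or after the fatḥa.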


6. DIN 31635 and DIN 31635': the quest for more vocalization accuracy

Intellibe will do its best in generating for the user the closest string of characters in English to match the sound of Arabic input. Unfortunately, not all letters in Arabic have a one-to-one mapping to letters in English, which means that many Arabic characters will have to be approximated by the closest sound generating character or combination of characters available from the characters of the transcribed-to language. To highlight the problem, we show how transcribing words with the letters in the following table leads to many versions of the same word.

	Letter		Potential transcriptions/transliterations
	ث		th (as in thank and thrill, not as in then or those)
	ح		h (as a result, the names حامد and هامد are both transcribed to haamid)
	ذ		th (as in then or those, not as in thank or thrill)
	ص		s (or c when followed by an e, an i or a y), as a result, the verbs سار and صار are both transcribed to saara
	ض		d or dh (as a result, the verbs دلّ and ضلّ would both be transcribed to dalla)
	ط		t (as a result, the names طالوت , طارق and تارك are transcribed as taaloot and taariq, and taarik)
	ظ		dh, z, zh or th (as a result, numerous versions of names containing ظ are produced)
	ع		a, e, i, o, u, depending on the tashkeel used with it, but the inaccuracy in approximating the phonetic sound remains

To remedy such shortcomings inherent in the transcription process, Intellibe resorts to DIN 31635, a standard for the transliteration of the Arabic alphabet that was adopted in 1982. Students and educators of the Arabic language in particular should find this map extremely convenient for conveying in English what would otherwise be impossible to simulate vocally for correct pronunciation. For convenience, the DIN 31635 map is reproduced below in Table 1.

**Table 1:** DIN 31635 map
ﻱ	ﻭ	ه	ﻥ	ﻡ	ﻝ	ﻙ	ﻕ	ﻑ	ﻍ	ﻉ	ﻅ	ﻁ	ﺽ	ﺹ	ﺵ	ﺱ	ﺯ	ﺭ	ﺫ	ﺩ	ﺥ	ﺡ	ﺝ	ﺙ	ﺕ	ﺏ	ﺍ	Letters
ī\y	ū\w	h	n	m	l	k	q	f	ġ	ʿ	ẓ	ṭ	ḍ	ṣ	š	s	z	r	ḏ	d	ḫ	ḥ	ǧ	ṯ	t	b	ʾ/ā	DIN 31635

To maximize knowledge transfer and usage of the 26 English letters, the following set of rules and adjustments are added and adopted to complement DIN 31635 for the purposes of transcribing using Intellibe.

	Letter		Transcription rules and adjustments
	پ-p		Letter is added for more accurate transliteration purposes
	ڤ-v		Letter is added for more accurate transliteration purposes
	ج		Letter is transcribed into j instead of ǧ to maximize reuse of plain English letters
	خ		Letter is transcribed into ḳ instead of ḫ since k is the first letter in the typical representing digraph kh
	ع		Letter is transcribed into the symbol ʿ followed by a, o or i, depending on whether the diacritic used to guide its pronunciation is a fatḥa, ḍamma or kasra, respectively

The above alterations are incorporated into a slight variant of the standard, henceforth called DIN 31635' and exhibited in Table 2.

**Table 2:** DIN 31635' map
ﻱ	ﻭ	ه	ﻥ	ﻡ	ﻝ	ﻙ	ﻕ	ڤ	ﻑ	ﻍ	ﻉ	ﻅ	ﻁ	ﺽ	ﺹ	ﺵ	ﺱ	ﺯ	ﺭ	ﺫ	ﺩ	ﺥ	ﺡ	ﺝ	ﺙ	ﺕ	پ	ﺏ	ﺍ	Letters
ī\y	ū\w	h	n	m	l	k	q	v	f	ġ	ʿ	ḍ	ṭ	ḍ	ṣ	š	s	z	r	ḏ	d	ḳ	ḥ	j	ṯ	t	p	b	ʾ\ā\a	DIN 31635'
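One way to picture DIN 31635' is as the standard map with a small layer of overrides on top. The Python sketch below shows this for a handful of letters only; the dictionary names are ours, and the ʿ rule is left out because it depends on the diacritic that follows the letter.

```python
# A few entries of the standard DIN 31635 map (subset, for brevity).
DIN_31635 = {
    "ب": "b", "ت": "t", "ث": "ṯ", "ج": "ǧ", "ح": "ḥ", "خ": "ḫ", "ف": "f",
}

# The adjustments listed in this section, layered over the standard map.
ADJUSTMENTS = {
    "پ": "p",  # added letter
    "ڤ": "v",  # added letter
    "ج": "j",  # replaces ǧ
    "خ": "ḳ",  # replaces ḫ
}

# Later keys win, so adjusted letters override the standard entries.
DIN_31635_PRIME = {**DIN_31635, **ADJUSTMENTS}

changed = sorted(k for k in ADJUSTMENTS if k in DIN_31635)
added = sorted(k for k in ADJUSTMENTS if k not in DIN_31635)
print("changed:", changed, "added:", added)
```

Keeping the adjustments in their own table makes it easy to see exactly where the variant departs from the standard, and to switch back to plain DIN 31635 if needed.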

We highly recommend the adoption of this standard for transcription or transliteration purposes especially for educational purposes; some advantages are enumerated next.

Transcribed words with geminated letters are clearer to read and produce shorter strings. Consider for example plain transcription of names like عبد الظاهر or عبد الشافي , whose second word is prefixed with the article ال where lām is silent (لام شمسية). In the plain English alphabet, these names would be transcribed to abdudhdhaahir and abdushshaafee, whereas by using the DIN standard, the transcription would result in ʿabduḓḓāhir and ʿabduššāfī.
Long vowels need no more than one letter each, hence shorter strings. For example, whereas transcribing قَيّوم , بَصير and وَهّاب in plain English results in qayyoum, baseer and wahhaab, using the DIN standard gives qayyūm, baṣīr and wahhāb.
Transcription of the letters ث, ذ, ز and ظ , or the letters ت and ط is easily distinguishable since each has its own representing letter.


7. Rules and examples of how to use Intellibe

Intellibe does its best to generate in English a string that closely resembles the Arabic input when vocalized, based on the input letters and diacritics. Numerous examples of user input and Intellibe outputs are provided in Table 3. The following rules apply.

The character/string to be transcribed	Symbols	Rule or action taken...
fatḥa (فَتحة )	َ	Transcribed into the vowel a.
ḍamma (ضَمّة )	ُ	Transcribed into the vowel u (vowel o is an alternative provided by IUI)
kasra (كَسرة )	ِ	Transcribed into the vowel i (vowel e is not provided as an alternative; see justification in Section 9)
sukūn (سُكون )	ْ	Not transcribed into any vowel, it may therefore be omitted without compromising the output as in عـبـدُ instead of عـبْـدُ . It should be used only to override defaults embedded in Intellibe or in Arabic in general. For example, whereas قَوي transcribes into qawī, قَويْ generates qawy
šadda (شَدّة )	ّ	Geminates (i.e., duplicates) the letter it sits on. Intellibe will insert a fatḥa for you over the šadda , but this unsolicited insertion may be explicitly overridden, as explained in the point above
Tashkeel (diacritics) in general		Generally, a minimum amount of tashkeel is needed. For example, there is no need to put a kasra before the long-vowel yāʾ, or a fatḥa before a tāʾ marbūṭa; Intellibe will fill those in for you
lām-qamariyya and lām-šamsiyya	ال	There is no need to explicitly provide tashkeel characters on letters preceded by lām-qamariyya or lām-šamsiyya during transcription ( اللام القَمَرِيّة واللام الشَمسِيّة ), Intellibe already knows how to accurately transcribe words prefixed with the article ʾalif-lām.
The dot	.	Inserting a dot (without space) between two words turns them into muḍāf and muḍāf-ʾilayh ( المُضاف والمُضاف إلَيه , similar to possessee and possessor, respectively) and produces one combining string. If the muḍāf ends with no diacritic, Intellibe will fill in a ḍamma on your behalf (thereby exploiting the rules of Arabic grammar); if you don't need the inserted ḍamma, you may explicitly put a sukūn or a fatḥa or a kasra instead
Names starting with the common prefix عبد		For names starting with the common prefix عبد , you may either insert a dot as in عبد.القادِر , or simply concatenate the compound words with no space in-between as in عبدالقادِر , both inputs will generate the desired result ʿabdulqādir
The hyphen	-	Inserting a hyphen (without surrounding spaces) between words hyphenates them, yet while maintaining lām-qamariyya and lām-šamsiyya rules of transcription should the muḍāf-ʾilayh begin with an ʾalif-lām. For example, the name نور-الـهُـدى is transcribed as nur-alhuda; that said, a better transcription would be nurulhuda, which may be obtained by writing in Arabic the name نور.الـهُـدى (see the previous point for how the dot is interpreted)
The substring تش		The presence of the substring تش is transcribed into ch. For example, the word تشيلّي is transcribed to chīllī instead of tshīllī
The substring كس		The presence of the substring كس is transcribed into x. For example, the word مَكسيكي is transcribed to maxīkī instead of maksīkī
Punctuation symbols		Punctuation symbols are correctly displayed and even direction-flipped when applicable
Articles of reference or conjunction		Some special articles are dealt with as expected, so it is sufficient to write بِالقَرية, التي, الذي or فالوَلَد to produce the desired transcription
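The rules for the article ʾalif-lām, the dot and the hyphen can be summarized in a short Python sketch. It works on words that are already transcribed to plain English and only decides how to attach the article; the function names and the sun-letter sound table are our assumptions, chosen to agree with the examples in this article.

```python
# Sun letters with the plain-English sound used elsewhere in this article.
SUN_LETTERS = {
    "ت": "t", "ث": "th", "د": "d", "ذ": "zh", "ر": "r", "ز": "z", "س": "s",
    "ش": "sh", "ص": "s", "ض": "d", "ط": "t", "ظ": "dh", "ل": "l", "ن": "n",
}


def with_article(first_letter: str, latin_word: str, vowel: str = "a") -> str:
    """Prefix an already-transcribed word with the article."""
    # Before a sun letter the lam is silent and the next sound is doubled;
    # before any other (moon) letter the lam is pronounced.
    onset = SUN_LETTERS.get(first_letter, "l")
    return vowel + onset + latin_word


def join_words(mudaf: str, first_letter: str, latin_word: str, sep: str) -> str:
    """Combine a mudaf with an article-prefixed word using '.' or '-'."""
    if sep == ".":
        # Dot: fuse into one string, linking with a damma ("u").
        return mudaf + with_article(first_letter, latin_word, vowel="u")
    # Hyphen: keep the words apart; the article keeps its own "a".
    return mudaf + "-" + with_article(first_letter, latin_word)


print(join_words("abd", "ق", "qaadir", "."))
print(join_words("abd", "س", "sattaar", "."))
print(join_words("abd", "س", "salaam", "-"))
```

The three calls give abdulqaadir, abdussattaar and abd-assalaam, the same plain-English strings this article shows for those names (apart from the leading apostrophe for ʿ in Table 3).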

Table 3 below provides numerous examples of input and output. In the table, the following five preferences are assumed (these preferences can be set through the selection buttons available on Intellibe's user interface (IUI)).

	Preference on IUI		Option set in the examples
	ḍamma		Transcribed into u (as opposed to o)
	wāw-madd		Transcribed into ou (as opposed to oo)
	tāʾ-marbūṭa		When without tashkeel, it is ignored (the other option is to transcribe it into an h)
	fatḥa		A fatḥa is put on first ḥarf (letter) by default, so typing برد and بَرد renders the same results
	Apostrophes		Apostrophes are kept by default, so inputting لؤلؤ gives lu'lu'; removing the apostrophe gives lulu

**Table 3:** Examples of user input and Intellibe outputs.
DIN 31635' in Plain English	DIN 31635'	Plain English	Arabic input	#
assalamu alaykum warahmatu Allahi wabarakatuh, ahlan wamarhaban bikum; dauna nabda'	assalāmu ʿalaykum waraḥmatu Allahi wabarakātuh, ʾahlan wamarḥaban bikum; daʿūnā nabdaʾ	assalaamu 'alaykum warahmatu Allahi wabarakaatuh, 'ahlan wamarhaban bikum; da'ounaa nabda'	السَلامُ علَيكُم ورَحمَةُ اللهِ وبَرَكاتُه، أهلاً ومَرحَباً بِكُم؛ دعونا نبدَأ	1
muhammad mhmud hmdi ahmad ahmad	muḥammad mḥmūd ḥmdī ʾaḥmad aḥmad	muhammad mhmoud hmdee 'ahmad ahmad	مُحَمَّد محمود حمدي أحمَد احمَد	2
yusuf rami hani shawqi bani bani	yūsuf rāmī hānī šawqī banī bānī	yousuf raamee haanee shawqee banee baanee	يُوسُف رامي هاني شوقي بني باني	3
nurulhuda abdulqadir abdussattar	nūrulhudā ʿabdulqādir ʿabdussattār	nourulhudaa 'abdulqaadir 'abdussattaar	نور.الهُدى عبد.القادِر عبدالسَتّار	4
nuha shaykh-ala'rd abubakr abu-bakr abulqasim	nuhā šayḳ-alaʾrḍ abūbakr abū-bakr abūlqāsim	nuhaa shaykh-ala'rd aboubakr abou-bakr aboulqaasim	نُهى شيخ-الأرض ابوبَكر ابو-بَكر ابو.القاسِم	5
Allahu la ilaha illa huwa alhayyu alqayyumu la ta'khuzhuhu sinatun wala nawmun lahu ma fi assamawati wama fi al'ardi	Allahu lā ʾilāha ʾillā huwa alḥayyu alqayyūmu lā taʾḳuḏuhu sinatun walā nawmun lahu mā fī assamāwāti wamā fī alʾarḍi	Allahu laa 'ilaaha 'illaa huwa alhayyu alqayyoumu laa ta'khuzhuhu sinatun walaa nawmun lahu maa fee assamaawaati wamaa fee al'ardi	اللَّهُ لَا إِلَٰهَ إِلَّا هُوَ الْحَيُّ الْقَيُّومُ لَا تَأْخُذُهُ سِنَةٌ وَلَا نَوْمٌ لَهُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ	6
- walasri - inna al'insana lafi khusrin - illa allazhina amanu waamilu assalihati watawasaw bilhaqqi watawasaw bissabri -	- wālʿaṣri - ʾinna alʾinsāna lafī ḳusrin - ʾillā allaḏīna ʾāmanū waʿamilū aṣṣāliḥāti watawāṣaw bilḥaqqi watawāṣaw biṣṣabri -	- waal'asri - 'inna al'insaana lafee khusrin - 'illaa allazheena 'aamanou wa'amilou assaalihaati watawaasaw bilhaqqi watawaasaw bissabri -	- وَالْعَصْرِ - إِنَّ الْإِنْسَانَ لَفِي خُسْرٍ - إِلَّا الَّذِينَ آمَنُوا وَعَمِلُوا الصَّالِحَاتِ وَتَوَاصَوْا بِالْحَقِّ وَتَوَاصَوْا بِالصَّبْرِ -	7
malak malak mullak mulk milk mullika mulika	malak malāk mullāk mulk milk mullika mulika	malak malaak mullaak mulk milk mullika mulika	ملَك ملاك مُلّاك مُلك مِلك مُلِّكَ مُلِكَ	8
al'iskandariyyatu madinatun misriyyatun taqau ala albahri al'abyadi almutawassit, alaysa kazhalik?	alʾiskandariyyatu madīnatun miṣriyyatun taqaʿu ʿalā albaḥri alʾabyaḍi almutawassiṭ, ʾalaysa kaḏalik?	al'iskandariyyatu madeenatun misriyyatun taqa'u 'alaa albahri al'abyadi almutawassit, 'alaysa kazhalik?	الإسكَندَرِيّةُ مدينةٌ مِصرِيّةٌ تقَعُ على البحرِ الأبيَضِ المُتَوَسِّط، ألَيسَ كذَلِك؟	9
chilli min akalati almaxiki alharra	chīllī min ʾakalāti almaxīki alḥārra	cheellee min 'akalaati almaxeeki alhaarra	تْشيلّي مِن أكَلاتِ المكسيكِ الحارّة	10
samir samir samar samar simsim simsim samasim simsimiyya	sāmir samīr samar samār simsim simsim samāsim simsimiyya	saamir sameer samar samaar simsim simsim samaasim simsimiyya	سامِر سمير سمَر سمار سِمْسِمْ سِمسِم سماسِم سِمسِمِيّة	11
eanas anas eaman amin ayman aysar mu'min alkhurasani	eanās ʾanas eamān ʾamīn ʾayman ʾaysar muʾmin alḳurāsānī	eanaas 'anas eamaan 'ameen 'ayman 'aysar mu'min alkhuraasaanee	إيناس أنَس إيمان أمين أيمَن أيسَر مُؤمِن الخُراساني	12
abdullah abdulaziz abdulquddus abdushshafi abdulqadir abdurrahman abdurrahim abd-assalam abdulmuhaymin	ʿabdullah ʿabdulʿazīz ʿabdulquddūs ʿabduššāfī ʿabdulqādir ʿabdurraḥmān ʿabdurraḥīm ʿabd-assalām ʿabdulmuhaymin	'abdullah 'abdul'azeez 'abdulquddous 'abdushshaafee 'abdulqaadir 'abdurrahmaan 'abdurraheem 'abd-assalaam 'abdulmuhaymin	عبد.الله عبدالعَزيز عبد.القُدّوس عبد.الشافي عبد.القادِر عبد.الرحمٰن عبد.الرحيم عبد-السلام عبد.المُهَيمِن	13
muhammadun abnu abdullah attanjiyyi, orifa bibni battuta, min murrakish, rahhalatun wamu'arrikhun waqadi wafaqihun maghribiyy	muḥammadun abnu ʿabdullah aṭṭanjiyyi, ʿurifa bibni baṭṭūṭa, min murrākiš, raḥḥālatun wamuʾarriḳun waqāḍī wafaqīhun maġribiyy	muhammadun abnu 'abdullah attanjiyyi, 'urifa bibni battouta, min murraakish, rahhaalatun wamu'arrikhun waqaadee wafaqeehun maghribiyy	مُحَمَّدٌ ابنُ عبد.الله الطنجِيِّ، عُرِفَ بِابنِ بطّوطة، مِن مرّاكِش، رحّالةٌ ومُؤَرِّخٌ وقاضي وفَقيهٌ مغرِبِيّ	14
watara almala'ikata haffina min hawli alarshi yusabbihuna bihamdi rabbihim waqudiya baynahum bilhaqqi waqila alhamdu lillahi rabbi alalamina	watarā almalāʾikata ḥāffīna min ḥawli alʿarši yusabbiḥūna biḥamdi rabbihim waquḍiya baynahum bilḥaqqi waqīla alḥamdu lillahi rabbi alʿālamīna	wataraa almalaa'ikata haaffeena min hawli al'arshi yusabbihouna bihamdi rabbihim waqudiya baynahum bilhaqqi waqeela alhamdu lillahi rabbi al'aalameena	وَتَرَى الْمَلَائِكَةَ حَافِّينَ مِنْ حَوْلِ الْعَرْشِ يُسَبِّحُونَ بِحَمْدِ رَبِّهِمْ وَقُضِيَ بَيْنَهُمْ بِالْحَقِّ وَقِيلَ الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ	15


8. Evidence of the accuracy of Intellibe

By stating that Intellibe is a "computational transcriber", we mean that Intellibe does not depend on any pre-filled dictionaries or databases, or on preprocessed strings; rather, Intellibe parses and transcribes its input as a function of the letters and the diacritics that are used to write text in Arabic. In fact, the use of diacritics (taškīl, or ḥarakāt) is essential to a meaningful and accurate transcription. When an Arabic string is fed to the Intellibe parser, the Arabic letters are mapped to sound-equivalent English letters, while encountered diacritics are transformed into the appropriate combination of English vowels to guide correct pronunciation (see the Intellark tutorial page for a comprehensive study of the equivalency map details).

The following two examples show Internet search results for a couple of common Arabian names, and how Intellibe, depending solely on its computational algorithms, arrives at the most commonly used transcription of such names. Search and data collection were conducted on December 20, 2010 using google.com. Statistics about the number of occurrences of a search string are obtained by double quoting the search string, as in "rami" or "yousuf" for example.

Example 1: Transcribing the name رامي

Using Intellibe, the name رامي is first transcribed in Output 1 of the IUI (the plain English compartment) as "raamee", which when pronounced sounds the closest to the way it is vocalized in Arabic; yet, it is not a favored way for a رامي to spell their name in English. In DIN 31635' in Output 2, رامي is transcribed to rāmī, which is i) also a perfect match to the phonetic sound of رامي in Arabic, and ii) appropriate for educational purposes; still, the non-plain-English letters need to be converted into English letters for use in official documents or digital data exchange. Finally, the third phase in Intellibe scans over rāmī character by character and replaces non-English letters by their closest match from the 26 English alphabet letters; that turns rāmī into rami, which is the most common way of transcribing رامي to English. Table 4 below gives many existing variants of a رامي transcription into English, together with the number of hits that a simple Google search returns for each variant.

**Table 4:** Statistics on the different variants of رامي transcribed into English.
	#-of-matches-based sort		name-based sort
	رامي	# of matches	رامي	# of matches
1	rami	12,200,000	raame	102,000
2	ramy	7,130,000	raamee	56,500
3	ramee	555,000	raamey	7,260
4	raami	192,000	raami	192,000
5	raame	102,000	raamy	15,500
6	raamee	56,500	ramee	555,000
7	ramiy	20,800	rami	12,200,000
8	raamy	15,500	ramiy	20,800
9	raamey	7,260	ramy	7,130,000

Example 2: Transcribing the name يوسف

Feeding يوسُف to Intellibe gives the following results: i) "yousuf" for Output Area 1, ii) "yūsuf" in DIN 31635' in Output Area 2, and iii) "yusuf" when transforming the DIN output into plain English in Output Area 3. Table 5 shows the statistics collected from a google search on the different variants of يوسُف.

**Table 5:** Statistics of the different ways يوسُف is transcribed into English.
	#-of-matches-based sort		name-based sort
	يوسُف	# of matches	يوسُف	# of matches
1	yusuf	34,800,000	yoosef	97,200
2	yosef	9,870,000	yoosof	38,600
3	yousef	7,880,000	yoosuf	246,000
4	yousuf	4,310,000	yosef	9,870,000
5	yusif	3,680,000	yosif	493,000
6	youcef	1,590,000	yosof	23,900
7	yousif	493,000	yossef	369,000
8	yosif	493,000	yossif	179,000
9	yussuf	450,000	yossuf	32,900
10	yussof	400,000	yosuf	74,400
11	yossef	369,000	youcef	1,590,000
12	yoosuf	246,000	yousef	7,880,000
13	yussef	215,000	yousif	493,000
14	yossif	179,000	youssaf	7,730
15	yoosef	97,200	yousuf	4,310,000
16	yosuf	74,400	yucef	17,900
17	yussif	72,100	yusif	3,680,000
18	yoosof	38,600	yussef	215,000
19	yossuf	32,900	yussif	72,100
20	yosof	23,900	yussof	400,000
21	yucef	17,900	yussuf	450,000
22	youssaf	7,730	yusuf	34,800,000

Statistics shown in the above two representative examples indicate that:

The majority of people usually go for the "correct" way their name ought to be represented in English
People like their names to be represented by the fewest characters possible
People like their names represented by what may seem common spellings, so the substring aa, although phonetically more accurate for transcribing an Arabic ʾalif-madd ( ألِف-مَدّ ), is not usually favored
For each single name, there are numerous variants that should have been mapped to a single representative name

Intellibe, although a computational transcriber (i.e., not pre-loaded with databases of transcribed words), is capable of producing pleasant and accurate transcription results while working under the aforementioned constraints; the first row of each table confirms this, since it matches the third output of Intellibe.
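
As a quick check of the first observation, the hit counts of Table 4 can be turned into shares. The snippet below is only an illustrative calculation on the figures copied from the table; the variable names are arbitrary.

```python
# Hit counts copied from Table 4 (variants of رامي).
hits = {
    "rami": 12_200_000, "ramy": 7_130_000, "ramee": 555_000,
    "raami": 192_000, "raame": 102_000, "raamee": 56_500,
    "ramiy": 20_800, "raamy": 15_500, "raamey": 7_260,
}

total = sum(hits.values())
top_variant = max(hits, key=hits.get)   # variant with the most matches
top_share = hits[top_variant] / total   # its fraction of all matches

print(f"{top_variant}: {top_share:.1%} of all matches")
```

On these figures, rami alone accounts for roughly 60% of all matches across the nine variants.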


9. Common transcription inaccuracies that Intellibe avoids

This section presents common Arabic-to-English transcription inaccuracies, and how Intellibe is programmed to avoid them.

There are numerous transcription choices that may seem appropriate at the time of transcription, yet lead to Arabic names (or text in general) being pronounced inaccurately when the corresponding English string is vocalized. An effort to list these common inaccuracies follows.

Inaccuracy 1. Transcription of ʾalif-madd into a short vowel like that produced by the fatḥa. For example, the verbs كَتَبَ and كاتَبَ , and the names مَلَك and مَلاك , are typically transcribed into the same word: kataba for the first pair and malak for the second. An accurate transcription adds an extra "a" where a long vowel is used, giving kaataba and malaak where appropriate. Intellibe produces accurate results in the Plain-English and the DIN 31635' compartments, but drops the extra "a" in the third compartment (DIN in Plain English) in favor of common-looking English strings.
Inaccuracy 2. Transcription of kasra to "e" instead of "i". In words that rhyme with the Arabic word فاعِل ( fāʿil ), as in سالم , صابر , والد and تامر , the kasra diacritic is typically transcribed into the vowel e, resulting in the following transcriptions: salem, saber, waled and tamer. But when these names are pronounced, they usually rhyme with words such as taker, baker or paled. A more accurate transcription turns the kasra into an i, resulting in the following more faithful transcriptions: salim, sabir, walid and tamir.
Inaccuracy 3. Transcription of Arabic words that begin with إي with the letters e, i, or ei. For example, names like إيناس and إيمان are typically transcribed into enas / eman, inas / iman, or einas / eiman; however, prefixing with e, i, or ei usually leads to the names being pronounced as in end, echo, eddy, edit, else, epic, ever, enact, or as in ibis, icon, idea, idle, idol, iron, or as in eigenvector, eight, Einstein and either. To capture the vocal sound of names with this Arabic prefix accurately, empirical observations suggest prefixing "ea" instead, which gives the intended initial vocalization found in the words of this list: each, eager, eagle, ear, ease, east, eat and eave.
Inaccuracy 4. Have you faced other inaccuracies that should be mentioned and avoided? Please provide your invaluable comments below, or contact us regarding your concerns. We will be happy to review them and include them here to enhance the quality of this document.
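
Inaccuracy 1 can be illustrated with a small Python sketch. It is not Intellibe's code: the function names are hypothetical, the input is assumed to be already in DIN 31635' letters, and the long-vowel spellings (aa, ee, ou) are inferred from the examples in this article (rāmī to raamee, yūsuf to yousuf).

```python
import unicodedata

# Long-vowel spellings inferred from this article's examples
# (rāmī -> raamee, yūsuf -> yousuf); an assumption, not Intellibe's own table.
LONG_VOWELS = {"ā": "aa", "ī": "ee", "ū": "ou"}
# Common-looking spellings simply drop the length mark.
SHORTENED = str.maketrans({"ā": "a", "ī": "i", "ū": "u"})

def din_to_phonetic(din_text: str) -> str:
    """Spell long vowels out so that vowel length survives in plain English."""
    text = unicodedata.normalize("NFC", din_text)
    return "".join(LONG_VOWELS.get(ch, ch) for ch in text)

def din_to_common(din_text: str) -> str:
    """Drop vowel length in favor of a common-looking spelling."""
    return unicodedata.normalize("NFC", din_text).translate(SHORTENED)

for short, long_ in [("kataba", "kātaba"), ("malak", "malāk")]:
    # Phonetic spellings stay distinct; common spellings collapse into one.
    print(din_to_phonetic(short), din_to_phonetic(long_), din_to_common(long_))
```

The phonetic form keeps كَتَبَ and كاتَبَ apart (kataba versus kaataba), while the common form maps both to kataba, which is the trade-off described under Inaccuracy 1.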


10. Common transcription variations that Intellibe standardizes

It is clear that many variations arise when transcribing any single name. The intention here is to begin offering some standards that, if adopted, would bring unity and optimality in transcribing Arabian names and data (or text in general) into English a step closer. An effort to list such standards follows.

Transcription of عبد-prefixed (abd-) names. Among the most common Arabian masculine names are those prefixed with عبد , written as عَبدُ when decorated with tashkeel characters (diacritics). However, numerous variations exist for each عبد-prefixed name, which adds confusion when the names are vocalized or processed by non-Arabic speakers. For example, the following prefixes are all common in transcribed عبد-prefixed names:

          abd, abdo, abdu, abda, abde, abdi, abdl, abdol,
          abdool, abdul, abdoul, abdal, abdel, abdil, abd-,
          abdu-, abdul-, abd-al-, abd al a, 'abd 'u, 'abd'al...

In fact, the number of possibilities for any عبد-prefixed name may easily run into tens upon tens of variants. The following context-free grammar (CFG) expression generates but a subset of all the variants that can be found on the Internet; the expression and the numerous variants it produces therefore stand as evidence of the lack of a constraining standard, which is needed at the very least when transcribing Arabic into Latin-derived languages.

عبد-prefixed names	=	[S₁] ( abd [ <S₂> ] ) [ <E> <S₂> ] ( <N₉₉> )
S₁	=	' | λ
S₂	=	⃞ | - | ' | λ
E	=	<V>^1,2 [ L ]
V	=	a | e | i | o | u
L	=	l | <LShL> | <LShL₂> (note that the first option is the letter l, not the number 1)
LShL	=	any of the lām-šamsiyya letters: t | th | d | dh | r | z | s | sh | n
		(in Arabic, these are the letters ت ث د ذ ر ز س ش ص ض ط ظ ن )
LShL₂	=	any of the LShL's geminated with a possible separator in-between:
		t[S₃]t | th[S₃]th | d[S₃]d | dh[S₃]dh | r[S₃]r | z[S₃]z | s[S₃]s | sh[S₃]sh | n[S₃]n
S₃	=	⃞ | - | λ
N₉₉	=	Any of the 99 names of Allah stripped of the prefix عبد ال

A legend for the above definitions follows.

Symbol Description

E An expression that generates constrained results

N₉₉ Any of the 99 names of Allah stripped of the prefix عبد ال

S A special character that evaluates to a space ( ⃞ ), a hyphen ( - ), an apostrophe ( ' ), or lambda, denoting an empty character ( λ )

V A vowel letter

() Contained sub-expression evaluation is required

[] Contained sub-expression evaluation is optional

| This vertical bar separates among the options to be chosen from (for example, V above must evaluate to one of the values it describes)

<...> Sub-expressions enclosed in these symbols are to be evaluated to concrete values

^1,2 The exponent ^1,2 implies that the base expression is selected once or twice
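
To get a feel for how quickly the grammar above multiplies variants, the prefix part of the expression (everything before N₉₉) can be enumerated. The Python sketch below is a simplification made for illustration: L is restricted to the plain letter l, so the lām-šamsiyya options LShL and LShL₂ are left out, and the function name `abd_prefixes` is hypothetical.

```python
from itertools import product

S1 = ["'", ""]               # optional leading apostrophe, or nothing (λ)
S2 = [" ", "-", "'", ""]     # space, hyphen, apostrophe, or nothing (λ)
V = "aeiou"

# V^1,2: one vowel or two vowels in a row.
VOWEL_RUNS = list(V) + [a + b for a, b in product(V, repeat=2)]
# E: a vowel run, optionally followed by the letter l.
# Simplification: the sun-letter options (LShL, LShL2) are not generated.
E = [run + ell for run, ell in product(VOWEL_RUNS, ["", "l"])]
# The group [ <E> <S2> ] is optional, hence the extra empty string.
TAIL = [""] + [e + s for e, s in product(E, S2)]

def abd_prefixes() -> set[str]:
    """Enumerate [S1] ( abd [S2] ) [ E S2 ] under the simplification above."""
    return {s1 + "abd" + s2 + tail for s1, s2, tail in product(S1, S2, TAIL)}

prefixes = abd_prefixes()
print(len(prefixes))
print("abdul" in prefixes, "abdel-" in prefixes, "'abd-al-" in prefixes)
```

Even this reduced grammar yields well over a thousand distinct prefixes before any of the 99 names is appended, which supports the point that a constraining standard is missing.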

Possessor and possessee names. Although names made of a possessor-possessee pair (muḍāf and muḍāf-ʾilayh) are understood in Arabic to refer to a single named person in a hierarchy of names (e.g., the first, middle and last names of a person reference three persons), they are often misplaced in the name hierarchy when transcribed into English, so that a person's full name is interpreted incorrectly, in official documents for example. Some examples are highlighted next.
- The following names should each be treated as a single unit: أبو بكر , أبو ظاهر , أبو القاسم , حسيب الله . Yet if each is transcribed into English as two isolated strings separated by a space, as in ʾabū bakr, ʾabū ḓāhir, ʾabū alqāsim, and ḥasīb Allah, the second part of each pair is incorrectly interpreted as a middle name, a father's name or a last name. Using Intellibe for standardization purposes, a dot would be inserted between the elements of every pair (as in أبو.بكر, see Section 7) to produce a single name, the way each name in this class should be transcribed. In DIN 31635', these names transcribe into ʾabūbakr, ʾabūḓāhir, ʾabūlqāsim, and ḥasībullah. In DIN 31635' in plain English, the names should and would transcribe into: abubakr, abudhahir, abulqasim, and hasibullah.
- Like the names in the previous point, the names in this point each refer to a single name. They would be too long if transcribed into a single word, yet they should not be left with an intermediate space either, to prevent the two-name interpretation described in the previous point. Consider these: شَيخُ الأَرض , فاطِمةُ الزَهراء , زَهرةُ العُلا. Feeding Intellibe with these names gives the following: šayḳu alʾarḍ, fāṭimatu azzahrāʾ, and zahratu alʿulā.
- Inserting a dot between the elements of each pair gives: šayḳulaʾarḍ, fāṭimatuzzahrāʾ, and zahratulʿulā.
- Inserting a hyphen in-between gives: šayḳu-alʾarḍ , fāṭimatu-azzahrāʾ, zahratu-alʿulā.
- Either of the latter two solutions remedies the problem of misplacing the name within the ordering of a person's names.
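
The two joining options above can be sketched in Python as follows. This is an illustration only, not Intellibe's implementation: the function name `join_pair` is hypothetical, the inputs are assumed to be already transcribed into DIN 31635' letters, and the rule that the dot drops the leading "a" of the second element after a final vowel is inferred from the examples in this section. Cases that insert a vowel, such as ḥasīb Allah becoming ḥasībullah, are not covered.

```python
import unicodedata

VOWELS = set("aiuāīū")

def join_pair(possessor: str, possessee: str, joiner: str) -> str:
    """Join a possessor-possessee pair written in DIN 31635' letters.

    joiner "-" keeps both parts intact; joiner "." fuses them into one word.
    """
    possessor = unicodedata.normalize("NFC", possessor)
    possessee = unicodedata.normalize("NFC", possessee)
    if joiner == "-":
        return f"{possessor}-{possessee}"
    if joiner == ".":
        # Assumed rule: after a final vowel, the leading "a" of the second
        # element is not pronounced, so it is dropped when the parts fuse.
        if possessor[-1] in VOWELS and possessee.startswith("a"):
            possessee = possessee[1:]
        return possessor + possessee
    raise ValueError("joiner must be '.' or '-'")

print(join_pair("zahratu", "alʿulā", "."))   # zahratulʿulā
print(join_pair("zahratu", "alʿulā", "-"))   # zahratu-alʿulā
```
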
Have you come across other variations that could be standardized? Please provide your invaluable comments below, or contact us regarding your thoughts. We will be happy to review them and include them to enhance the quality of this document.


11. Order your fully custom-made Intellibe for your own needs

The online version of Intellibe is limited; it transcribes a maximum of 165 characters. For a fuller, flash-fast and unlimited version of Intellibe, tailored professionally or academically to your organization's needs, please do not hesitate to contact us to build your own customized version of Intellibe. Section 2 of this document mentions some applications where Intellibe could prove to be an indispensable tool for you or your organization.
