This article exists in other translations [ Id: ar00006 عربي ], also accessible through the articles page.
Intellyze 3.0 analyzes text and calculates the frequency of Arabic letters and words; all other characters are ignored. The letters supported by Intellyze 3.0 are:
ا أ إ آ ء ب ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن هـ و ؤ ي ى ئ
Words may also include these two letters: پ and ڤ. Text input areas of Intellyze are supported by Intellark, Intellaren's new Arabic keyboard layout. Intellyze is built specifically for analyzing texts that contain hundreds of thousands of words, which in turn contain about five times as many letters.
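As a rough illustration of the letter counting described above, here is a minimal Python sketch. It is not Intellyze's own code: the names `SUPPORTED_LETTERS`, `letter_frequencies`, and `sorted_rows` are hypothetical, and ordering "alphabetically" by code point is an assumption made for simplicity.

```python
from collections import Counter

# Letters listed in the guide, written without separators ("هـ" is entered as "ه").
SUPPORTED_LETTERS = set("اأإآءبتةثجحخدذرزسشصضطظعغفقكلمنهوؤيىئ")


def letter_frequencies(text):
    """Return {letter: (count, percentage)} for supported letters only."""
    counts = Counter(ch for ch in text if ch in SUPPORTED_LETTERS)
    total = sum(counts.values())
    return {ch: (n, 100.0 * n / total) for ch, n in counts.items()}


def sorted_rows(frequencies, by_frequency=False):
    """Order rows by code point (assumed alphabetical) or by descending count."""
    if by_frequency:
        return sorted(frequencies.items(), key=lambda row: -row[1][0])
    return sorted(frequencies.items())


# Latin letters, digits and punctuation are skipped; only Arabic letters count.
sample = "اسد و اسد! abc 123"
for letter, (count, percent) in sorted_rows(letter_frequencies(sample), by_frequency=True):
    print(letter, count, f"{percent:.1f}%")
```

Everything outside the supported set is simply dropped before counting, which mirrors the "all else is dismissed" behavior only in spirit.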
In addition to this Intellyze User Guide, there are several supporting videos that show Intellyze in action.
During text analysis, Intellyze produces several tables that contain statistics such as the number of unique words, the total number of letters and words, letter frequency, word frequency, and letter histograms that can be sorted alphabetically or by letter frequency. Figure 1 shows a snapshot of Intellyze at work, where we can see the main command panel at top right, several tables at top left, and four tabs at the bottom.
|
Figure 1 |
|
|
|
Translation of text shown in text-area of Figure 1
Hello and welcome -
Intellyze, or انتلايز , is designed to perform frequency analysis on Arabic letters and words. Intellyze seeks to be the analyst's oasis.
Intellyze is supported with Intellark keyboard layout ( انتلارك ). Type the letters a, s, d now to get the word اسد ; more in the Intellark tab below.
Thanks for reading...
|
|
Intellyze comes with fresh ideas for some of its icon images; they are explained next.
Intellyze provides more functions than meets the eye. In addition to some statistical operations unprecedented in any other software application for the Arabic language, Intellyze helps you to easily:
- copy analysis output to other applications
- search within text using advanced operations
- search the Internet
- analyze frequency of a word that may come in numerous forms (root and branches)
- apply replace operations while picking from a list that's dynamically provided (see Tool 16 below)
Below is a listing of Intellyze functions to help text analysts. Functions are numbered for ease of future reference.
The upper pane shown in Figure 2 holds most of the commands for analyzing and modifying the text in the text area.
Figure 2
|
|
The pane contains different tools that carry out different types of operations; they are explained in the following table.
| Number | Type | Description of expected behavior |
| 1 | single action | each press carries out same operation |
| 2 | on/off | these tools are in one of two states: on or off |
| 3 | text entry | areas where users input text |
| 4 | data display | areas where data are displayed |
The following table describes each tool in this order: tool number, name, icon, type (as listed in the table above), shortcut keys if any, and finally a description of the tool's function.
|
Number
|
Tool
|
Icon
|
Type
|
Short-cut
|
|
Description
|
|
1)
|
analyze frequency
|
|
1
|
Control Enter
|
|
Count frequency of letters and words, produce graphical histogram for letters
|
|
2)
|
sort
|
|
1
|
Control R
|
|
Sort the histogram letter bars; each press switches between alphabetical order and frequency order
|
|
3)
|
clear
|
|
1
|
Control N
|
|
Clear text area, tables and histogram
|
|
4)
|
undo
|
|
1
|
Control Z
|
|
Undo last operation in text area
|
|
5)
|
redo
|
|
1
|
Control Y
|
|
Redo last undone operation in text area
|
|
6)
|
alif-lam
|
|
2
|
- - -
|
|
When searching for a word without the article "ال", occurrences that carry the article "ال" are also matched; and when searching for a word that begins with the article "ال", occurrences without the article are also matched. For example, when searching for the word "الترتيب", the word "ترتيب" is also matched
|
|
7)
|
all hamzas equal
|
|
2
|
- - -
|
|
During search, the letter alif and its modified forms with hamza (i.e., ا، أ، إ، آ، ء and ٰ ) are considered the same. For example, searching for the word "أكثر" and searching for "اكثر" are treated as the same search
|
|
8)
|
diacritics
|
|
2
|
- - -
|
|
During search, diacritic symbols are considered part of what has to be matched. For example, searching for the word "زُر" does not match with "زِر" or "زر"
|
|
9)
|
word root
|
|
2
|
- - -
|
|
During search, the search word is treated as a root word. For example, searching for the word "رتب" matches all words that contain the letters "ر", "ت" and "ب" in the given order (as in the words ترتيب , مرتب , and الترتيبية ), so the word "ربت" does not match
|
|
10)
|
advanced search
|
|
2
|
- - -
|
|
During search, any of the wildcard characters "*", "+", or "؟" may be inserted between letters to match zero or more, one or more, or zero or one letters, respectively. Here are three examples:
- searching for "ر*ب" matches words that begin with the letter "ر", followed by zero or more in-between letters, followed by "ب"
- searching for "ر+ب" matches words that begin with the letter "ر", followed by one or more in-between letters, followed by "ب"
- searching for "ر؟ب" matches words that begin with the letter "ر", followed by zero or one in-between letter, followed by "ب"
|
|
11)
|
whole word
|
|
2
|
|
|
During search, match whole words only (i.e., those surrounded by spaces, punctuation marks, end of line...) and not parts of other words. For example, searching for the word "كل" does not match any of "كلمة" , "كلمات" or "كليلة"
|
|
12)
|
backward search
|
|
1
|
F4 or Shift F3
|
|
Search backward |
|
13)
|
search field
|
|
3
|
F3 or Control F
|
|
Focus is brought to search field, and search begins in forward direction
|
|
14)
|
number of matches
|
|
4
|
- - -
|
|
Number of matches for the search word; note that this number depends on the six search parameters described in Tools 6 to 11 above
|
|
15)
|
forward search
|
|
1
|
F3
|
|
Search forward
|
|
16)
|
replacement
|
|
1
|
Control B
|
|
This tool provides hyper search (i.e., search begins just as keys are being pressed), and replaces text from a list that you provide. Figure 3 shows the dialog box that handles this interaction; it contains:
- a search field for the word to be replaced or sought for
- two buttons for forward and backward searches
- a text field for the replacement word
- a green-tick button for executing the replacement operations
If a replace-all operation is desired, tick the "إستبدال شامل" checkbox at the bottom of the dialog box.
But what if you had more than one word in mind to replace the search word with? In this case, all you have to do is add a list of potential replace-with words using the "أو بـ٠٠٠؟" (or with) button, then replace the search word with any of the replacement words when stopping at matched words in the text. This is shown in Figures 4 and 5. In Figure 4, the word "ملك" is replaced with any of the words shown in the list. Figure 5 shows how diacritics may be put on the last letter of words:
- insert a space character in the search field, that triggers a search for end of words, then
- add a diacritic character in the Replace-With or Or-With fields you provide below such as " ُ "; that is a damma diacritic followed by a space character, or " َ" which is a fat-ha diacritic followed by a space character...
During search, the search tool will stop at every word that is followed by space, and all you have to do is to select which of the replace-with words you would like to replace the search word with.
Finally, note that you may dispose of any extra or-with field simply by pressing the remove button, shown as a red x.
|
|
17)
|
Internet search
|
|
1
|
Control I
|
|
Searches for the word entered in the search field (see Tool 13 above) using the Intellaren search page
|
|
18)
|
new
|
|
1
|
Control Shift N
|
|
Open a new file after properly closing the current one, if any
|
|
19)
|
open
|
|
1
|
Control O
|
|
Open files. Note that you can also drag files directly into the text area
|
|
20)
|
save
|
|
1
|
Control S
|
|
Save files. Intellyze will provide the extension "txt" unless another extension is provided by the user
|
|
21)
|
save as
|
|
1
|
Control Shift S
|
|
Save files under a different name
|
|
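The three on/off search options in Tools 6 to 8 (alif-lam, all hamzas equal, diacritics) can be thought of as normalization steps applied to both the search word and each candidate word before comparing them. The following Python sketch works under that assumption; the function names are hypothetical, the article is stripped naively from any word that starts with "ال", and the small superscript alif is handled as a diacritic rather than folded to alif.

```python
import unicodedata

# Assumption: these hamza-carrying forms are folded to a bare alif.
HAMZA_TABLE = str.maketrans({ch: "ا" for ch in "أإآء"})


def strip_diacritics(word):
    """Drop combining marks (fatha, damma, kasra, shadda, sukun, ...)."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def normalise(word, alif_lam=False, hamzas_equal=False, diacritics=False):
    """Reduce a word to the form that is compared under the given options."""
    if not diacritics:      # Tool 8 off: diacritics are not part of the match
        word = strip_diacritics(word)
    if hamzas_equal:        # Tool 7 on: hamza forms compare equal to alif
        word = word.translate(HAMZA_TABLE)
    if alif_lam and word.startswith("ال") and len(word) > 2:
        word = word[2:]     # Tool 6 on: compare without the leading article
    return word


def matches(query, candidate, **options):
    """True when both words are equal after normalization."""
    return normalise(query, **options) == normalise(candidate, **options)


print(matches("الترتيب", "ترتيب", alif_lam=True))
print(matches("أكثر", "اكثر", hamzas_equal=True))
print(matches("زُر", "زِر", diacritics=True))
```

Normalizing both sides keeps each option independent of the others, so any combination of the three switches can be applied without special cases.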
Figure 3
|
|
|
|
Figure 4
|
Figure 5 |
|
|
|
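The root search (Tool 9) and the wildcard search (Tool 10) described in the table above can both be expressed as regular expressions. The sketch below is one possible translation, not Intellyze's actual implementation: the helper names are hypothetical, a "letter" is assumed to be any character in the basic Arabic range U+0621 to U+064A, and patterns are assumed to be tested against a whole word.

```python
import re

# Assumption: any character in this range counts as one Arabic letter.
LETTER = "[\u0621-\u064A]"

# Each wildcard becomes a quantifier over "any letter".
WILDCARDS = {"*": LETTER + "*", "+": LETTER + "+", "؟": LETTER + "?"}


def wildcard_to_regex(query):
    """Translate a Tool 10 style query into a regular expression."""
    return "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in query)


def root_to_regex(root):
    """Root letters must appear in order, with any letters around them."""
    gap = LETTER + "*"
    return gap + gap.join(re.escape(ch) for ch in root) + gap


def word_matches(pattern, word):
    """Test the pattern against the whole word."""
    return re.fullmatch(pattern, word) is not None


root = root_to_regex("رتب")
for word in ["ترتيب", "مرتب", "الترتيبية", "ربت"]:
    print(word, word_matches(root, word))
```

For example, `word_matches(wildcard_to_regex("ر+ب"), "رتب")` holds because one letter sits between "ر" and "ب", while the same pattern rejects "رب".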
The text area comes with the typical expected functions as shown in Figure 6. Following is a listing of such functions.
|
Figure 6 |
|
|
|
Translation of text shown in text-area of Figure 6
In this text area, you may use the Intellark keyboard layout for typing, whatever your system allows, or in English.
In addition to opening files using the Open button or through the command Control O, you may also drag the desired file from anywhere outside of Intellyze to this area and it would promptly open.
You may drag parts of the text from one place to another using the left-mouse button after highlighting the desired text.
It is possible to use the right-mouse button to display a menu to cut, copy or paste text. And in the case that there is highlighted text, the mouse will provide you with two more functionalities:
- either transmit highlighted text to the search field and begin searching for it, or
- perform partial analysis: analyzing letters and words of all of the highlighted text.
|
|
|
1)
|
|
The patented Intellark Keyboard layout is supported when typing in the text area; see the Intellark tab below
|
|
2)
|
|
You may type in English or whatever your system supports when disabling the Intellark layout from the Options pane below, or by pressing Control L to enable/disable Intellark
|
|
3)
|
|
You may use the mouse buttons to drag highlighted text around, or to cut, copy or paste text in the text area. When text is highlighted, you may also search for that text, or frequency-analyze its letters and words.
|
Simply pressing the Calculate Frequency button (see Tool 1 above) fills the cells of the table shown in Figure 7.
|
Figure 7
|
|
|
|
1)
|
|
The general statistics table shows these five cells:
- number of unique words
- number of all the words
- number of letters
- number of lines
- number of spaces
|
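The five cells above could be computed along the following lines. This is only a sketch: the function name `general_statistics` is hypothetical, and how Intellyze itself counts lines and spaces is not documented here, so `splitlines()` and a plain count of the space character are assumptions.

```python
import re
import unicodedata

# Supported letters from the guide, plus the two extra letters allowed in words.
LETTERS = "اأإآءبتةثجحخدذرزسشصضطظعغفقكلمنهوؤيىئ" + "پڤ"
WORD = re.compile("[" + LETTERS + "]+")


def general_statistics(text):
    """Return the five general statistics as a dictionary."""
    # Remove diacritics first so they do not split words.
    plain = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    words = WORD.findall(plain)
    return {
        "unique_words": len(set(words)),
        "words": len(words),
        "letters": sum(len(word) for word in words),
        "lines": len(text.splitlines()),   # assumption
        "spaces": text.count(" "),         # assumption
    }


print(general_statistics("اسد اسد\nكلمة"))
```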
The Letter table displays each letter with its frequency as found in the text and the percentage of that frequency. See Figure 8.
|
Figure 8
|
|
|
|
1)
|
|
Columns are sorted in ascending or descending order when pressing on the column header as shown in Figure 8
|
|
2)
|
|
When right-clicking the mouse over a column header, the contents of selected columns are highlighted and are transferred to the clipboard for pasting purposes (simply issue the paste command or Control V for example) in other applications; this is highlighted in the same figure
|
|
3)
|
|
You may include the column header titles during the copying process; this is performed by explicitly choosing so from the Options pane described later (see Function 2.3 in Section 11 below)
|
|
4)
|
|
When hovering with the mouse on any letter, that letter is highlighted in the histogram pane for ease of visual comparison as shown in Figure 9
|
|
Figure 9
|
|
|
The word table displays each word, together with an identifying number and its frequency as encountered in the text. See Figure 10.
|
Figure 10 |
|
|
|
Translation of text shown in text-area of Figure 10
Ibn Battuta: 30 years of travel (extracted from Wikipedia pages on the Internet)
-------------------------------------------------------------------------------
::: The first paragraph is extracted from http://ar.wikipedia.org/wiki/ابن_بطوطة :::
More? Click on "بطوطة" in the words table to have the word transferred to the search field above and highlighted in the text area, then use Intellaren search tool (Control I) to search for the word over Intellaren's search page over the Internet.
|
|
|
1)
|
|
Columns are sorted in ascending or descending order when pressing on the column header as shown in Figure 8
|
|
2)
|
|
When right-clicking the mouse over a column header, the contents of selected columns are highlighted and are transferred to the clipboard for pasting purposes (simply issue the paste command or Control V for example) in other applications; this is also highlighted in Figure 8
|
|
3) |
|
You may include the column header titles during the copying process; this is performed by explicitly choosing so from the Options pane described later (see Function 2.3 in Section 11 below) |
|
4)
|
|
When clicking on a word in the words table, the following is performed:
- it is transferred to the search field (Tool 13)
- its frequency in the text is counted as a function of the six search parameters (Tools 6 to 11)
- the number of occurrences is displayed in the designated field (Tool 14)
- search for the word begins starting from current caret position in the text area, where matches are also highlighted
|
|
5)
|
|
Searching the Internet for a word in the word table is just two clicks away:
- click on the word in the table; this transfers the word to the search field (Tool 13)
- click on the Internet search button or press Control I (Tool 17); this submits the word to the Intellaren search page. Figures 10 and 11 highlight the simplicity of this operation
|
|
Figure 11
|
|
|
The Matches table displays a list of words that branch out of a root word supplied in the search field. In Figure 12, an example shows the Matches table displaying a list of words that result when an advanced search is carried out on the root word يوم (yawm, or day in English) in the entire text of the Quran.
|
Figure 12 |
|
|
|
1) |
|
The fourth column from the right is entitled تجاهل (tajahal, or ignore in English). When a cell is checked, the frequency of the corresponding word is subtracted from the total (جمع) shown in the last row, and the count of words is accordingly decreased by one |
|
2) |
|
The fifth column from the right is entitled إحذف (ihthif, or delete in English). When the button in a cell is clicked, the whole row is removed, and the cumulative statistics are updated accordingly |
|
3) |
|
Searching for the same root word again regenerates the original list |
|
4) |
|
Clicking on a word in words column locates the word in the text area; further clicks locate the next occurrence in a rotary fashion |
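The effect of the ignore column on the totals row can be sketched as follows. The `MatchRow` class and `totals` function are hypothetical, and the frequencies used are made-up placeholder values for illustration only; they are not counts from any real text.

```python
from dataclasses import dataclass


@dataclass
class MatchRow:
    word: str
    frequency: int
    ignored: bool = False   # state of the "ignore" checkbox


def totals(rows):
    """Return (number of words, summed frequency) over rows not ignored."""
    kept = [row for row in rows if not row.ignored]
    return len(kept), sum(row.frequency for row in kept)


# Placeholder frequencies, chosen only to make the arithmetic easy to follow.
rows = [MatchRow("يوم", 5), MatchRow("اليوم", 3), MatchRow("يومئذ", 2)]
print(totals(rows))

rows[1].ignored = True      # tick the ignore cell of the second row
print(totals(rows))
```

Deleting a row, as opposed to ignoring it, would correspond to removing the entry from `rows` altogether.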
The lower tabs provide functions such as displaying the letter frequency histogram, setting some preferences, a short description of the Intellark keyboard layout, and information about Intellyze. Any tab may be displayed by clicking on its icon handle, shown on the right-hand side. The following table describes each tab in this order: function number, tab name, identifying icon image, and a description of its contents.
|
Number
|
Tab
|
Icon
|
Description
|
|
1)
|
Histogram
|
|
Letters are displayed with their names and frequencies based on alphabetic ordering or on frequency
|
|
2)
|
Options
|
|
The following facilities are provided:
- Enable or disable Intellark keyboard layout
- Hide or show tooltips when hovering with the mouse over a tool
- Include or exclude column header titles when copying table column contents to the clipboard
- Resize the font size in the text area
- A check-for-update utility to keep your Intellyze copy up-to-date
- Intellark typing response tolerance. The values of the time elapsed between key presses range on the slider from very fast (سريع جدًا), which is 200 milliseconds, to relaxed (مرتاح), which is 400 milliseconds, where the value associated with each tick increases by 50 when going down. For example, if the slider is set at 200, then to type ش , ذ , ض or any of the characters that require two or more presses, the elapsed time between any two key presses must be 200 milliseconds or less.
|
|
3)
|
Intellark
|
|
A short description about Intellark is provided, together with a map of the Intellark keyboard layout. Links to Intellark's main page over the Internet and to tutorials are also provided
|
|
4)
|
Intellyze
|
|
This tab contains information about the Intellyze copy running on your machine, and contains links to offline and online documentation about Intellyze
|
|
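The typing response tolerance described under the Options tab amounts to a simple timing rule. The sketch below only illustrates that rule; the names `TOLERANCE_TICKS_MS` and `presses_combine` are hypothetical, and how Intellark actually measures key timing is not described in this guide.

```python
# Slider ticks from "very fast" (200 ms) to "relaxed" (400 ms), in steps of 50.
TOLERANCE_TICKS_MS = list(range(200, 401, 50))


def presses_combine(previous_ms, current_ms, tolerance_ms):
    """True when two key presses are close enough to count as one character.

    Timestamps are in milliseconds; the second press must arrive within
    the chosen tolerance of the first.
    """
    return current_ms - previous_ms <= tolerance_ms


print(TOLERANCE_TICKS_MS)
print(presses_combine(1000, 1180, tolerance_ms=200))
print(presses_combine(1000, 1250, tolerance_ms=200))
```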
1)
|
|
The Intellyze frame can be stretched to a bigger size by moving the boundaries identified with blue-colored ovals
|
|
2)
|
|
Intellyze sits on several split panes that are stretchable in the horizontal or vertical direction; this is accomplished by moving the boundaries identified by the cyan-colored ovals. It is also possible to hide a pane in favor of extending an adjacent one by clicking the arrows on the boundaries identified by the orange-colored ovals
|
|
Figure 13
|
|
|
Copy the sura of Alfati-ha (Sura 1 in the Quran) into the text area of Intellyze and run frequency analysis on it. Notice how Intellyze steps over numbers, symbols and diacritics, letting you focus on what needs to be analyzed without having to preprocess the text first. See http://www.intellaren.com/articles/en/qss to learn more about how Intellyze may be used to analyze the whole text of the Quran.
1|بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ
2|الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
3|الرَّحْمَٰنِ الرَّحِيمِ
4|مَالِكِ يَوْمِ الدِّينِ
5|إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ
6|اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
7|صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ الْمَغْضُوبِ عَلَيْهِمْ وَلَا الضَّالِّينَ
|
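To see what "stepping over numbers, symbols and diacritics" can look like in practice, here is a small Python sketch applied to verses 1 and 3 from the text above. It is an approximation rather than Intellyze's own method: the name `word_frequencies` is hypothetical, and diacritics are assumed to be exactly the Unicode combining marks.

```python
import re
import unicodedata
from collections import Counter

# Supported letters from the guide, plus the two extra letters allowed in words.
LETTERS = "اأإآءبتةثجحخدذرزسشصضطظعغفقكلمنهوؤيىئ" + "پڤ"
WORD = re.compile("[" + LETTERS + "]+")


def word_frequencies(text):
    """Count words after dropping diacritics; digits and symbols are skipped."""
    plain = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return Counter(WORD.findall(plain))


# Verses 1 and 3, kept in the same "number|text" layout as above.
verses = "1|بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ\n3|الرَّحْمَٰنِ الرَّحِيمِ"
counts = word_frequencies(verses)
for word, count in counts.most_common():
    print(word, count)
```

The verse numbers and the "|" separators never reach the counter because they fall outside the letter set, so no separate cleaning pass is needed.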