BWSORT can be run from the shell prompt as
bwsort [-s <sortscheme>] file1 [file2 [file3 ...]]
where the sorting scheme is either DEFAULT or OLD. There are small differences between these two schemes. While the default scheme looks more appropriate to me, many Bengali dictionaries tend to use the old scheme. If the environment variable BWSORTSTYLE is set (to DEFAULT or OLD), that value is used as the default sorting style. Otherwise, the default scheme is DEFAULT. In both cases, however, this default behavior can be overridden by the -s option.
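The precedence described above (the -s option first, then BWSORTSTYLE, then DEFAULT) can be sketched in a few lines of Python. This is only an illustration, not BWSORT's actual code: the function name resolve_sort_style is hypothetical, and two behaviors are assumptions that the text does not specify, namely that an unrecognized BWSORTSTYLE value falls back to DEFAULT and that the names are matched in upper case only.

```python
import os

VALID_STYLES = ("DEFAULT", "OLD")


def resolve_sort_style(cli_style=None, environ=None):
    """Return the sorting style to use (hypothetical helper, not part of BWSORT).

    cli_style is the value given to -s, or None if -s was not used.
    environ is a mapping of environment variables (defaults to os.environ).
    """
    if environ is None:
        environ = os.environ

    # 1. An explicit -s option overrides everything else.
    if cli_style is not None:
        if cli_style not in VALID_STYLES:
            raise ValueError("unknown sorting scheme: %r" % cli_style)
        return cli_style

    # 2. Otherwise BWSORTSTYLE supplies the default, if it is set to a known style.
    #    Assumption: any other value is ignored rather than reported as an error.
    env_style = environ.get("BWSORTSTYLE")
    if env_style in VALID_STYLES:
        return env_style

    # 3. With neither -s nor a usable BWSORTSTYLE, the scheme is DEFAULT.
    return "DEFAULT"
```

For instance, resolve_sort_style("DEFAULT", {"BWSORTSTYLE": "OLD"}) yields DEFAULT, because the command-line option wins over the environment variable.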
bwsort -h
prints a help message and quits, whereas
bwsort -v
prints the version info and quits.
When BWSORT is run without any file names on the command line, the program starts in interactive mode. The prompt that is displayed is
bwsort>
In the interactive mode, the following commands are interpreted:
add file1 [file2 [file3 ...]]
comp word1 word2
exit / quit
help [topic]
load file1 [file2 [file3 ...]]
save file
show
sortstyle [style]
version
!shell command
Type
help
at the BWSORT prompt to get on-line help on the supported commands.
Vowels: a, ae (= a + jafala + aa-kaar), aa, i (hraswa-i), I (dirgha-i), u (hraswa-u), U (dirgha-u), Ri, e, ea (= e + jafala + aa-kaar), E (oi), o, O (ou).
Vowel forms: Similar to the vowels.
Consonants: k, K (=kh), g, G (=gh), ^n (=una), c (=ch), C (=chh), j (=j), J (=jh), ^N (=ina), T, Z (=Th), D, X (=Dh), N, t, z (=th), d, x (=dh), n, p, f (=ph), b, v (=bh), m, Y (=antashtha-ja), r, l, b, S (=sh = talabya-sha), S (=murdhanya-sa), s (=dantya-sa), h, rr (=Da-e shunya ra), rh (=Dha-e shunya ra), y (=antashtha-a), ^t (khanda-ta), M (=anuswar), H (=bisarga), ^ (=chandrabindu).
Conjunct consonants: Some allowed combinations of two or more consonants.
Digits: 0 1 2 3 4 5 6 7 8 9
Punctuation symbols: period (=dnari), comma, quote, space etc.
However, the basic primitives are the vowels and the pure consonants (i.e. consonants without any vowel sound, e.g., ka-e hasanta, etc.), plus the punctuation symbols and digits. Any Bengali string can be broken down into a concatenation of these primitives. For example,
prakhara daaruNa ati dirgha dagdha din.
can be broken as
p_ + r_ + a + kh_ + a + r_ + a + space + d_ + aa + r_ + u + N_ + a + space + a + t_ + i + space + d_ + I + r_ + gh_ + a + space + d_ + a + g_ + dh_ + a + space + d_ + i + n_ + a + .
Here the underscore (_) stands for the pure consonant forms (i.e. consonants without vowel sounds, or with hasanta). Any Bengali sorting scheme (be it a computer program or a press standard) sorts Bengali strings based on this decomposition. As regards the positions of these primitives in the Bengali alphabet, we have the following ordering:
a < ae < aa < i < I < u < U < Ri < e < ea < oi < o < ou < k_ < kh_ < g_ < gh_ < ^n_ < ch_ < chh_ < j_ < jh_ < ^N_ < T_ < Th_ < D_ < Dh_ < N_ < t_ < th_ < d_ < dh_ < n_ < p_ < ph_ < b_ < bh_ < m_ < Y_ < r_ < l_ < sh_ < ss_ < s_ < h_ < rr_ < rh_ < y_ < ^t < M_ < H_ < ^_
The DEFAULT sorting scheme of BWSORT respects this order.
Note that there are a total of 52 alphabetic primitives. These have been given the ASCII values A - Z, a - z in that order. Punctuation symbols and digits are given the same ASCII values as in Roman text. This defines an ordering of all finite-length Bengali strings. BWSORT sorts Bengali strings based on this converted decomposition (using `strcmp').
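The conversion described above can be sketched as follows. This is a minimal Python illustration, not BWSORT's own code: the names DEFAULT_ORDER, encode and sort_decomposed are hypothetical, the input is assumed to be already decomposed into primitives, and Python's string comparison stands in for `strcmp' (for pure ASCII strings the two give the same order). Digits and punctuation are passed through unchanged, with a space written as " ".

```python
import string

# The 52 alphabetic primitives, in the alphabet order given above.
DEFAULT_ORDER = [
    "a", "ae", "aa", "i", "I", "u", "U", "Ri", "e", "ea", "oi", "o", "ou",
    "k_", "kh_", "g_", "gh_", "^n_", "ch_", "chh_", "j_", "jh_", "^N_",
    "T_", "Th_", "D_", "Dh_", "N_", "t_", "th_", "d_", "dh_", "n_",
    "p_", "ph_", "b_", "bh_", "m_", "Y_", "r_", "l_",
    "sh_", "ss_", "s_", "h_", "rr_", "rh_", "y_", "^t", "M_", "H_", "^_",
]

# A - Z followed by a - z: 52 ASCII letters, one per primitive.
LETTERS = string.ascii_uppercase + string.ascii_lowercase
CODE = dict(zip(DEFAULT_ORDER, LETTERS))


def encode(primitives):
    """Convert a list of primitives into an ASCII string.

    Alphabetic primitives become single letters; anything else (digits,
    punctuation, " ") is assumed to be a single ASCII character already
    and is kept as it is.
    """
    return "".join(CODE.get(p, p) for p in primitives)


def sort_decomposed(items):
    """Sort (label, primitives) pairs by their encoded strings."""
    return sorted(items, key=lambda item: encode(item[1]))


words = [
    ("din", ["d_", "i", "n_", "a"]),
    ("daaruNa", ["d_", "aa", "r_", "u", "N_", "a"]),
    ("dirgha", ["d_", "I", "r_", "gh_", "a"]),
    ("dagdha", ["d_", "a", "g_", "dh_", "a"]),
]
ordered = [label for label, _ in sort_decomposed(words)]
# ordered is ["dagdha", "daaruNa", "din", "dirgha"], since a < aa < i < I
```

The decompositions used here are taken from the example sentence above; breaking a raw transliterated string into primitives is a separate step that this sketch does not attempt.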
While this scheme seems quite reasonable, many modern dictionaries in Bengali follow a slight variation of the primitive order. This mostly conforms with old Sanskrit conventions. The OLD sorting scheme of BWSORT is based on these conventions. We will now enumerate the differences between DEFAULT and OLD schemes:
rr_ <--> D_, rh_ <--> Dh_, y_ <--> Y_, ^t <--> t_
In the dictionary order rr_, rh_ and y_ immediately follow D_, Dh_ and Y_ respectively, though they are not in those positions in the alphabet. See point 4 below for a discussion on ^t.
b_ + a + d_ + a
This is not grammatically correct, but this convention is followed in Bengali dictionaries. BWSORT's OLD scheme respects this convention. The DEFAULT one, on the other hand, does not put the a after the hasanta (b_) and thereby identifies b_da as the conjunct bda (ba-e da-e).
The primitive ordering for the OLD scheme is, therefore, like the following:
a < ae < aa < i < I < u < U < Ri < e < ea < oi < o < ou < M_ < H_ < ^_ < k_ < kh_ < g_ < gh_ < ^n_ < ch_ < chh_ < j_ < jh_ < ^N_ < T_ < Th_ < D_ < rr_ < Dh_ < rh_ < N_ < t_ = ^t < th_ < d_ < dh_ < n_ < p_ < ph_ < b_ < bh_ < m_ < Y_ < y_ < r_ < l_ < sh_ < ss_ < s_ < h_
These differences make the OLD sorting scheme a little different from the DEFAULT scheme. As we have discussed elsewhere, bwsort allows you to choose the one you like in a variety of ways (the -s option on the command line, the environment variable BWSORTSTYLE, or the sortstyle command in the interactive mode).
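To see how the two primitive orders can disagree, here is a small Python sketch. It is an illustration under stated assumptions, not BWSORT's implementation: it compares lists of integer ranks instead of the ASCII-letter strings BWSORT passes to `strcmp', it handles alphabetic primitives only, it ignores the OLD scheme's treatment of hasanta, and all names (rank_table, order_words, and so on) are hypothetical.

```python
VOWELS = ["a", "ae", "aa", "i", "I", "u", "U", "Ri", "e", "ea", "oi", "o", "ou"]

# DEFAULT: plain alphabet order.
DEFAULT_ORDER = VOWELS + [
    "k_", "kh_", "g_", "gh_", "^n_", "ch_", "chh_", "j_", "jh_", "^N_",
    "T_", "Th_", "D_", "Dh_", "N_", "t_", "th_", "d_", "dh_", "n_",
    "p_", "ph_", "b_", "bh_", "m_", "Y_", "r_", "l_",
    "sh_", "ss_", "s_", "h_", "rr_", "rh_", "y_", "^t", "M_", "H_", "^_",
]

# OLD: M_, H_, ^_ come right after the vowels; rr_, rh_, y_ follow
# D_, Dh_, Y_; a tuple marks primitives that share one position (t_ = ^t).
OLD_ORDER = VOWELS + [
    "M_", "H_", "^_",
    "k_", "kh_", "g_", "gh_", "^n_", "ch_", "chh_", "j_", "jh_", "^N_",
    "T_", "Th_", "D_", "rr_", "Dh_", "rh_", "N_", ("t_", "^t"),
    "th_", "d_", "dh_", "n_", "p_", "ph_", "b_", "bh_", "m_",
    "Y_", "y_", "r_", "l_", "sh_", "ss_", "s_", "h_",
]


def rank_table(order):
    """Map each primitive to its position; tuple members get the same rank."""
    table = {}
    for rank, entry in enumerate(order):
        members = entry if isinstance(entry, tuple) else (entry,)
        for primitive in members:
            table[primitive] = rank
    return table


RANKS = {"DEFAULT": rank_table(DEFAULT_ORDER), "OLD": rank_table(OLD_ORDER)}


def order_words(words, style):
    """Sort (label, primitives) pairs under the given scheme; return the labels."""
    table = RANKS[style]
    ranked = sorted(words, key=lambda item: [table[p] for p in item[1]])
    return [label for label, _ in ranked]


words = [
    ("kaM", ["k_", "a", "M_"]),        # ka + anuswar
    ("kaka", ["k_", "a", "k_", "a"]),
]
default_result = order_words(words, "DEFAULT")  # ["kaka", "kaM"]: k_ < M_
old_result = order_words(words, "OLD")          # ["kaM", "kaka"]: M_ < k_
```

The two sample words are made-up decompositions chosen only because they differ at a position where one has anuswar (M_) and the other has k_, which is exactly where the two orders place the primitives differently.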
Before we end, some general remarks about a few BWSORT conventions are in order:
jafala + aa-kaar
When this sequence comes immediately after a consonant (as in baekaraN, for example), the decomposition goes like this:
baekaraN = b_ + Y_ + aa + k_ + a + r_ + a + N_ + a
On the other hand, when jafala + aa-kaar comes after the vowels `a' or `e', they are not decomposed the same way, that is, not as
a + Y_ + aa or e + Y_ + aa
Instead it is preferable to treat `ae' and `ea' as separate vowels which do not have any vowel forms (kaar) associated with them. This convention is followed for both the DEFAULT and the OLD sorting schemes.
That's all! If you find some conventions wrong or wrongly implemented, or there is a pre-defined standard which every sorting scheme should follow, please let me know. I can be reached at
abhij@csa.iisc.ernet.in
Thanks for your interest in bwsort.
Abhijit Das (Barda)
BWSORT is freeware. Permission is hereby granted to use and distribute it free of charge for all sorts of personal and academic purposes. Use of this software for commercial purposes is strictly prohibited.