enebi

     Links to the materials presented here are given at the 
end of this document.
     The materials presented here are the results of the 
author's attempts to analyse the coding systems and grammar 
of some languages. The information is presented as Pascal 
programs (and data files for these programs).
     If the reader is reading the English translation and is 
also interested in the coding of Cyrillic, or needs it to 
understand the illustrative alphabets presented here, the 
author recommends downloading the file 'keyrus.zip' from the 
main page of this site (section "Social and cultural 
issues"); the package KEYRUS needs MS DOS or an emulator of 
it; see also section \KIRILLIT in the materials presented 
here.
     The illustrative codings which I use contain Cyrillic 
letters in the MS DOS coding; later I shall try to make 
illustrative codings for users who have difficulties with 
Cyrillic.
     Also presented here are the programs recod.pas (for 
recoding) and zamena.pas (for substitutions); their use is 
described in Appendix 1 at the end of this document.

     (The remainder of this text is not fully translated).

     Below I describe what I have managed to work out for 
some languages.

\GRECHESK (Greek language).

     Well-known alphabet. In writing, it is customary to 
mark stress (differently in different cases!), breathings, 
etc.; it is not yet clear to me which of these are coded and 
which are not. The presented files may contain errors and 
serve only for preliminary analysis of a text. Only one 
coding was found. The file gr1 represents this coding, and 
gr2 represents an illustrative coding based on Cyrillic and 
Latin letters.

\GRUZINSK (Georgian language).

     Ordinary alphabet. Only one coding was found. The file 
h1 represents this coding, and h2 represents an illustrative 
coding based on Cyrillic and Latin letters.

\TADZIKSK (Tajik language).

     Ordinary alphabet. Only one coding was found. The file z 
represents this coding, and zz represents an illustrative 
coding based on Cyrillic and Latin letters. Some letters are 
not yet determined!
     The principle on which the coding is built is 
interesting: in the Soviet era the alphabet was based on 
Cyrillic, and evidently the Cyrillic basis has been 
preserved, but each (or almost each) letter is replaced by 
the Latin letter found on the same key in the standard 
keyboard layouts; e.g., the Latin letter 'f' denotes the 
Cyrillic letter 'a'.
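
     The keyboard principle described above can be sketched 
in Python roughly as follows. The table is an assumption: it 
pairs the lowercase keys of the standard US QWERTY layout 
with the letters on the same keys of the standard Russian 
layout. The real file z may use a different or larger table 
(the additional Tajik letters are not covered here), and the 
name keys_to_cyrillic is hypothetical.

```python
# Assumption: US QWERTY keys paired with the letters on the same keys
# of the standard Russian layout (lowercase only).
LATIN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

KEY_TO_CYRILLIC = dict(zip(LATIN_KEYS, CYRILLIC_KEYS))


def keys_to_cyrillic(text):
    """Replace every Latin key character by the Cyrillic letter on that key."""
    # Characters outside the table (digits, spaces, ...) are kept as they are.
    # Note that ',' and '.' are letters in this table, so real punctuation
    # would need separate treatment.
    return "".join(KEY_TO_CYRILLIC.get(ch, ch) for ch in text)
```

     For example, keys_to_cyrillic("f") gives the Cyrillic 
letter 'а', in agreement with the example in the text.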

\IVRIT (Hebrew language, Ivrit).

     Alphabet without vowels (vowel points do exist for 
writing, but they are absent from the files, even from the 
text of the Tanakh). Three codings were encountered. The 
files alfav1a, alfav1b, alfav1c represent these codings, and 
alfav1d represents an illustrative coding based on Cyrillic 
and Latin letters. Sometimes the lines are reversed right to 
left (which is evidently connected with the right-to-left 
direction of writing on paper).
     Note: a text with vowel points turned up recently; an 
example has been added to the presented materials, but it 
has not been analysed in detail.
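
     Undoing the line reversal mentioned above takes very 
little code. The sketch below is only an illustration and 
assumes a one-byte coding, so that reversing characters and 
reversing bytes are the same thing; the name unreverse_lines 
is hypothetical.

```python
def unreverse_lines(text):
    """Reverse the characters inside every line, keeping the order of lines."""
    return "\n".join(line[::-1] for line in text.split("\n"))
```

     Applying the function twice returns the original text, 
so the same routine serves in both directions.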

     Purpose of files:

ALFAV1D      \
ALFAV1C      \
ALFAV1B      | - for recoding
ALFAV1A      /
IVR.Z        /
QUERY.HTM      - text with vowel points
KOHELET.TXT  \ - test texts
SONG_4.TXT   /
PRAVILA.TXT  \ - data files
SLOVARH.TXT  /
PODSTR.PAS     - program for word-by-word translation, source
PODSTR.EXE     - the same, executable
ZAPUSK.BAT     - demonstration of operation

\ARABSK (Arabic language).

     Alphabet without vowels (as in Hebrew, vowel marks 
exist for writing, but they are absent from the files). Only 
one coding was encountered. The file 1.txt represents this 
coding, and 2.txt represents an illustrative coding based on 
Cyrillic and Latin letters. Despite the right-to-left 
direction of writing on paper, the lines in the texts 
encountered were not reversed. Some letters are not yet 
determined!

\JAPONSK (Japanese language).

     (From an old description: there exists a shareware 
program J-Text (find the link!!!), which allows one to read 
(in the codings mentioned here and in a number of others) 
and to type texts in Japanese).

     Purpose of files:

DETE_KOD.PAS   - for recognizing the coding
JI_EU_SJ.TXT   - table for recoding JIS/EUC/SJS
JIS_SJS.PAS    - for recoding JIS -> SJS
EUC_SJS.PAS    -       -//-        EUC -> SJS
SJS_L.Z        - dictionary of hieroglyphs
JAP_SLOV.Z     - dictionary of words
PRIEVOD.PAS    - for word-by-word translation
TEST.SJS       - test text
COMMENT.TXT    - this text

     To compile the file PRIEVOD.PAS one needs the TMT 
Pascal Compiler, the Free Pascal Compiler or a similar one 
(Turbo Pascal -> Structure too large). A ready-to-use 
PRIEVOD.EXE is provided in the section on the Chinese 
language.

     Demonstration of operation:

ppc386.exe prievod.pas
prievod.exe sjs_l.z jap_slov.z test.sjs test.trd /otl

     In Japanese texts, three codings were actually 
encountered: JIS, EUC and SJS; what the first two of these 
abbreviations mean is unknown to the author; 
SJS = Shift-JIS (although, judging by its structure, EUC 
rather deserves such a name). The presented program is 
oriented towards the SJS coding as the most widespread.
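
     The relation between the three codings is purely 
arithmetical for ordinary two-byte characters, as the sketch 
below illustrates in Python. It is not the author's 
EUC_SJS.PAS: the function names are hypothetical, only 
two-byte JIS X 0208 characters and plain ASCII are handled, 
and EUC sequences starting with the bytes 8E or 8F are 
assumed not to occur.

```python
def jis_to_sjis(j1, j2):
    """Convert one JIS byte pair (each 0x21..0x7E) to a Shift-JIS byte pair."""
    s1 = ((j1 - 0x21) >> 1) + 0x81      # two JIS rows share one lead byte
    if s1 > 0x9F:
        s1 += 0x40                      # jump over the range 0xA0..0xDF
    if j1 & 1:                          # odd row: lower half of trail bytes
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1                     # 0x7F is never used as a trail byte
    else:                               # even row: upper half of trail bytes
        s2 = j2 + 0x7E
    return s1, s2


def euc_to_sjis(data):
    """Recode EUC bytes to SJS; an EUC pair is a JIS pair with high bits set."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if (0xA1 <= b <= 0xFE and i + 1 < len(data)
                and 0xA1 <= data[i + 1] <= 0xFE):
            out.extend(jis_to_sjis(b & 0x7F, data[i + 1] & 0x7F))
            i += 2
        else:
            out.append(b)               # ASCII and anything unrecognized
            i += 1
    return bytes(out)
```

     The high bits cleared in euc_to_sjis show why EUC can be 
seen as a "shifted" JIS, which is the remark made in the 
text about the names.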

     To avoid conflicts with programs that display 
hieroglyphs, the translations are written in Latin letters.
     Sometimes, for brevity, deviations from the standard 
romaji transcription are allowed (chi -> ti, sho -> s@, 
etc.); later it is planned to restore the romaji spelling.

     At present (2003,9,8) the dictionary includes a little 
more than 500 words, and for many hieroglyphs '???' is 
written instead of the reading and meaning; it is planned to 
go on extending the dictionary and describing new 
hieroglyphs.

\KITAJSK (Chinese language).

     Purpose of files:

DETEB5GB.*   - for distinguishing the codings GB/BIG5
B5GB.*       - for recoding BIG5 -> GB
GBB5.*       - for recoding GB -> BIG5
B5STAND.*    - for replacing variant forms in the BIG5 coding
BIG5L.Z      - dictionary of hieroglyphs
KIT_SLOV.Z   - dictionary of words
PRIEVOD.*    - translator program
TEST1_GB.TXT - test text No. 1
TEST2_B5.TXT - test text No. 2
MAOZED_1.txt - test text No. 3
COMMENT.TXT  - this commentary

     To compile the file PRIEVOD.PAS one needs the TMT 
Pascal Compiler, the Free Pascal Compiler or a similar one 
(Turbo Pascal -> Structure too large).

     Demonstration of operation:

--------

gbb5 test1_gb.txt test1_b5.txt
prievod big5l.z kit_slov.z test1_b5.txt test1_b5.trd

b5_stand test2_b5.txt tempor.txt
prievod.exe big5l.z kit_slov.z tempor.txt test2_b5.trd

gbb5 maozedgb.txt maozedb5.txt
prievod big5l.z kit_slov.z maozedb5.txt maozedb5.trd

--------

     In Chinese texts, two codings were actually 
encountered: GB (Guojia Biaozhun, i.e. State Standard; used 
in the PRC) and BIG5 (the name reflects the fact that the 
coding was developed by five large companies; used in 
Taiwan).
     The presented program is historically oriented towards 
the BIG5 coding, since at first most of the texts that 
turned up were in that coding.
     In contrast to the other languages with several codings 
known to the author, the GB <-> BIG5 correspondence is not 
one-to-one: it happens that several symbols in one coding 
correspond to a single symbol in the other; usually this 
concerns variant forms of one and the same hieroglyph.
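
     The effect of a correspondence that is not one-to-one 
can be shown with a two-entry table. The sketch below is an 
illustration in Python, not the author's tables: it assumes 
the well-known pair of traditional characters 發 and 髮, both 
of which are written 发 in simplified script, and it obtains 
the byte codes from Python's built-in "big5" and "gb2312" 
codecs. Whether the files B5GB.* contain exactly this pair 
is not known; the names below are hypothetical.

```python
# Hypothetical two-entry table: two BIG5 characters share one GB character.
B5_TO_GB = {
    "發".encode("big5"): "发".encode("gb2312"),
    "髮".encode("big5"): "发".encode("gb2312"),
}

# Inverting the table can keep only one BIG5 code for each GB code.
GB_TO_B5 = {gb: b5 for b5, gb in B5_TO_GB.items()}


def recode_pairs(data, table):
    """Recode a byte string that consists of two-byte characters only."""
    pairs = [data[i:i + 2] for i in range(0, len(data), 2)]
    return b"".join(table.get(pair, pair) for pair in pairs)
```

     Recoding a BIG5 text that contains both characters to GB 
and back cannot restore the original, because the inverse 
table has only one entry.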
     Unlike the previous versions of the GB <=> BIG5 
recoding program, the present version uses data obtained by 
recoding all admissible combinations in both directions with 
a certain commercial program, and it must recode exactly as 
that commercial program does. Unfortunately, it happens that 
genuine rare hieroglyphs in GB are replaced by a space in 
BIG5; apparently this cannot be helped.
     The program already makes it possible to recognize the 
subject of a text, and sometimes to understand combinations 
several words long. It is planned to go on extending the 
dictionary and describing new hieroglyphs.

\KOREJSK (Korean language).

     Unusual (letter-and-syllable) writing system. Judging 
by the literature, there were several codings, but only the 
KSC coding was actually encountered. Formally the script is 
alphabetic, but in writing the letters are combined into 
syllables (syllables usually have the form consonant - vowel 
- consonant, and the shapes of the letters are adapted to 
such combination, so that a typical written syllable fits 
into a square). The creators of the coding had the following 
typical options: 
(a) to allot one byte per letter and to group the letters 
when printing; 
(b) to allot two bytes per syllable, giving some bits of the 
two-byte field to a marker, some to the initial consonant, 
some to the vowel, and some to the final consonant; 
(c) to enumerate all syllables that actually occur and to 
code them like hieroglyphs, ignoring their internal 
structure. 
It turned out that the third way was used in this coding. 
Moreover, it turned out that different initial consonants 
admit different combinations of vowels and final consonants 
(and even the number of admissible combinations differs)! 
This greatly complicates decoding. Nevertheless, the 
substitution file ksc.z makes it possible to recognize a 
number of frequently occurring syllables, which makes it 
possible to find words in a dictionary and from them to work 
out the readings of new syllables.
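
     For comparison, option (b), which according to the text 
above was NOT the one chosen for KSC, can be sketched as 
follows in Python. The bit layout (one marker bit followed 
by three 5-bit fields) and the letter indices are assumptions 
made for the illustration, and the function names are 
hypothetical.

```python
def pack_syllable(initial, vowel, final):
    """Pack a marker bit and three 5-bit letter indices into two bytes."""
    for index in (initial, vowel, final):
        if not 0 <= index < 32:
            raise ValueError("letter index must fit in 5 bits")
    value = 0x8000 | (initial << 10) | (vowel << 5) | final
    return bytes([value >> 8, value & 0xFF])


def unpack_syllable(two_bytes):
    """Recover (initial, vowel, final) from a packed two-byte syllable."""
    value = (two_bytes[0] << 8) | two_bytes[1]
    if not value & 0x8000:
        raise ValueError("marker bit is not set")
    return (value >> 10) & 0x1F, (value >> 5) & 0x1F, value & 0x1F
```

     With such a layout the letters of a syllable can be read 
directly from its code, which is exactly what an enumerated 
coding of type (c) does not allow.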

\LATINSK (Latin language).

     The aim of this program is to scan a Latin text and, 
for each word, to propose the possible dictionary forms 
(e.g., 'mitto' from 'missisti') and to find their 
translations in the dictionary.
     At present the dictionary is very small, some necessary 
structures are not described, and the block that enumerates 
variants for texts in which all 'v' and 'j' are replaced by 
'u' and 'i' is not implemented, but in the test example some 
fragments of phrases are already understandable.
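
     The enumeration of variants mentioned above as not yet 
implemented could look roughly like the following Python 
sketch. It is only a guess at what such a block might do: 
every 'u' may stand for 'u' or 'v' and every 'i' for 'i' or 
'j', and all combinations are produced so that each can be 
looked up in the dictionary. Only lowercase letters are 
handled, and the name vj_variants is hypothetical.

```python
from itertools import product


def vj_variants(word):
    """List every spelling obtained by reading 'u' as u/v and 'i' as i/j."""
    choices = []
    for ch in word:
        if ch == "u":
            choices.append("uv")
        elif ch == "i":
            choices.append("ij")
        else:
            choices.append(ch)
    return ["".join(combo) for combo in product(*choices)]
```

     The number of variants doubles with every 'u' or 'i' in 
the word, so in practice the list would be filtered against 
the dictionary straight away.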

\RUSSIAN

     Russian language for foreigners. To see a 
demonstration, run zapusk.bat and browse the resulting 
files 0.txt and 2.txt. At present the dictionary and the 
set of rules are small, but I plan to expand them, as well 
as to add support for UNICODE.

\CYRILLIC (Cyrillic script).

     Since the UNICODE coding and the exotic coding of some 
HTML documents in Cyrillic (korchins.z) are close to the 
subject of this package, I decided to include them in the 
package. In addition to the "alternative" coding and the 
codings Win1251 and KOI8 (find a reference about them, where 
it is described how false codings arise from their 
combination!!!), the UNICODE coding occurs in files typed in 
WinWord. The program unicode.pas extracts from such files 
the fragments in the UNICODE coding and converts them to the 
"alternative" coding; at present some punctuation marks are 
lost in the process.
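
     The kind of extraction described above can be sketched 
in Python as follows. This is not unicode.pas: the sketch 
assumes that "UNICODE" means 16-bit little-endian units 
(UTF-16LE) starting at even offsets, that the "alternative" 
coding is code page 866, and that only the letters 
U+0410..U+044F and spaces inside a run are of interest. The 
name extract_cyrillic_utf16 is hypothetical.

```python
def extract_cyrillic_utf16(data):
    """Collect runs of UTF-16LE Cyrillic letters, one run per output line."""
    runs = []
    current = ""
    for i in range(0, len(data) - 1, 2):
        code = data[i] | (data[i + 1] << 8)     # little-endian 16-bit unit
        if 0x0410 <= code <= 0x044F or (code == 0x20 and current):
            current += chr(code)
        else:
            if current.strip():
                runs.append(current.strip())
            current = ""
    if current.strip():
        runs.append(current.strip())
    # Assumption: the "alternative" coding is code page 866.
    return "\n".join(runs).encode("cp866")
```

     Like the program described in the text, this sketch 
drops punctuation marks, since they fall outside the range 
of letters it looks for.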

-----

     Finally, here are links to materials available on the 
Internet that relate to the subject of this document or may 
simply be useful to language enthusiasts.

     Transcription of Chinese, Japanese and Korean in Latin 
and Cyrillic letters:

http://anime.dvdspecial.ru/Japan/romaji.shtml
http://anime.dvdspecial.ru/Japan/chinese.shtml
http://anime.dvdspecial.ru/Japan/korean.shtml

     The same for the Korean language:

http://english.tour2korea.com/t2kzone/mcns/learn/roman/roman_korean_language.asp

     Chinese codings:

http://www.ldc.upenn.edu/Projects/Chinese/info_it.htm

----------------------------------------------------------

     Appendix 1: recod.pas and zamena.pas.

     Among the programs presented here we note the 
general-purpose programs recod.pas and zamena.pas.
     The first of them (recod.pas) is intended for byte-wise 
recoding by means of a table. It is invoked as

  recod.exe -c example1 example2 infile outfile [otlad] ;

here example1 and example2 are one and the same text in 
different codings (with header, see examples and source), 
infile is a file in the same coding as example1, and the 
created outfile is converted to the coding of the file 
example2. The parameter otlad (otladka = debugging) is 
useful for analysing a coding not encountered before 
(guessed fragments are taken as example1 and example2, and 
newly guessed letters are then added to them): in this case, 
firstly, characters that have not been recoded are replaced 
by minus signs (i.e. the decoded characters are not lost 
among those remaining undecoded), and secondly, the direct 
and inverse recoding tables are displayed (sometimes this 
allows one to guess the principle on which the coding is 
built and to fill in the remaining letters at once).
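
     The idea of building the table from two examples can be 
sketched in Python as follows. This is not recod.pas: the 
header of the example files is ignored, the display of the 
tables is left out, and outside debugging mode unknown bytes 
are simply copied, which is an assumption. The names 
build_table and recode are hypothetical.

```python
def build_table(example1, example2):
    """Map each byte of example1 to the byte at the same place in example2."""
    if len(example1) != len(example2):
        raise ValueError("the examples must have the same length")
    table = {}
    for a, b in zip(example1, example2):
        if table.setdefault(a, b) != b:
            raise ValueError("byte %d has two different images" % a)
    return table


def recode(data, table, debug=False):
    """Recode byte by byte; in debug mode unknown bytes become minus signs."""
    out = bytearray()
    for b in data:
        if b in table:
            out.append(table[b])
        else:
            out.append(ord("-") if debug else b)
    return bytes(out)
```

     In debugging mode the recognized letters stand out among 
the minus signs, which is the effect described in the text.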
     The program recod is convenient for recoding texts in 
languages with letter alphabets. The author usually recoded 
such texts into illustrative alphabets based on Cyrillic and 
Latin letters; the reader may, if desired, create his own 
illustrative alphabet, more convenient for him.
     The second of these programs (zamena.pas; zamena = 
substitution) is intended for replacing arbitrary strings by 
other strings; arbitrary characters may be included in the 
strings in hexadecimal notation, like $0D$0A (see the 
examples of substitution files of type *.z and the source). 
The program is invoked as

  zamena -c table.z infile outfile ;

This program is convenient, in particular, for quickly 
replacing all multiple line breaks by single ones and for 
similar editing purposes. Here, however, we are going to 
replace, for example, Chinese hieroglyphs by strings of the 
form: opening bracket, reading, space, meaning, closing 
bracket, line break.
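
     Substitution by a table can be sketched in Python as 
follows. This is not zamena.pas, and the format of the *.z 
files is not reproduced: the pairs are passed directly as 
Python strings, '$' followed by two hexadecimal digits is 
assumed to denote one byte, and preferring the longest 
source string is a design choice of the sketch. The names 
decode_escapes and substitute are hypothetical.

```python
import re


def decode_escapes(text):
    """Turn every $XX (two hexadecimal digits) into the byte with that value."""
    raw = text.encode("latin-1")
    return re.sub(rb"\$([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def substitute(data, pairs):
    """Replace strings in one left-to-right pass, longest source string first."""
    rules = sorted(((decode_escapes(s), decode_escapes(t)) for s, t in pairs),
                   key=lambda rule: len(rule[0]), reverse=True)
    out = bytearray()
    i = 0
    while i < len(data):
        for src, dst in rules:
            if src and data.startswith(src, i):
                out += dst
                i += len(src)
                break
        else:
            out.append(data[i])     # no rule applies at this position
            i += 1
    return bytes(out)
```

     For instance, the pair ("$0D$0A$0D$0A", "$0D$0A") turns 
a double line break into a single one; longer runs of line 
breaks would need repeated passes in this sketch.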

-------------------------------------------------------------

     Appendix 2 (2002,6): types of verse in Latin poetry.

     (From the textbook: Kozarzhevskij A.Ch., Uchebnik 
latinskogo yazyka [Textbook of the Latin Language], Moscow, 
Vysshaya Shkola, 1970).

     Notation used: 
_ - long syllable, . - short syllable, * - syllable of 
    indeterminate length; 
= - long syllable with a stress mark, & - syllable of 
    indeterminate length with a stress mark;
/ and // - pauses.

     Kozarzhevskij singles out the following verse metres:

  5. Greater Asclepiad:
=_/=../=//=../=//=../=./.
  Example: Tu ne quaesieris, scire nefas, quem mihi, quem 
tibi (p. ...).

  6. Lesser Asclepiad:
=_/=../=//=../=./.
  Example: Exegi monument(um) aere perennius (p. 213).

  8a. Alcaic stanza:
*/=./=_/=../=./=
*/=./=_/=../_./=
*/=./=_/=./=.
=../=../=./=.
  Examples: 
  Eheu, fugaces, Postume, Postume (p. 170)
  Delicta major(um) immeritus lues ... (p. 212)

  (unnumbered) Third Asclepiadean stanza:
=_/=../=//=../=./&
=_/=../=//=../=./&
=_/=../=.
=_/=../=./&
  Example: O navis, referent in mare te novi (p. 212).

  (unnumbered) iambic verse; e.g.:
.=/.=/.=/.=/.=/.=/.=
.=/.=/.=/.=
  Example: Quo, quo scelesti ruitis? Aut cur dextera... (p. 
211).

-------------------------------------------------------------

     Appendix 3: references to the presented materials.

Download: ENEBI1.ZIP (23K)
     Other languages.
Download: 1 2 3 4 5 6 (22..36K)
     Chinese language (marking of readings and meanings of 
     hieroglyphs and words).
Download: 1 2 3 4 (18..30K)
     Japanese language (marking of readings and meanings).
Download: LATINSK.ZIP (17K)
     Latin language.
Download: ENEBI7.ZIP (26K)
     Hebrew language, Ivrit (extraction of dictionary forms, 
     etc.).
Download: RUSSIAN.ZIP (28K)
     Russian language for foreigners (pre-product).
(Planned to be exposed as: http://aravidze.narod.ru/enebi*.zip ; http://geocities.datacellar.net/sekirin1/enebi*.zip . )