Doc Type: Exploratory proposal
Title: Proposal to include Chinook script in UCS
Source: Van Anderson
Status: preliminary/exploratory for public review, under periodic revision
Replaces: Codechart v. 3.2
Action: For initial review by community.
Date: 2009-02-19


Overview

The Chinook script, an adaptation of the Duployéan shorthand by Father Jean Marie Raphael LeJeune, is an historic script used for writing the Chinook Jargon and other languages of interior British Columbia. Its original use and greatest surviving attestation is in the run of the Kamloops Wawa, a weekly newsletter of the Catholic diocese of Kamloops, British Columbia, published 1891-1923. At the time, the Chinook Jargon trade language was spoken in an area encompassing southeastern Alaska, most of British Columbia, Washington State, western Montana, Oregon, Idaho, and far northern California. Although the Chinook Jargon was the lingua franca in many communities, it was generally a spoken rather than a written language. Most attempts at documentation used the Latin script to approximate Jargon phonology, and dictionaries of the Chinook Jargon are still readily available in these Latin orthographies. In contrast, the archives of the Kamloops Wawa, written in the Chinook script, not only include a considerable dictionary but also constitute a corpus of Chinook Jargon usage spanning more than three decades, during the height of its spread and utility. There is currently no formal encoding, in any context, for the Chinook script, and the only informal representation is transliteration by means of the Latin orthographies used in writing the Chinook Jargon. Indeed, the submission of the Chinook script to the UCS has necessitated the creation, from scratch, of the first Chinook typeface; that effort is currently underway, with glyph images available for review.

Structure

The Chinook script contains several classes of letters, differentiated by visual form - hence script function - and phonetic value. Letter classes include the line and arc consonants, circle vowels (A and the O/W vowels), nasal vowels, arc vowels, H and X. Since the Chinook script is an adaptation of a shorthand system, strings of letters are intended to join together cursively to form nominally syllabic units. This syllabic joining is generally algorithmic, but alternate syllable formation is quite commonly inherited from source languages and requires manual encoding. Most letters have variant forms, including the addition of ancillary dots, compounding of vowels, and overlapping combining behaviors for initialisms and abbreviations. Excepting the stroke directions of letters, the Chinook script is written syllable by syllable, LTR, in horizontal lines proceeding down the page, as with most European scripts.

Ordering

Ordering of the characters in the Chinook script is undefined (the only lexicon using the script cites entries in Latin alphabetical order), so allocation order in the Chinook Character Block is revisable up to inclusion in the standard. Essentially, a Unicode Standard that includes a Chinook Character Block will be the only official ordering of the script. The currently proposed allocation ordering and its reasoning are as follows. According to Father LeJeune's Chinook Rudiments, the characters encoded x00-x09 (P, T, F, K, L, M, N, Sh, S, and O) double as the numbers 1-9 and 0. x09 and x0A (O and A) constitute the next basic vowels given in his introduction. x0B (I) is another simple vowel with a related variant form 16 code points later. x0C and x0D (Oo and Ow) round out the basic vowels given in LeJeune's repertoire, while x0E (U) is the last basic vowel in the Chinook script. x0F (H) is the last simple consonant, the voiced consonants being elongated forms of their unvoiced counterparts, and has a graphically similar combining form opposite it in the next column. The second column of the allocation begins (x10-x14: B, D, V, G, and R) with the voiced counterparts (elongated forms) of the first five consonants (x00-x04). Next come the nasal vowels, x15-x18 (An, In, On, and Un), which appear only intermittently in the Wawa texts but are definitely not composed characters or character variants. x19 and x1A are reserved for any new characters found in the process of scouring private texts written in the Chinook Pipa. x1B (E) is the related sister character to x0B. These two characters often have differing orientations, being just turned or mirrored versions of each other, and distinct conjoining properties, but are identical when adjacent to two consonants. x1C (X) is a fully spacing letter, like a non-combining H, and was used to represent a velar fricative in many of the Salishan languages of interior BC. Finally, x1F is the Combining Chinook Modifying Dot, used to indicate another Salish letter (by modifying U) or modified arc consonants.
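
The allocation described above can be sketched as a small lookup table. This is only an illustration under stated assumptions: offsets are relative to the still-unassigned block base (the "x" in x00), and the names LETTERS, digit_value, and voiced_counterpart are hypothetical, not part of the proposal.

```python
# Offsets within the proposed block; the block base ("x") is not yet assigned.
LETTERS = {
    0x00: "P", 0x01: "T", 0x02: "F", 0x03: "K", 0x04: "L",
    0x05: "M", 0x06: "N", 0x07: "SH", 0x08: "S", 0x09: "O",
    0x0A: "A", 0x0B: "I", 0x0C: "OO", 0x0D: "OW", 0x0E: "U", 0x0F: "H",
    0x10: "B", 0x11: "D", 0x12: "V", 0x13: "G", 0x14: "R",
    0x15: "AN", 0x16: "IN", 0x17: "ON", 0x18: "UN",
    0x1B: "E", 0x1C: "X", 0x1F: "COMBINING MODIFYING DOT",
}


def digit_value(offset):
    """Return the digit a letter doubles as, or None if it has none."""
    if 0x00 <= offset <= 0x08:
        return offset + 1          # x00-x08 stand for 1-9
    if offset == 0x09:
        return 0                   # x09 (O) stands for 0
    return None


def voiced_counterpart(offset):
    """Return the offset of the elongated (voiced) form of x00-x04, else None."""
    if 0x00 <= offset <= 0x04:
        return offset + 0x10       # same row, next column
    return None


print(digit_value(0x00), digit_value(0x09))      # 1 0
print(LETTERS[voiced_counterpart(0x00)])         # B
```

The same "next column" relationship links I (x0B) to E (x1B), 16 code points later.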

Alphabetization

No information is available on alphabetization, as the dictionary portions of the Chinook Rudiments text are given in roughly Latin alphabetical order. Other sources group words by novel alphabetizations, none more or less canonical than any other. The most logical ordering given the structure of the script would be along the lines of P, B, T, (Th) D, (Dh) F, V, K, (Kh) G, L, (Lh, hL) R, (Rh) M, N, (Ng) Sh, (Ch), S, (Ts), O, (W+O vowels), (O+ vowels) A, Wa, I, (E), Wi, (We), (Wi+vowels) Oo, (W+Oo vowels), (Oo+ vowels), Ow, (W+Ow vowels) (Ow+ vowels), U, (Uh), H, then An, In, On, and Un - the final four never being word or syllable initial. Given that alphabetization is not a defined property of the Unicode Standard, it would seem that the ordering above, together with simple binary order, would more than suffice for any implementation needing an order of alphabetization.
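
To show how the suggested ordering differs from binary order, here is a minimal sketch of a sort key. It assumes words are already split into lists of letter names, and it lists only the explicitly named letters from the ordering above (the generic groups such as "W+O vowels" are omitted). PROPOSED_ORDER, BINARY_OFFSET, and both key functions are hypothetical names.

```python
# Explicitly named letters from the suggested alphabetization, in order.
PROPOSED_ORDER = [
    "P", "B", "T", "Th", "D", "Dh", "F", "V", "K", "Kh", "G",
    "L", "Lh", "hL", "R", "Rh", "M", "N", "Ng", "Sh", "Ch", "S", "Ts",
    "O", "A", "Wa", "I", "E", "Wi", "We", "Oo", "Ow", "U", "Uh", "H",
    "An", "In", "On", "Un",
]
RANK = {letter: i for i, letter in enumerate(PROPOSED_ORDER)}

# Block offsets for the few letters used in the example below.
BINARY_OFFSET = {"P": 0x00, "T": 0x01, "A": 0x0A, "B": 0x10}


def alphabetic_key(word):
    """Sort key following the suggested alphabetization."""
    return [RANK[letter] for letter in word]


def binary_key(word):
    """Sort key following plain code point (binary) order."""
    return [BINARY_OFFSET[letter] for letter in word]


words = [["T", "A"], ["B", "A"], ["P", "A"]]
print(sorted(words, key=alphabetic_key))  # Pa, Ba, Ta
print(sorted(words, key=binary_key))      # Pa, Ta, Ba
```

In binary order the voiced consonants sort after all of the first column, whereas the suggested ordering keeps each voiced letter next to its unvoiced counterpart.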


Chinook Codechart v.3.3


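[Code chart grid: columns U+x00 and U+x10, rows 0-F. Labels: Previous Version, Normative, Glyph Shapes, Example, Glyph Images. Glyph images not reproduced.]

[Table of contextual forms (Isolated, Initial, Medial, Final) for U+x09, U+x0A, U+x0C, and U+x0D. Glyph images not reproduced.]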


Short Line Consonants
00 CHINOOK LETTER P
    · number 1
    → 2575 box drawings light up
    → 2577 box drawings light down
01 CHINOOK LETTER T
    · number 2
    → 2574 box drawings light left
    → 2576 box drawings light right
02 CHINOOK LETTER F
    · number 3
03 CHINOOK LETTER K
    · number 4
    • written down and to the left
04 CHINOOK LETTER L
    · number 5
    • written up and to the right

Arc Consonants
05 CHINOOK LETTER M
    · number 6
    → 0028 ( left parenthesis
06 CHINOOK LETTER N
    · number 7
    → 0029 ) right parenthesis
07 CHINOOK LETTER SH
    · number 8
    → 23DC top parenthesis
08 CHINOOK LETTER S
    · number 9
    → 23DD bottom parenthesis

Simple Vowels
09 CHINOOK LETTER O
    · number 0
    · Compound Vowel Base
    · Compounding Vowel
    · Circle vowel
    → 20DD combining enclosing circle
    → 25CB white circle
    → 25EF large circle
    → 3007 ideographic number zero
0A CHINOOK LETTER A
    · Compounding Vowel
    · Circle vowel
0B CHINOOK LETTER I
    · Conjoining, non-compounding vowel
    → x1B chinook letter E
0C CHINOOK LETTER OO
    · Compounding Vowel
    · Circle vowel
0D CHINOOK LETTER OW
    · Compound Vowel Base
    · Compounding Vowel
    · Circle vowel
    → 2299 circled dot operator
    → 0298 latin letter bilabial click
    → 2609 sun
    → 2A00 n-ary circled dot operator
0E CHINOOK LETTER U

Dot Consonant
0F CHINOOK LETTER H
    → 00B7 middle dot
    → 0387 greek ano teleia
    → 2022 bullet
    → 2024 one dot leader
    → 2027 hyphenation point
    → 2219 bullet operator
    → 22C5 dot operator
    → 30FB katakana middle dot

Long Line Consonants
10 CHINOOK LETTER B
    → 01C0 latin letter dental click
    → 007C vertical line
    → 2223 divides
    → 2502 box drawings light vertical
11 CHINOOK LETTER D
    → 2014 em dash
    → 2015 horizontal bar
    → 2500 box drawings light horizontal
12 CHINOOK LETTER V
    → 005C reverse solidus
13 CHINOOK LETTER G
    • written down and to the left
    → 002F solidus
14 CHINOOK LETTER R
    • written up and to the right

Supplementary Vowels
15 CHINOOK LETTER AN
16 CHINOOK LETTER IN
17 CHINOOK LETTER ON
18 CHINOOK LETTER UN
19 <reserved>
1A <reserved>
1B CHINOOK LETTER E
    · Compounding Vowel
    → x0B chinook letter I

Additions for Salish
1C CHINOOK LETTER X
    · Voiceless velar/uvular fricative
1D <reserved>
1E <reserved>

Modifying Dot
1F COMBINING CHINOOK MODIFYING DOT
    · modifies N → Ng, Sh → Ch, S → Ts
    · Abbreviated CCMD


Vowel Orientation

Chinook letters generally combine in syllabic groups according to a fixed algorithm. All consonants have a stroke direction - for P/B, F/V, K/G, M/N, and all variants, the stroke direction is top-down; for T/D, L/R, Sh/S, and variants, stroke direction is left to right. Consonants combine with the stroke termination of the first consonant marking the beginning of the second consonant's stroke. Consonants (including I before a vowel) combine into circular vowels and circular vowels into consonants at tangent angles - in the original source materials, the circles are actually continuations of the consonant strokes moving into and out of the circular vowel form. Vowels often combine beneath and to the right of consonants, but generally above for the pattern T/D/L/R preceding a circle vowel, then S/Sh/N/P/B/K/G; or the pattern L/R + circle vowel + T/D. Circle vowels usually combine inside arc consonants, as well. I and U almost exclusively follow the "in from the top or left, out down or right" rule, except that E orients exactly opposite unless joining with two adjacent letters. These rules having been given, the down/right rule will always be intelligible, though less elegant than contextual implementations.
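
The stroke-direction classes and the rule that each consonant begins where the previous one ends can be sketched as follows. This is a deliberately crude illustration: the unit vectors in STEP are an assumption (real strokes have slants, arcs, and differing lengths), and the names stroke_direction and chain_strokes are hypothetical.

```python
# Stroke-direction classes as stated in the text above.
TOP_DOWN = {"P", "B", "F", "V", "K", "G", "M", "N"}
LEFT_TO_RIGHT = {"T", "D", "L", "R", "Sh", "S"}

# Assumed unit steps; actual glyph geometry is not modelled here.
STEP = {"top-down": (0, -1), "left-to-right": (1, 0)}


def stroke_direction(consonant):
    """Return the stroke-direction class of a consonant."""
    if consonant in TOP_DOWN:
        return "top-down"
    if consonant in LEFT_TO_RIGHT:
        return "left-to-right"
    raise ValueError(f"not a line or arc consonant: {consonant!r}")


def chain_strokes(consonants, origin=(0, 0)):
    """Place each stroke so that it starts where the previous stroke ended."""
    strokes = []
    x, y = origin
    for consonant in consonants:
        dx, dy = STEP[stroke_direction(consonant)]
        end = (x + dx, y + dy)
        strokes.append(((x, y), end))
        x, y = end
    return strokes


print(chain_strokes(["P", "T"]))
# [((0, 0), (0, -1)), ((0, -1), (1, -1))]
```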


Use of Zero Width Joiner, Zero Width Non Joiner, and the Conjoining Selector


Joiners and Variation Selectors | None | ZWJ *1 | ZWNJ *2 | ?CS? *3
Line & Arc Consonants + | Standard Rendering | Non-Breaking (co-syllabic) | Breaking *4 (syllable break) | Overlapping (initialism)
O (U+x09) + | Standard Rendering | Non-Breaking (joining or co-syllabic) | Breaking *4 (syllable break) | X
H (U+x0F) + | Standard Rendering | Joining (leading modifier dot) | Breaking (syllable break) | X
Nasal Vowels + | Standard Rendering | Non-Breaking (displaced) | In-line rendering (syllable break) | X
Other + | Standard Rendering | Non-Breaking (joining or co-syllabic) | Breaking (syllable break) | X

Joiner and Variation Selector Notes

*1 ZWJ codes for a single non-breaking, non-spacing connection that would otherwise not exist algorithmically.

*2 ZWNJ codes for a syllable break in a non-algorithmic location. Preceding or following letter clusters should combine as normal, i.e. legal clusters should combine with their syllable-forming vowels.

*3 The Conjoining Selector is currently undefined. This may take the form of a currently defined general character whose function could be logically expanded to this use, a control character within the Chinook character block, or a new control character or characters for general use in the BMP or Plane 14.

*4 The letters x00-x09 separated by ZWNJ would encode independent forms, i.e. the digits 1-9 and 0.

Use of ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D) in the Chinook script calls for a minor expansion of their described behaviour within the Unicode Standard. In The Unicode Standard, Version 5.0, ZWNJ and ZWJ inhibit or request combining (ligated or cursive) forms of the surrounding characters when those forms would not normally manifest in a given character sequence. The use described herein for the Chinook script would extend this usage to inhibit or request combining behaviour, in addition to combining forms (which do not exist in Chinook writing). Unlike their usage in composing Arabic initial, medial, and terminal forms, their effect on an isolated Chinook letter, as on an isolated Latin character, would be null.
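
The joiner table above can be read as a lookup from (letter class, following control) to rendering behaviour. A minimal sketch, assuming the hypothetical class labels "consonant", "O", "H", "nasal", and "other", and using "CS" as a stand-in for the still-undefined Conjoining Selector:

```python
# Behaviour per letter class and following control, transcribed from the table.
# None marks the cells shown as "X" (combination not applicable).
BEHAVIOUR = {
    "consonant": {"none": "standard rendering",
                  "ZWJ": "non-breaking (co-syllabic)",
                  "ZWNJ": "breaking (syllable break)",
                  "CS": "overlapping (initialism)"},
    "O": {"none": "standard rendering",
          "ZWJ": "non-breaking (joining or co-syllabic)",
          "ZWNJ": "breaking (syllable break)",
          "CS": None},
    "H": {"none": "standard rendering",
          "ZWJ": "joining (leading modifier dot)",
          "ZWNJ": "breaking (syllable break)",
          "CS": None},
    "nasal": {"none": "standard rendering",
              "ZWJ": "non-breaking (displaced)",
              "ZWNJ": "in-line rendering (syllable break)",
              "CS": None},
    "other": {"none": "standard rendering",
              "ZWJ": "non-breaking (joining or co-syllabic)",
              "ZWNJ": "breaking (syllable break)",
              "CS": None},
}


def joiner_effect(letter_class, control="none"):
    """Return the behaviour for a letter class followed by a control."""
    effect = BEHAVIOUR[letter_class][control]
    if effect is None:
        raise ValueError(f"{control} is not applicable after class {letter_class!r}")
    return effect


print(joiner_effect("consonant", "CS"))  # overlapping (initialism)
print(joiner_effect("H", "ZWJ"))         # joining (leading modifier dot)
```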



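Character Sequences: Compound Vowels
Context | Character Sequence | Value
O + Vowel compounds | O (U+x09) + A (U+x0A) | Wa
O + Vowel compounds | O (U+x09) + E (U+x1B) | Wi
O + Vowel compounds | O (U+x09) + O (U+x09) | Wo
O + Vowel compounds | O (U+x09) + Oo (U+x0C) | Woo
O + Vowel compounds | O (U+x09) + Ow (U+x0D) | Wow
O + Vowel compounds | O (U+x09) + O (U+x09) + A (U+x0A) | OhWa
O + Vowel compounds | O (U+x09) + E (U+x1B) + E (U+x1B) | Weyi
O + Vowel compounds | O (U+x09) + E (U+x1B) + A (U+x0A) | Wia (Weeya)
Ow + A compound | Ow (U+x0D) + A (U+x0A) | OwAh
Vowel-I compounds | O (U+x09) + I (U+x0B) | Oi
Vowel-I compounds | A (U+x0A) + I (U+x0B) | Ai
Vowel-I compounds | I (U+x0B) + A (U+x0A) | Ya
Vowel-I compounds | I (U+x0B) + I (U+x0B) | Ye
Vowel-I compounds | I (U+x0B) + E (U+x1B) |
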
Character Sequences: Overlapping Consonants
Context | Character Sequence | Value
Overlapping Arc + Line Consonant | S (U+x08) + ?CS? + T (U+x01) | S.T. (Sahali Tyee)
Overlapping Arc + Line Consonant | Sh (U+x07) + ?CS? + K (U+x03) | J.K. (Jesu Kri)
Overlapping Line + Arc Consonant | I (U+x0B) + T (U+x01) + ?CS? + S (U+x08) | Etc.

Character Sequences: Nasal Vowels
Context | Character Sequence | Value
Line Consonant + Displaced Nasal Vowel + Line Consonant | L (U+x04) + In (U+x16) + T (U+x01) | Lent

Character Sequences: H modified line consonants; spacing w/ ZWNJ
Context | Character Sequence | Value
Line Consonant + Modifying H | T (U+x01) + H (U+x0F) | Th
Line Consonant + Modifying H | K (U+x03) + H (U+x0F) | Kh
Line Consonant + Modifying H | L (U+x04) + H (U+x0F) | Lh (Ł)
Line Consonant + Modifying H | D (U+x11) + H (U+x0F) + I (U+x0B) | The (Ðe)
Modifying H + L | H (U+x0F) + L (U+x04) | hL
R + 2 x Primary Dot | R (U+x14) + H (U+x0F) + H (U+x0F) | Rh
Unjoined H & Consonant | T (U+x01) + ZWNJ (U+200C) + H (U+x0F) | T-H
Unjoined H & Consonant | K (U+x03) + ZWNJ (U+200C) + H (U+x0F) | K-H
Unjoined H & Consonant | L (U+x04) + ZWNJ (U+200C) + H (U+x0F) | L-H
Unjoined H & Consonant | H (U+x0F) + ZWNJ (U+200C) + L (U+x04) | H-L

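
The interaction of modifying H with ZWNJ can be sketched as a longest-match lookup in which ZWNJ simply keeps the neighbouring letters apart. This is a minimal illustration covering only the T, K, L, hL, and Rh rows of the table; letters are represented by their transliterated names, and H_SEQUENCES and apply_h_modification are hypothetical names.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Sequences from the table that yield an H-modified letter.
H_SEQUENCES = {
    ("T", "H"): "Th",
    ("K", "H"): "Kh",
    ("L", "H"): "Lh",
    ("H", "L"): "hL",
    ("R", "H", "H"): "Rh",
}


def apply_h_modification(tokens):
    """Merge H-modified sequences; ZWNJ blocks the merge and is then dropped."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == ZWNJ:
            i += 1                      # ZWNJ has no value of its own
            continue
        for length in (3, 2):           # try the longest sequence first
            key = tuple(tokens[i:i + length])
            if key in H_SEQUENCES:
                out.append(H_SEQUENCES[key])
                i += len(key)
                break
        else:
            out.append(tokens[i])       # no sequence starts here
            i += 1
    return out


print(apply_h_modification(["T", "H"]))          # ['Th']
print(apply_h_modification(["T", ZWNJ, "H"]))    # ['T', 'H']
```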
Character Sequences: Contextual variation of I & E
Context | Character Sequence | Value
Lone I | I (U+x0B) | I
Lone E | E (U+x1B) | E
I + H | I (U+x0B) + H (U+x0F) + T (U+x01) | Iht
E + H / H + E | E (U+x1B) + H (U+x0F) + E (U+x1B) | Ehe
Cons + I | P (U+x00) + I (U+x0B) | Pi
Cons + E | P (U+x00) + E (U+x1B) | Pe
Cons + I + Cons | P (U+x00) + I (U+x0B) + T (U+x01) | Pit or Pet or Peet or Pate
Cons + E + Cons | P (U+x00) + E (U+x1B) + T (U+x01) |

Character Sequences: Modifying Dots on Consonants
Context | Character Sequence | Value
Arc Consonant + Combining Chinook Modifying Dot | N (U+x06) + CCMD (U+x1F) | Ng
Arc Consonant + Combining Chinook Modifying Dot | Sh (U+x07) + CCMD (U+x1F) | Ch/J
Arc Consonant + Combining Chinook Modifying Dot | S (U+x08) + CCMD (U+x1F) | Ts/Z

Character Sequences: Modifying Dot and Combining Diacritical Marks on Vowels
Context | Character Sequence | Value
U + CCMD | U (U+x0E) + CCMD (U+x1F) | Uh (/xW/ in Salish)
Cons + I + Dot Above + Cons | P (U+x00) + I (U+x0B) + COMBINING DOT ABOVE (U+0307) + T (U+x01) | Pit
Cons + I + Dot Above + Cons | T (U+x01) + I (U+x0B) + COMBINING DOT ABOVE (U+0307) + P (U+x00) | Tip
Cons + I + Dot Above + Cons | P (U+x00) + E (U+x1B) + COMBINING DOT ABOVE (U+0307) + T (U+x01) | Pit
Cons + I + Dot Below + Cons | P (U+x00) + I (U+x0B) + COMBINING DOT BELOW (U+0323) + T (U+x01) | Pet
Cons + I + Dot Below + Cons | T (U+x01) + I (U+x0B) + COMBINING DOT BELOW (U+0323) + P (U+x00) | Tep
Cons + I + Dot Below + Cons | P (U+x00) + E (U+x1B) + COMBINING DOT BELOW (U+0323) + T (U+x01) | Pet
Cons + I + Diaeresis + Cons | P (U+x00) + I (U+x0B) + COMBINING DIAERESIS (U+0308) + T (U+x01) | Peet
Cons + I + Diaeresis + Cons | T (U+x01) + I (U+x0B) + COMBINING DIAERESIS (U+0308) + P (U+x00) | Teep
Cons + I + Diaeresis + Cons | P (U+x00) + E (U+x1B) + COMBINING DIAERESIS (U+0308) + T (U+x01) | Peet
Cons + I + Diaeresis Below + Cons | P (U+x00) + I (U+x0B) + COMBINING DIAERESIS BELOW (U+0324) + T (U+x01) | Peyt
Cons + I + Diaeresis Below + Cons | T (U+x01) + I (U+x0B) + COMBINING DIAERESIS BELOW (U+0324) + P (U+x00) | Tape
Cons + I + Diaeresis Below + Cons | P (U+x00) + E (U+x1B) + COMBINING DIAERESIS BELOW (U+0324) + T (U+x01) | Peyt

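
The Combining Chinook Modifying Dot behaves like any combining mark: it follows its base and changes that base's value. A minimal sketch of that attachment, using transliterated letter names and the hypothetical names CCMD, DOTTED, and apply_modifying_dot; only the four CCMD sequences listed in the tables above are covered.

```python
CCMD = "CCMD"  # stand-in token for COMBINING CHINOOK MODIFYING DOT (x1F)

# Base letters that accept the modifying dot, with the resulting values.
DOTTED = {"N": "Ng", "Sh": "Ch/J", "S": "Ts/Z", "U": "Uh"}


def apply_modifying_dot(tokens):
    """Attach each CCMD to the base letter immediately before it."""
    out = []
    for token in tokens:
        if token != CCMD:
            out.append(token)
            continue
        if not out or out[-1] not in DOTTED:
            raise ValueError("CCMD must follow N, Sh, S, or U")
        out[-1] = DOTTED[out[-1]]   # replace the base with its dotted value
    return out


print(apply_modifying_dot(["N", CCMD]))            # ['Ng']
print(apply_modifying_dot(["K", "O", "Sh", CCMD]))  # ['K', 'O', 'Ch/J']
```

Raising an error for a misplaced dot is a design choice of this sketch; a renderer might instead show the dot on its own.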




Syllable forming

Legal algorithmic consonant clusters shall be of the following patterns:
a) a labial plosive (P or B) followed by or following S or a liquid (L or R);
b) a dental plosive (T or D) followed by or following S/liquids, or preceding consonant I (an I preceding A, O, or I);
c) labio-dentals (F/V) followed by liquids, dental plosives, or velar plosives;
d) velars (K/G) followed by or following S or liquids, or preceding consonant I;
e) S followed by plosives or liquids, or a legal consonant+S cluster followed by a plosive or liquid;
f) Sh followed by or following R;
g) nasals (N/M) followed by or following S or liquids.
In the preceding list, all variants are of the same class as their base character; e.g. rule (a) would be "a labial plosive (P, B, or variants) followed by or following S (or variants) or a liquid (L, R, or variants)".

Line consonants are P, B, T, D, F, V, K, G, L, R, and variants. Arc consonants are M, N, Sh, S, and variants. H is a dot consonant. Circle vowels are O, A, Oo, Ow, Wa, Wi, and all W/O vowels. Arc vowels are U and I and their variants. Nasal vowels are An, In, On, and Un.
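
The two-consonant cases of rules (a) to (g) can be sketched as a simple membership test. This is a partial illustration only: it ignores variants, the "consonant I" cases in (b) and (d), and the three-consonant case in (e). The set names and is_legal_pair are hypothetical.

```python
LABIAL = {"P", "B"}
DENTAL = {"T", "D"}
LABIODENTAL = {"F", "V"}
VELAR = {"K", "G"}
LIQUID = {"L", "R"}
NASAL = {"M", "N"}
PLOSIVE = LABIAL | DENTAL | VELAR
S_OR_LIQUID = LIQUID | {"S"}


def is_legal_pair(first, second):
    """Return True if two adjacent consonants form a legal cluster."""
    # Rules (a), (b), (d), (g): plosives and nasals combine with S or a
    # liquid in either order.
    if first in (PLOSIVE | NASAL) and second in S_OR_LIQUID:
        return True
    if first in S_OR_LIQUID and second in (PLOSIVE | NASAL):
        return True
    # Rule (c): F/V followed by a liquid, dental plosive, or velar plosive.
    if first in LABIODENTAL and second in (LIQUID | DENTAL | VELAR):
        return True
    # Rule (e), two-consonant part: S followed by a plosive or liquid.
    if first == "S" and second in (PLOSIVE | LIQUID):
        return True
    # Rule (f): Sh followed by or following R.
    if {first, second} == {"Sh", "R"}:
        return True
    return False


print(is_legal_pair("S", "T"))  # True
print(is_legal_pair("P", "T"))  # False
```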

Syllable breaking rules

Each rule is followed by its examples in the form Trans.lit.er.a.tion1 - Trans.lit.er.a.tion2 - etc. (Periods symbolize syllable breaks; the script examples that accompanied each transliteration are not reproduced.)
Consonants adjacent to a vowel belong to that vowel
    Ip.soot - Wap.tos - Peł.ten - Tip.so - Ik.tas - Kim.ta
Consonants adjacent to two vowels belong to the trailing vowel
    Oo.kook - A.la - Ya.kwa - Ka.na.mokst - Li.li - Ma.mook
Legal consonant clusters belong to trailing vowels, as long as they are not adjacent to a preceding vowel
    Klak.sta
The cluster T + L preceding a vowel joins to that vowel and only that vowel
    Pa.tlach - Tlemen.tlemen - I.tloo.ilh
Adjacent consonants not forming legal clusters shall divide syllables
    Wap.tos - Ash.noo - An.ka.ti - Kan.sih - Kim.ta - Kom.taks - Tsik.tsik - L.ma.lo
An "I" or "E" immediately preceding or following an "OO" or "OW", or preceding a W vowel, shall divide syllables
    Ni.wa - Tlemen.oo.it - I.tloo.ilh - E.h.poo.i - Kip.oo.it - Kla.h.ow.iam
An "I" immediately following a vowel (not U) and preceding a consonant shall be considered part of that vowel
    Kwaits - Oi.h.at - H'loima - Eit - Fait
An I/E flavored vowel will join with a following consonant + "I"
    Fraide
An "H" following T, D, K, L, or R, or preceding an L, creates an H modified letter and does not affect syllable breaking
    Pelh.ten - Khel - Khow - The
An "H" adjacent to two non-H modified letters will break syllables fore and aft
    Oi.h.at - I.h.t - Ka.h.ka.h - Ka.la.h.an - Ke.h.tsi - Kla.h.ow.iam - Sa.h.a.li - Wi.h.t - Ta.h.am - H.um - A.h.a - E.h.poo.i - Ili.h.e
A "U" will join with either preceding or trailing consonants, but not both
    Kyu.tan
A "U" will first join with lone (without a vowel) consonants or clusters
    Kyu.tan
A legal consonant or cluster bracketed by two "I"s or "E"s will share a syllable
    Ili.h.e - Isik
A nasal vowel will displace and join two adjacent consonants, unless the adjacent consonants are not similar (same-angle) line consonants
    L(en)t
A nasal vowel will connect with a previous consonant and break syllables aft in all other circumstances
    X
All rules of syllable breaking can be overridden by ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D).
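
A small subset of these rules can be sketched in code. The sketch below is an assumption-laden illustration, not the full algorithm: it handles only consonant runs between two vowels (a single consonant goes to the trailing vowel, T + L goes to the trailing vowel, and otherwise the first consonant closes the preceding syllable), and it ignores H, U, nasal vowels, adjacent vowels, and ZWJ/ZWNJ overrides. Words are given as lists of transliterated letters; VOWELS and syllabify are hypothetical names.

```python
# Vowels recognised by this sketch (nasal vowels are deliberately left out).
VOWELS = {"O", "A", "I", "E", "Oo", "Ow", "U", "Wa", "Wi"}


def syllabify(letters):
    """Split a list of letters into syllables using a subset of the rules."""
    letters = list(letters)
    vowel_positions = [i for i, l in enumerate(letters) if l in VOWELS]
    if not vowel_positions:
        return [letters]
    starts = []  # index at which each later syllable begins
    for a, b in zip(vowel_positions, vowel_positions[1:]):
        run = letters[a + 1:b]          # consonants between two vowels
        if not run:
            raise ValueError("adjacent vowels are not handled by this sketch")
        if len(run) == 1 or run == ["T", "L"]:
            starts.append(a + 1)        # whole run joins the trailing vowel
        else:
            starts.append(a + 2)        # first consonant stays with vowel a
    syllables = []
    previous = 0
    for start in starts + [len(letters)]:
        syllables.append(letters[previous:start])
        previous = start
    return syllables


print(syllabify(["Wa", "P", "T", "O", "S"]))       # [['Wa', 'P'], ['T', 'O', 'S']]
print(syllabify(["P", "A", "T", "L", "A", "Ch"]))  # [['P', 'A'], ['T', 'L', 'A', 'Ch']]
```

The two calls reproduce the breaks Wap.tos and Pa.tlach from the examples above.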


Archives of the Kamloops Wawa 1891-1900 (subscription required)

Dictionary of the Chinook Jargon, by George Gibbs, Echo Library ISBN 1-40680-924-1

Chinook:.... A History and Dictionary, by Edward Harper Thomas, 1935, Metropolitan Press, Portland, OR

