PISIT'S THAI NATURAL LANGUAGE PROCESSING LABORATORY
This lab was formed on August 26, 1998.
e-mail: pisitp@yahoo.com
For C7 members, please check this C7 address list.
KEYWORDS
Thai Natural Language Processing Lab., word
segmentation, dictionaries, algorithms, Thai text-to-speech.
PERFORMANCE COMPARISON OF THAI WORD SEPARATION ALGORITHMS
Pisit Promchan (pisitp@yahoo.com)
Telecom Asia Corp. Public Co. Ltd.
2nd flr., 4th bldg., TOT
Changwatana Rd., BKK, Thailand
Yunyong Teng-amnuay (Yunyong.T@Chula.ac.th)
Department of Computer Engineering, Chulalongkorn University
Bangkok 10330, Thailand
ABSTRACT
This paper presents a performance comparison of word-separation algorithms for the Thai language.
The research surveyed existing algorithms and attempted to synthesize a set of performance
indicators together with a measurement methodology. A body of Thai reference data
was collected to validate the accuracy of Thai word separation. Experimental results show that
the longest-word pattern-matching algorithm gives the most accurate output words, while
the backtracking algorithm gives the fewest erroneous words. The word-usage-frequency algorithm
gives the highest ratio of valid words to the number of words in its dictionary. Using an ambiguity
dictionary gives the best resolution of ambiguous cases, whereas the shortest-word pattern-matching
algorithm gives the highest number of output words.
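To illustrate the two pattern-matching strategies named in the abstract, here is a minimal Python sketch of greedy left-to-right dictionary matching. It is not the implementation evaluated in the paper: the function name greedy_segment, the four-word toy dictionary, and the single-character fallback for unknown text are all assumptions made for this example.

```python
def greedy_segment(text, dictionary, longest=True):
    """Split text left to right, taking the longest (or shortest)
    dictionary word that starts at the current position."""
    max_len = max(map(len, dictionary))
    words = []
    i = 0
    while i < len(text):
        # Candidate word lengths that fit in the remaining text.
        lengths = range(1, min(max_len, len(text) - i) + 1)
        if longest:
            lengths = reversed(lengths)
        for n in lengths:
            if text[i:i + n] in dictionary:
                words.append(text[i:i + n])
                i += n
                break
        else:
            # Assumption: when no dictionary word starts here,
            # emit one character as an unknown token and move on.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy dictionary; a real system would load thousands of entries.
toy_dictionary = {"ตา", "ตาก", "กลม", "ลม"}

longest_result = greedy_segment("ตากลม", toy_dictionary, longest=True)
shortest_result = greedy_segment("ตากลม", toy_dictionary, longest=False)
print(longest_result)   # ['ตาก', 'ลม']
print(shortest_result)  # ['ตา', 'กลม']
```

The same five-character string is split differently by the two strategies, which is the kind of ambiguous case the abstract refers to. A purely greedy matcher never revisits an earlier choice; the backtracking and ambiguity-dictionary approaches mentioned above address that limitation in ways this sketch does not attempt to reproduce.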
The full paper is available in PDF format [NCSEC'98].