PISIT'S THAI NATURAL LANGUAGE PROCESSING LABORATORY
This lab was formed on August 26, 1998.
e-mail: pisitp@yahoo.com
For C7 members, please check this C7 address list.

KEYWORDS
Thai Natural Language Processing Lab., word segmentation, dictionaries, algorithms, Thai text-to-speech.
PERFORMANCE COMPARISON OF THAI WORD SEPARATION ALGORITHMS

Pisit Promchan (pisitp@yahoo.com)
Telecom Asia Corp. Public Co. Ltd.
2nd flr., 4th bldg., TOT
Changwatana Rd., BKK, Thailand

Yunyong Teng-amnuay (Yunyong.T@Chula.ac.th)
Department Of Computer Engineering, Chulalongkorn University
Bangkok 10330, Thailand

ABSTRACT
This paper presents a performance comparison of word-separation algorithms for the Thai language. The research surveyed existing algorithms, attempted a synthesis of performance indicators, and developed a measurement methodology. A body of Thai reference data was collected to validate the accuracy of Thai word separation. Experimental results show that the longest-word pattern-matching algorithm produces the most correct output words, while the backtracking algorithm produces the fewest erroneous words. The word-usage-frequency algorithm gives the highest ratio of valid words to the number of words in its dictionary. Using an ambiguity dictionary gives the best resolution of ambiguous cases, whereas the shortest-word pattern-matching algorithm produces the largest number of output words.
The full paper is available in PDF format [NCSEC'98].
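
To illustrate the difference between the longest-word and shortest-word pattern-matching approaches named in the abstract, here is a minimal Python sketch. It is not the code evaluated in the paper: the function name `segment`, the `prefer_longest` flag, the four-word toy dictionary, and the rule of emitting an unmatched character as a one-character token are all assumptions made for this example. It does no backtracking and uses no frequency or ambiguity dictionary.

```python
def segment(text, dictionary, prefer_longest=True):
    """Greedy dictionary-based word separation (illustrative sketch only).

    At each position, try candidate lengths from longest to shortest
    (or shortest to longest) and take the first dictionary hit.
    A character that starts no dictionary word is emitted on its own.
    """
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        lengths = range(1, min(max_len, len(text) - i) + 1)
        if prefer_longest:
            lengths = reversed(lengths)
        for n in lengths:
            candidate = text[i:i + n]
            if candidate in dictionary:
                words.append(candidate)
                i += n
                break
        else:
            # No dictionary word starts here: treat the character as unknown.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy dictionary containing a classic ambiguous string.
toy_dictionary = {"ตา", "กลม", "ตาก", "ลม"}

longest = segment("ตากลม", toy_dictionary, prefer_longest=True)
shortest = segment("ตากลม", toy_dictionary, prefer_longest=False)
print(longest)
print(shortest)
```

With this toy dictionary the two strategies split the same five-character string at different points, which is the kind of ambiguous case the comparison in the abstract is concerned with.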

