cu thesis #2

PISIT'S THAI NATURAL LANGUAGE PROCESSING LABORATORY
This lab was founded on August 26, 1998.
e-mail: pisitp@yahoo.com
KEYWORDS
Thai Natural Language Processing Lab, word segmentation, dictionaries, algorithms, Thai text-to-speech.

การสังเคราะห์ข้อความเสียงพูดภาษาไทยสำหรับคำทับศัพท์และคำนามเฉพาะ
THAI TEXT-TO-SPEECH SYNTHESIS FOR TRANSLITERATED WORDS AND PROPER NOUN
AJJIMA TUNSAKUL,
Computer Engineering,
Chulalongkorn University,
2001

บทคัดย่อ: วิทยานิพนธ์ฉบับนี้นำเสนอวิธีการสังเคราะห์ข้อความเสียงพูดภาษาไทยสำหรับคำทับศัพท์และคำนามเฉพาะ โดยใช้หลักเกณฑ์การทับศัพท์ภาษาอังกฤษในการตัดแบ่งพยางค์ภาษาไทยและนำไปค้นหาหน่วยเสียงย่อยที่มีค่าความใกล้เคียงทางเสียงมากที่สุดกับเสียงจากพยางค์ที่ตัดได้จากวิธีการประมาณค่าความใกล้เคียงทางเสียงในการค้นหาคำ สำหรับหน่วยเสียงย่อยเหล่านี้จะถูกนำมาต่อรวมกันเพื่อสร้างเสียงออกมาด้วยเทคนิคการสังเคราะห์เสียงโดยการต่อหน่วยเสียงย่อย ในขั้นตอนการประมาณค่าความใกล้เคียงทางเสียงได้นำค่าสมาชิกของเซตวิภัชนัยมาทำค้นหาพยางค์ที่มีค่าความใกล้เคียงทางเสียงมากที่สุดจากพจนานุกรมหน่วยเสียงที่บรรจุคำไว้ 4,446 คำ และทำการเลือกค่าพยางค์ที่มีค่าวิภัชนัยสูงที่สุด ซึ่งรูปแบบของเสียงที่ใช้จะเป็นการตัดหน่วยเสียงย่อยของคำ สำหรับผลการทดลองที่ได้ แสดงให้เห็นว่าค่าความแม่นยำของการสังเคราะห์เสียงมีค่าที่สูงเกิน 98.38%

Abstract: This thesis presents a method for Thai text-to-speech synthesis of transliterated words and proper nouns. The method uses English transliteration rules to segment Thai syllables, then uses soundex approximation matching to find, for each syllable, the speech segment with the most similar sound. These speech segments are concatenated to produce the output speech (concatenative synthesis). The soundex approximation matching uses fuzzy set membership values to search a soundex dictionary of 4,446 words for the most similar syllable, and chooses the syllable with the highest fuzzy value; that syllable's wave file is used as the speech segment. Experimental results show that the precision of the speech synthesis exceeds 98.38%.
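
The matching-and-concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not define the fuzzy membership function, so this sketch assumes one based on normalized edit distance, and the romanized syllable codes, the three-entry dictionary, and the wave file names are all hypothetical stand-ins for the 4,446-word soundex dictionary.

```python
# Hypothetical soundex dictionary: syllable code -> wave file name.
# (The real dictionary and its coding scheme are not given in the abstract.)
SOUNDEX_DICT = {
    "kom": "kom.wav",
    "piw": "piw.wav",
    "toe": "toe.wav",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def membership(code: str, entry: str) -> float:
    """Assumed fuzzy membership in [0, 1]: 1 minus the normalized edit distance."""
    longest = max(len(code), len(entry))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(code, entry) / longest


def best_match(code: str, dictionary: dict) -> str:
    """Return the dictionary entry with the highest membership value."""
    return max(dictionary, key=lambda entry: membership(code, entry))


def synthesize(codes: list, dictionary: dict) -> list:
    """Map each segmented syllable to the wave file of its closest entry.

    Returns the ordered list of wave files to concatenate; actual audio
    concatenation is left out of this sketch.
    """
    return [dictionary[best_match(code, dictionary)] for code in codes]


# Hypothetical syllable codes for a transliterated word, after segmentation.
print(synthesize(["khom", "phiw", "toe"], SOUNDEX_DICT))
# prints ['kom.wav', 'piw.wav', 'toe.wav']
```

Here "khom" has no exact entry, so the entry with the highest membership value ("kom", at 0.75) supplies the speech segment, which mirrors the abstract's rule of choosing the syllable with the highest fuzzy value.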