    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/126580

    Title: 基於語境特徵及分群模型之中文多義詞消歧
    Using contextual information in clustering Chinese word senses
    Authors: 周子皓
    Chou, Tzu Hao
    Contributors: 劉昭麟

    Liu, Chao Lin
    Lai, Huei Lling

    Chou, Tzu Hao
    Keywords: 多義詞
    Lexical ambiguity
    Word vector
    Sentence vector
    Date: 2019
    Issue Date: 2019-10-03 17:17:45 (UTC+8)
    Abstract: 多義詞為語言中常見的現象,如英語中的‘bank’,既可表示「銀行」又可表示「河岸」;‘bass’,既可表示「鱸魚」又可表示「電吉他」,而在中文中「黃牛」,既可表示「普通的牛」又可表示「非法仲介人」。而在目前,對於多義詞義項的了解主要透過辭典以及檢索系統,但是,時常仍會有不足的情況,對於辭典,一般收錄較規範化的使用方式以及無法時刻更新。因此對於詞彙較新穎的義項以及較口語的使用方式,辭典並不一定包含;此外對於檢索系統,以中央研究院平衡語料庫檢索系統為例,此系統會將目標詞彙的相關句提供使用者,但是,對於多義詞的義項,使用者必須閱讀所有的相關句後才能得知,其在語料庫中的義項。同時,目前多義詞研究中,人文學者需逐一檢視所擷取出的相關句,並根據人工進行判讀,才能將相關句依據義項進行分群。
    同時,研究中為了觀察是否會因多義詞的類型不同而致使分群的效果以及embedding的結果會有所不同,因此於同形異義(homonym)選取「亞馬遜」、「蘋果」、「小米」、「火箭」、「東西」,作為研究對象;一詞多義(polysemy) 選取「出入」、「出發」、「壓力」、「溫暖」、「東西」,作為研究對象。
    Lexical ambiguity is a common phenomenon in language. In English, the word "bank" can refer either to a financial institution or to a river bank. In Chinese, the word 黃牛 (literally "yellow ox") can mean either an ordinary ox or a scalper.
    At present, the senses of an ambiguous word are mainly learned from dictionaries or from corpus search systems, but both often fall short. Dictionaries generally record standardized usage and cannot be updated continuously once published, so they may omit newer senses and colloquial usage. Search systems, such as the one for the Academia Sinica Balanced Corpus, return the sentences that contain the target word, but users must read through all of them to learn which senses occur in the corpus. Likewise, current research on lexical ambiguity requires researchers to examine the extracted sentences one by one and judge them manually before the sentences can be grouped by sense.
    In this study, the best clustering model and variables are selected according to purity values computed against references provided by the user. The selected clustering model is then used to find more terms and references with similar meanings in the database. Finally, the terms are clustered according to the selected meanings.
    This study also observes whether the type of lexical ambiguity affects the results of clustering and embedding. It therefore takes homonyms such as 亞馬遜 (Amazon) and 蘋果 (apple), and polysemous words such as 出發 (departure) and 壓力 (pressure), as research subjects. The study aims to reduce the time researchers spend examining sentences, extracting word senses and clustering them one by one in lexical ambiguity research.
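    The purity criterion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering models were compared or how purity was implemented, so the code below uses the standard textbook definition (the fraction of sentences that fall in the majority sense of their cluster). The function names, the configuration names and the toy labels are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def purity(cluster_ids, sense_labels):
    """Standard clustering purity: for each cluster, count the sentences that
    carry the cluster's most frequent sense label, then divide the total by
    the number of sentences. This is an assumption about the metric; the
    thesis may compute it differently."""
    if len(cluster_ids) != len(sense_labels):
        raise ValueError("cluster_ids and sense_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one labeled sentence is required")

    # Tally sense labels separately inside each cluster.
    counts_by_cluster = {}
    for cluster, sense in zip(cluster_ids, sense_labels):
        counts_by_cluster.setdefault(cluster, Counter())[sense] += 1

    # Sum the size of the majority sense in every cluster.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_ids)


def select_best(candidates, sense_labels):
    """Return the name of the candidate clustering with the highest purity.

    `candidates` maps a (hypothetical) configuration name to the cluster id
    assigned to each reference sentence under that configuration."""
    return max(candidates, key=lambda name: purity(candidates[name], sense_labels))


# Toy reference sentences for one target word, labeled with their sense.
labels = ["ox", "ox", "scalper", "scalper", "scalper", "ox"]
candidates = {
    "config_a": [0, 0, 1, 1, 1, 1],  # one "ox" sentence lands in the wrong cluster
    "config_b": [0, 1, 0, 1, 0, 1],  # both clusters mix the two senses
}
best = select_best(candidates, labels)
```

    Here `config_a` scores 5/6 and `config_b` scores 4/6, so `best` is `"config_a"`. Purity always reaches 1.0 when every sentence is placed in its own cluster, so a comparison of this kind is only meaningful between configurations with a similar number of clusters.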
    Description: Master's thesis
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0104753029
    Data Type: thesis
    DOI: 10.6814/NCCU201901187
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File          Size      Format
    302901.pdf    5256Kb    Adobe PDF

