
華語文詞語表詞類標記對應之研究

A comparative analysis of the correspondence of part-of-speech systems between COCT and TCSL teaching materials
  • Record type

    Research project

  • Project number

    NAER-2019-029-C-1-1-B5-01

  • GRB number

    PG10811-0071

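  • Project title

    華語文詞語表詞類標記對應之研究 (A comparative analysis of the correspondence of part-of-speech systems between COCT and TCSL teaching materials)

  • Integrated project title

    華語文學習者分級標準建構及語料庫整合應用 (Construction of proficiency grading standards for learners of Chinese and integrated application of corpora)

  • Parent project

    Subproject 1

  • Principal investigator

    李詩敏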

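  • Funding source

    National Academy for Educational Research (國家教育研究院)

  • Mode of execution

    In-house research (funded by the Academy, conducted by Academy staff)

  • Executing institution

    National Academy for Educational Research (國家教育研究院)

  • Executing unit

    Research Center for Translation, Compilation and Language Education (語文教育及編譯研究中心)

  • Year

    2019

  • Start date

    2019-08-01

  • End date

    2020-12-31

  • Status

    In progress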

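  • Keywords (Chinese)

    詞類標記 (part-of-speech tagging), 華語文語料庫 (Chinese-language corpus), 華語文教材 (Chinese-language teaching materials), 中央研究院漢語平衡語料庫 (Academia Sinica Balanced Corpus of Modern Chinese)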

  • Keywords

    POS tagging,Sinica corpus,COCT,TCSL teaching materials

  • Research focus

    Building a knowledge base; improving textbook quality

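  •   Part-of-speech (POS) tagging sorts the large and complex vocabulary of a corpus into anywhere from a handful to a few dozen categories. POS tags are therefore the most important and critical information in a corpus, and they support both language research and teaching. Many corpora in Taiwan, including the 華語文語料庫 (COCT) built by this integrated project, currently adopt the simplified POS tagset of the 中央研究院漢語平衡語料庫 (Sinica Corpus). This tagset has 46 tags in total and is well suited to linguistic analysis and natural language processing, but it does not map one-to-one onto the POS systems used in the main commercially available Chinese-language teaching materials. To align the POS tags across the various textbook series, and to connect linguistic theory with Chinese-language teaching in their knowledge of word classes, Subproject 1 uses literature analysis and expert consultation to examine the content and classification criteria of each tagset. On that basis it establishes rules for converting between the Sinica tags and the textbook tags, in order to promote the use of COCT in Chinese-language teaching, textbook compilation, and research analysis.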

  •   Part-of-speech (POS) tagging classifies the large and complex vocabulary of a corpus into several to a few dozen categories; POS tags are therefore the most important and crucial information in a corpus and are beneficial to language research and teaching. At present, the POS tagset of the Sinica Corpus is adopted by many corpora in Taiwan, including COCT (i.e. Corpus of Contemporary Taiwanese Mandarin), which was constructed by this integrated project. The Sinica tagset is suited to language analysis and NLP, but it does not map one-to-one onto the POS systems used in TCSL teaching materials. To align the POS systems across TCSL teaching materials and to link the knowledge base on word classes between linguistics and TCSL, this subproject uses literature review and expert consultation to examine the content and classification criteria of these POS systems. It then establishes correspondence rules between the POS systems of the Sinica Corpus and those of TCSL teaching materials, with the aim of promoting the application of COCT in language teaching, textbook compilation and research analysis.
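
To make the idea of a non-one-to-one tag correspondence concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's conversion rules: the source tags (`Na`, `Nb`, `Nc`, `VA`, `VC`, `VH`, `D`) are taken from the Sinica simplified tagset, while the target labels (`N`, `V`, `Vs`, `Adv`), the mapping itself, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative mapping only. Keys are Sinica simplified tags; values are
# HYPOTHETICAL textbook-style labels, not the rules produced by the project.
SINICA_TO_TEXTBOOK = {
    "Na": "N",    # common noun
    "Nb": "N",    # proper noun
    "Nc": "N",    # place noun
    "VA": "V",    # active intransitive verb
    "VC": "V",    # active transitive verb
    "VH": "Vs",   # stative intransitive verb
    "D": "Adv",   # adverb
}


def convert(tagged_tokens, mapping=SINICA_TO_TEXTBOOK, unknown="?"):
    """Relabel (word, sinica_tag) pairs; tags without a rule become `unknown`."""
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_tokens]


def many_to_one_targets(mapping):
    """Return target labels that several source tags collapse into."""
    reverse = defaultdict(list)
    for source_tag, target_label in mapping.items():
        reverse[target_label].append(source_tag)
    return {label: tags for label, tags in reverse.items() if len(tags) > 1}


if __name__ == "__main__":
    sample = [("書", "Na"), ("跑", "VA"), ("高興", "VH"), ("也", "D"), ("的", "DE")]
    print(convert(sample))
    print(many_to_one_targets(SINICA_TO_TEXTBOOK))
```

In this toy mapping, converting from the finer tagset to the coarser one is a simple lookup, but the reverse direction is ambiguous wherever several source tags share one target label. That asymmetry is the kind of case where explicit conversion rules and expert judgment are needed.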
