Researcher Directory
Papers
- Title
- Title (English)
- Encodings in legacy khmer truetype fonts: Investigation and propose of auto-detection algorithm
- Reference URL
- https://researchmap.jp/mikami_yoshiki/published_papers/16620950
- Authors
- Authors (English)
- Suzuki Toshiya, Masatake Yamato, Yoshiki Mikami
- Author role
- Abstract
- Abstract (English)
- Despite ISO standards for most Indic scripts used in South and South-East Asian countries, legacy encodings are still used to avoid implementing complex text layout systems for these scripts. Because legacy encodings for Indic scripts are not well defined and were designed ad hoc, it is almost impossible to detect the encoding by deductive methods. As a result, the coded text is often handled as image data rather than text. As a typical example of the confusion caused by non-standard legacy encodings, we take the Khmer script. We collected various free-of-charge legacy Khmer fonts distributed on the Web and investigated the encodings declared and used in the fonts. We confirmed that the declared encodings are unreliable. Based on the code charts obtained in our investigation, we propose a heuristic algorithm to detect the encoding used in legacy Khmer fonts. This algorithm enables us to extract text data from legacy-coded text with higher accuracy than cognitive methods. © Lavoisier. Tous droits réservés pour tous pays.
- Publisher
- Publisher (English)
- Lavoisier
- Journal
- Journal (English)
- Document numérique
- Volume
- 9
- Issue
- 3-4
- First page
- 45
- Last page
- 68
- Publication date
- 2006
- Peer review
- Peer-reviewed
- Invited
- Publication type
- Research paper (scientific journal)
- ISSN
- 1279-5127
- DOI URL
- https://doi.org/10.3166/dn.9.3-4.45-68
- Research projects (joint research, competitive funding, etc.)
Researcher
三上 喜貴
(Mikami Yoshiki)