Hangul Character Recognition on a New Hangul Dataset with a Vision Transformer Model

Authors

  • Aurelia Shana, Universitas Udayana
  • Sutramiani Ni Putu, Universitas Udayana
  • Desy Purnami Singgih Putri, Universitas Udayana

DOI:

https://doi.org/10.31598/sintechjournal.v7i3.1677

Keywords:

ViT, Hangul, Character, Recognition, Dataset

Abstract

This study develops a Vision Transformer (ViT) model for recognizing Korean characters (Hangeul), in response to the growing interest in learning the Korean language and Korean culture in Indonesia. The ViT model was trained on a dataset of 29,636 base Korean character images and achieved 93% accuracy in recognizing base Korean characters. By applying deep learning, this study is expected to contribute to the development of language-learning tools for Korean character recognition and to enable further applications and systems based on the Korean language.
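The paper itself does not publish code; the core ViT idea it relies on (Dosovitskiy et al., "An Image is Worth 16x16 Words") is to split each character image into fixed-size patches and treat the flattened patches as input tokens for a transformer. The following NumPy sketch illustrates only that patch-tokenization step, under assumed settings not stated in the abstract: 64x64 grayscale glyph images and a 16x16 patch size.

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an H x W x C image into non-overlapping, flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. one row per 'token' that a ViT encoder would consume
    (before the learned linear projection and position embeddings).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Reshape into (row_blocks, patch_h, col_blocks, patch_w, channels),
    # then reorder so each patch's pixels are contiguous.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * c)

# A 64x64 single-channel glyph (random placeholder for a Hangul image)
# yields (64/16)^2 = 16 tokens, each of dimension 16*16*1 = 256.
glyph = np.random.rand(64, 64, 1)
tokens = image_to_patches(glyph)
print(tokens.shape)  # (16, 256)
```

In a full ViT classifier, each of these 256-dimensional rows would be linearly projected, given a position embedding, and passed through transformer encoder blocks before a classification head predicts the character class.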


References

L. Yoon, “Leading reasons for taking the Test of Proficiency in Korean (TOPIK) in 2018.” Accessed: Jul. 10, 2023. [Online]. Available: https://www.statista.com/statistics/1057989/south-korea-reasons-for-taking-the-korean-language-test/

Henry and D. Mutiah, “Indonesia Tempati Urutan ke-4 Penggemar Korean Wave Terbesar di Dunia” [Indonesia Ranks Fourth Among the World’s Largest Korean Wave Fan Bases], Liputan6.com. Accessed: Jul. 10, 2023. [Online]. Available: https://www.liputan6.com/lifestyle/read/4678671/indonesia-tempati-urutan-ke-4-penggemar-korean-wave-terbesar-di-dunia

Radikto and Rasiban, “Pengenalan Pola Huruf Hangeul Korea Menggunakan Jaringan Syaraf Tiruan Metode Backpropagation dan Deteksi Tepi Canny” [Korean Hangeul Character Pattern Recognition Using Backpropagation Neural Networks and Canny Edge Detection], Jurnal Pendidikan dan Konseling, vol. 4, no. 5, pp. 1–10, 2022.

B. Yadav, A. Indian, and G. Meena, “HDevChaRNet: A deep learning-based model for recognizing offline handwritten devanagari characters,” Journal of Autonomous Intelligence, vol. 6, no. 2, 2023, doi: 10.32629/jai.v6i2.679.

S. R. Zanwar, Y. H. Bhosale, D. L. Bhuyar, Z. Ahmed, U. B. Shinde, and S. P. Narote, “English Handwritten Character Recognition Based on Ensembled Machine Learning,” Journal of The Institution of Engineers (India): Series B, Oct. 2023, doi: 10.1007/s40031-023-00917-9.

I. F. Katili, M. A. Soeleman, and R. A. Pramunendar, “Character Recognition of Handwriting of Javanese Character Image using Information Gain Based on the Comparison of Classification Method,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 7, no. 1, pp. 193–200, Feb. 2023, doi: 10.29207/resti.v7i1.4488.

M. B. Bora, D. Daimary, K. Amitab, and D. Kandar, “Handwritten Character Recognition from Images using CNN-ECOC,” in Procedia Computer Science, Elsevier B.V., 2020, pp. 2403–2409. doi: 10.1016/j.procs.2020.03.293.

V. Pomazan, I. Tvoroshenko, and V. Gorokhovatskyi, “Handwritten Character Recognition Models Based on Convolutional Neural Networks,” 2023.

M. M. Khan, M. S. Uddin, M. Z. Parvez, and L. Nahar, “A squeeze and excitation ResNeXt-based deep learning model for Bangla handwritten compound character recognition,” Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 6, pp. 3356–3364, Jun. 2022, doi: 10.1016/j.jksuci.2021.01.021.

D. Gui, K. Chen, H. Ding, and Q. Huo, “Zero-shot Generation of Training Data with Denoising Diffusion Probabilistic Model for Handwritten Chinese Character Recognition,” arXiv preprint, May 2023, [Online]. Available: http://arxiv.org/abs/2305.15660

M. Li et al., “TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models,” Sep. 2021, [Online]. Available: http://arxiv.org/abs/2109.10282

Hubert, P. Phoenix, R. Sudaryono, and D. Suhartono, “Classifying Promotion Images Using Optical Character Recognition and Naïve Bayes Classifier,” in Procedia Computer Science, Elsevier B.V., 2021, pp. 498–506. doi: 10.1016/j.procs.2021.01.033.

A. Sharma, S. Kaur, S. Vyas, and A. Nayyar, “Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization,” 2023, pp. 475–489. doi: 10.1007/978-981-99-2730-2_47.

M. Fujitake, “DTrOCR: Decoder-only Transformer for Optical Character Recognition,” Aug. 2023, [Online]. Available: http://arxiv.org/abs/2308.15996

Y. Li, D. Chen, T. Tang, and X. Shen, “HTR-VT: Handwritten text recognition with vision transformer,” Pattern Recognition, vol. 158, p. 110967, Feb. 2024, doi: 10.1016/j.patcog.2024.110967.

A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Oct. 2020, [Online]. Available: http://arxiv.org/abs/2010.11929

V. Agrawal, J. Jagtap, and M. P. Kantipudi, “Decoded-ViT: A Vision Transformer Framework for Handwritten Digit String Recognition,” Revue d’Intelligence Artificielle, vol. 38, no. 2, pp. 523–529, Apr. 2024, doi: 10.18280/ria.380215.

K.-M. Lee and S. R. Ramsey, A History of the Korean Language. Cambridge University Press, 2011.

F. Chollet, Deep Learning with Python. Manning Publications, 2017.

Keras Team, “Keras ImageDataGenerator Documentation,” TensorFlow. Accessed: Jul. 20, 2024. [Online]. Available: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.

J. Casia, “Handwritten Hangul Characters,” Kaggle. Accessed: Dec. 14, 2023. [Online]. Available: https://www.kaggle.com/datasets/wayperwayp/hangulkorean-characters/data

Published

2024-12-31