Hangul Character Recognition of A New Hangul Dataset with Vision Transformers Model
DOI:
https://doi.org/10.31598/sintechjournal.v7i3.1677
Keywords:
ViT, Hangul, Character, Recognition, Dataset
Abstract
This study aims to develop a Vision Transformer (ViT) model for recognizing Korean characters (Hangeul), in response to the growing interest in learning the Korean language and culture in Indonesia. The research methodology involves training the ViT model on a dataset of 29,636 base Korean characters. The ViT model achieved an accuracy of 93% in recognizing base Korean characters. By applying deep learning, this study is expected to contribute to the development of language learning tools for Korean character recognition and to support applications and systems based on the Korean language.
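As background on how a ViT consumes a character image, the sketch below shows the first step of the architecture described by Dosovitskiy et al.: cutting an image into non-overlapping patches that are then flattened and treated as a token sequence. It is a minimal illustration only. The function name `split_into_patches`, the 4x4 toy image, and the patch size of 2 are assumptions chosen for readability; the abstract does not state the image size or patch size used in the study.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grayscale image (a list of rows) into flattened,
    non-overlapping square patches, scanned left to right, top to bottom."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single list of pixel values.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 "image" whose pixel values are 0..15 (not data from the study).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)

print(len(patches))  # number of patch tokens
print(patches[0])    # pixels of the top-left patch
```

In a full ViT, each flattened patch would next be projected to an embedding vector, combined with a position embedding, and passed through Transformer encoder layers before a classification head predicts the character class.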
License
Copyright (c) 2024 Aurelia Shana, Sutramiani Ni Putu, Desy Purnami Singgih Putri
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright in each article belongs to the author.
- The authors acknowledge SINTECH Journal as the publisher of first publication, under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in another version (e.g., deposit in the author's institutional repository, publication in a book, etc.), with an acknowledgement that the manuscript was first published in SINTECH Journal.