IMPLEMENTATION OF MEL-FREQUENCY CEPSTRAL COEFFICIENTS AND CONVOLUTIONAL NEURAL NETWORK FOR HIRAGANA CHARACTER RECOGNITION

Authors

  • Muhammad Yusuf Ibrahim Ramadhani Universitas Islam Balitar
  • Saiful Nur Budiman Universitas Islam Balitar
  • Udkhiati Mawaddah Universitas Islam Balitar

DOI:

https://doi.org/10.36080/skanika.v9i1.3600

Keywords:

CNN, Hiragana, MFCC, Speech Recognition

Abstract

Interest in learning Japanese is growing in Indonesia; however, learners often struggle to master Hiragana characters because of their large number and phonetic similarities. Speech recognition technology can serve as a supporting learning medium, particularly for improving pronunciation and strengthening learners’ understanding of Hiragana characters. This study develops a Hiragana speech recognition system that uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Convolutional Neural Network (CNN) for classification. The dataset covers 46 Hiragana characters, each recorded 20 times by each of four speakers, for a total of 3,680 audio samples (46 × 20 × 4). The research stages are audio signal preprocessing, MFCC feature extraction, data augmentation, CNN model training, and performance evaluation using classification metrics. Experimental results show that the proposed model achieves 95% accuracy on the test data, with most Hiragana characters recognized correctly. Misclassifications occur mainly among characters with similar phonetic characteristics. These results indicate that the MFCC-based CNN approach is effective for Hiragana speech recognition and has the potential to serve as an interactive digital learning medium for Japanese language education.
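The MFCC feature extraction stage named in the abstract can be illustrated with a minimal, dependency-free sketch that computes the coefficients of a single audio frame. The abstract does not report the sample rate, frame length, number of mel filters, or number of coefficients, so the values below (16 kHz, 256 samples, 20 filters, 13 coefficients) are assumptions chosen for illustration, and the function names are hypothetical. This is not the authors' implementation; a practical system would more likely rely on an audio library and feed the full coefficient matrix to the CNN.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc_frame(frame, sample_rate=16000, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_coeffs)

# Dataset size stated in the abstract: characters x repetitions x speakers.
total_samples = 46 * 20 * 4

# Synthetic 440 Hz tone standing in for a recorded frame (assumption).
frame = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(256)]
coeffs = mfcc_frame(frame)
print(total_samples, len(coeffs))
```

Applying `mfcc_frame` to successive overlapping frames of a recording yields a two-dimensional coefficient matrix, which is the kind of image-like input a CNN classifier can consume.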

Published

2026-01-31

How to Cite

Muhammad Yusuf Ibrahim Ramadhani, Saiful Nur Budiman, and Udkhiati Mawaddah, “IMPLEMENTASI MEL-FREQUENCY CEPSTRAL COEFFICIENTS DAN CONVOLUTIONAL NEURAL NETWORK UNTUK PENGENALAN HURUF HIRAGANA,” SKANIKA, vol. 9, no. 1, pp. 1–12, Jan. 2026.