Typo handling in searching of Quran verse based on phonetic similarities

Authors

  • Naila Iffah Purwita, Telkom University, Bandung
  • Moch Arif Bijaksana, Telkom University, Bandung
  • Kemas Muslim Lhaksmana, Telkom University, Bandung
  • Muhammad Zidny Naf’an, Telkom Institute of Technology Purwokerto, Purwokerto

DOI:

https://doi.org/10.26594/register.v6i2.2065

Keywords:

autocomplete, Damerau–Levenshtein distance, phonetic similarity, Quran, typographical error

Abstract

The Quran search system was built to help Indonesians find a verse by typing its text as it is pronounced in Indonesian, which is a solution for users who have difficulty writing or typing Arabic characters. A Quran search system based on phonetic similarity can make it easier for Indonesian Muslims to find a particular verse. Lafzi was one of the systems that developed this kind of search, and it was later extended under the name Lafzi+. The Lafzi+ system can handle queries that contain typos, but it covers only a limited range of typing error types. This research presents Lafzi++, an improvement on the previous development that handles more types of typographical error. It applies typo correction using the autocomplete method to correct incorrect queries and uses the Damerau–Levenshtein distance to calculate the edit distance, so that the system can provide query suggestions when a user mistypes a search, whether the error is a substitution, insertion, deletion, or transposition. Users can also search easily because queries are written in Latin characters according to Indonesian pronunciation. The evaluation results show that Lafzi++ improves on the previous system: the accuracy for each tested query can surpass that of the previous system, with the highest recall of 96.20% and the highest Mean Average Precision (MAP) of 90.69%.
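The four error types named in the abstract can be illustrated with a minimal sketch of the edit distance. The code below is not the Lafzi++ implementation. It uses the restricted form of the Damerau–Levenshtein distance (often called optimal string alignment), and the function names, the sample vocabulary, and the `max_distance` threshold are all assumptions made for illustration.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts substitutions, insertions, deletions, and transpositions of
    two adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(query: str, vocabulary: Iterable[str],
            max_distance: int = 2) -> List[str]:
    """Return vocabulary terms within max_distance of query, closest first.

    Hypothetical helper: the threshold and the ranking rule are
    illustrative assumptions, not values taken from the paper.
    """
    scored = sorted((osa_distance(query, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_distance]


if __name__ == "__main__":
    # Hypothetical Latin-script terms, used only as sample data.
    terms = ["bismillah", "alhamdu", "rahman"]
    print(suggest("bismilah", terms))
```

A transposed pair such as "alhmadu" for "alhamdu" scores 1 here, whereas plain Levenshtein distance would score it 2, which is why a transposition-aware distance suits keyboard typos.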

Author Biographies

Naila Iffah Purwita, Telkom University, Bandung

Department of Informatics

Moch Arif Bijaksana, Telkom University, Bandung

Department of Informatics

Kemas Muslim Lhaksmana, Telkom University, Bandung

Department of Informatics

Muhammad Zidny Naf’an, Telkom Institute of Technology Purwokerto, Purwokerto

Department of Informatics

Published

2020-08-27

Section

Article