Forthcoming

Advanced NLP Techniques for Generating Contextual and Grammatical Arabic Exam Questions

Authors

  • A H Azni Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
  • Farida Ridzuan CyberSecurity and Systems Research Unit, Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
  • Najwa Hayaati Mohd Alwi Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
  • Sakinah Ali Pitchay Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
  • Zainur Rijal Abd Razak Faculty of Major Language Studies, Universiti Sains Islam Malaysia, Bandar Baru Nilai, 71800 Nilai, Negeri Sembilan, Malaysia.
  • Hanif Ridzwan Ahmad Rodzi Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
  • Ahmad A AlSabhany Department of Electronics and Telecommunication Engineering, Daffodil International University, Dhaka, Bangladesh.

DOI:

https://doi.org/10.33102/mjosht.525

Keywords:

NLP, Exam Question Generation, Arabic Corpus

Abstract

This paper outlines the development of an Arabic exam question generator that utilizes advanced Natural Language Processing (NLP) techniques and a comprehensive Arabic corpus. The primary aim is to aid educators in automating the process of crafting exam questions tailored specifically for A1-level Arabic learners. By harnessing the capabilities of NLP, the system integrates sequence-to-sequence (seq2seq) models and template-based methods to generate educationally appropriate questions. The seq2seq models are designed to predict the next word in a sequence, ensuring that the generated questions are natural and contextually fitting. This approach enables the system to produce logically coherent questions that align with the given context. Moreover, the template-based method guarantees grammatical accuracy, which is essential for educational purposes. The templates serve as structured guidelines that steer the seq2seq models, ensuring that the questions adhere to proper grammatical rules and structures. A vital aspect of the system is the incorporation of the AraBERT pre-trained model. AraBERT, a transformer-based model customized for Arabic, undergoes fine-tuning with a specifically annotated dataset to adapt it to the task of generating questions from simple Arabic sentences, thereby enhancing its ability to handle the intricacies of the Arabic language. By combining seq2seq models for contextual relevance and template-based methods for grammatical precision, this dual approach effectively addresses the unique challenges associated with Arabic NLP. The richness of Arabic morphology and its syntactic complexity pose significant hurdles for NLP applications. Through the integration of these methodologies, the system ensures that the generated questions are not only contextually relevant but also grammatically correct, making it a valuable tool for educators.
In conclusion, the paper discusses an innovative application of advanced NLP techniques and Arabic corpus utilization, providing a robust solution for automated Arabic exam question generation. This system holds significant potential for enhancing the efficiency and effectiveness of language instruction for Arabic learners.
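The dual approach the abstract describes (grammatical templates as structured frames, with a fine-tuned model choosing contextually fitting words) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the template strings, slot names, and the stub predictor standing in for a fine-tuned AraBERT/seq2seq model are all invented for the example.

```python
import re
from typing import Callable

# Grammatical templates act as structured guidelines: each fixes the
# question's syntactic frame and leaves lexical slots for the model to fill.
TEMPLATES = [
    "ما هو {noun}؟",          # "What is {noun}?"
    "أين {verb} {subject}؟",   # "Where does {subject} {verb}?"
]

def generate_question(template: str,
                      fill_slot: Callable[[str, str], str],
                      context: str) -> str:
    """Fill each {slot} in the template using the predictor."""
    question = template
    # Naive left-to-right slot scan; a real system would parse templates properly.
    for slot in re.findall(r"\{(\w+)\}", template):
        word = fill_slot(slot, context)
        question = question.replace("{" + slot + "}", word, 1)
    return question

# Stub predictor: a fine-tuned model would rank vocabulary items by how
# well they fit the context; here we just look words up in a toy table.
def stub_predictor(slot: str, context: str) -> str:
    toy = {"noun": "الكتاب", "verb": "يذهب", "subject": "الطالب"}
    return toy[slot]

print(generate_question(TEMPLATES[0], stub_predictor, "درس عن المكتبة"))
```

The separation of concerns mirrors the abstract's claim: the template guarantees the grammatical frame regardless of what the model predicts, while the predictor supplies contextual relevance within that frame.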

References

[1] Hwang, Gwo-Jen, Haoran Xie, Benjamin W. Wah, and Dragan Gašević. "Vision, challenges, roles and research issues of Artificial Intelligence in Education." Computers and Education: Artificial Intelligence 1 (2020): 100001. https://doi.org/10.1016/j.caeai.2020.100001

[2] Bahy, Mazen. "Comparative analysis of Machine-generated questions (Quillionz) and Human-generated questions." (2020).

[3] Basha, M. John, S. Vijayakumar, J. Jayashankari, Ahmed Hussein Alawadi, and Pulatova Durdona. "Advancements in natural language processing for text understanding." In E3S Web of Conferences, vol. 399, p. 04031. EDP Sciences, 2023. https://doi.org/10.1051/e3sconf/202339904031

[4] Thotad, Puneeth, Shanta Kallur, and Sukanya Amminabhavi. "Automatic question generator using natural language processing." Journal of Pharmaceutical Negative Results (2022): 2759-2764.

[5] Vakaliuk, Tetiana A., Oleksii V. Chyzhmotria, Svitlana O. Didkivska, and Illia Linevych. "Development of a web service for creating tests based on text analysis using natural language processing technologies." International Journal of Research in E-learning 9, no. 2 (2023): 1-22. https://doi.org/10.31261/IJREL.2023.9.2.04

[6] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

[7] Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. "Improving language understanding by generative pre-training." (2018).

[8] Tomberg, Vladimir, Pjotr Savitski, Pavel Djundik, and Vsevolods Berzinsh. "Design and development of IMS QTI compliant lightweight Assessment delivery system." In Technology Enhanced Assessment: 19th International Conference, TEA 2016, Tallinn, Estonia, October 5-6, 2016, Revised Selected Papers 19, pp. 159-170. Springer International Publishing, 2017. https://doi.org/10.1007/978-3-319-57744-9_14

[9] Jones, Heather M. "Using innovative technologies to increase student engagement in an online Anatomy and Physiology course." The FASEB Journal 33, no. S1 (2019): 598-23. https://doi.org/10.1096/fasebj.2019.33.1_supplement.598.23

[10] Sherstinsky, Alex. "Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network." Physica D: Nonlinear Phenomena 404 (2020): 132306. https://doi.org/10.1016/j.physd.2019.132306

[11] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017). https://doi.org/10.48550/arXiv.1706.03762

[12] Ruiz-Rojas, Lena Ivannova, Patricia Acosta-Vargas, Javier De-Moreta-Llovet, and Mario Gonzalez-Rodriguez. "Empowering education with generative artificial intelligence tools: Approach with an instructional design matrix." Sustainability 15, no. 15 (2023): 11524. https://doi.org/10.3390/su151511524

[13] Maulud, Dastan Hussen, Siddeeq Y. Ameen, Naaman Omar, Shakir Fattah Kak, Zryan Najat Rashid, Hajar Maseeh Yasin, Ibrahim Mahmood Ibrahim, Azar Abid Salih, Nareen OM Salim, and Dindar Mik Ahmed. "Review on natural language processing based on different techniques." Asian Journal of Research in Computer Science 10, no. 1 (2021): 1-17. https://doi.org/10.9734/ajrcos/2021/v10i130231

[14] Gumaste, Priti, Shreya Joshi, Srushtee Khadpekar, and Shubhangi Mali. "Automated Question Generator System Using NLP Libraries." International Research Journal of Engineering and Technology (IRJET) 7, no. 6 (2020): 4568-4572.

[15] Menai, Mohamed El Bachir. "Detection of plagiarism in Arabic documents." International Journal of Information Technology and Computer Science 10, no. 10 (2012): 80-89. https://doi.org/10.5815/ijitcs.2012.10.10

[16] Marie-Sainte, Souad Larabi, Nada Alalyani, Sihaam Alotaibi, Sanaa Ghouzali, and Ibrahim Abunadi. "Arabic natural language processing and machine learning-based systems." IEEE Access 7 (2018): 7011-7020. https://doi.org/10.1109/ACCESS.2018.2890076

[17] Ali, Abbas Raza, Muhammad Ajmal Siddiqui, Rema Algunaibet, and Hasan Raza Ali. "A large and diverse Arabic corpus for language modeling." Procedia Computer Science 225 (2023): 12-21. https://doi.org/10.1016/j.procs.2023.09.086

[18] Keneshloo, Yaser, Tian Shi, Naren Ramakrishnan, and Chandan K. Reddy. "Deep reinforcement learning for sequence-to-sequence models." IEEE transactions on neural networks and learning systems 31, no. 7 (2019): 2469-2489. https://doi.org/10.1109/TNNLS.2019.2929141

[19] He, Xiao, Tian Zhang, Minxue Pan, Zhiyi Ma, and Chang-Jun Hu. "Template-based model generation." Software & Systems Modeling 18 (2019): 2051-2092. https://doi.org/10.1007/s10270-017-0634-5

[20] Chouikhi, Hasna, Hamza Chniter, and Fethi Jarray. "Arabic sentiment analysis using BERT model." In Advances in Computational Collective Intelligence: 13th International Conference, ICCCI 2021, Kallithea, Rhodes, Greece, September 29–October 1, 2021, Proceedings 13, pp. 621-632. Springer International Publishing, 2021. https://doi.org/10.1007/978-3-030-88113-9_50

[21] Alduailej, Alhanouf, and Abdulrahman Alothaim. "AraXLNet: pre-trained language model for sentiment analysis of Arabic." Journal of Big Data 9, no. 1 (2022): 72. https://doi.org/10.1186/s40537-022-00625-z

[22] Al Masri, Ahmad Mustafa Ali, Muhammad Suzuri Hitam, Wan Nural Jawahir Hj Wan Yussof, and Atallah Al-Shatnawi. "Novel Algorithm for Baseline Detection of Offline Arabic Handwritten Text Recognition." Journal of Advanced Research in Applied Sciences and Engineering Technology 37, no. 1 (2024): 56-68. https://doi.org/10.37934/araset.37.1.5668

[23] Ab Halim, A. H., F. Ridzuan, N. H. Zakaria, A. A. Zakaria, N. H. Mohd Alwi, S. Ali Pitchay, and I. Az-Zuhar. "SAKTI©: Secured Chatting Tool Through Forward Secrecy." Journal of Advanced Research in Applied Sciences and Engineering Technology 49, no. 1 (2024): 54-62. https://doi.org/10.37934/araset.49.1.5462

Published

2026-03-12

How to Cite

Advanced NLP Techniques for Generating Contextual and Grammatical Arabic Exam Questions. (2026). Malaysian Journal of Science Health & Technology, 11(3), 44-53. https://doi.org/10.33102/mjosht.525
