Title
Enhanced language model for Arabic language applications
Author
Erfan, Hatem Mohammed Noaman.
Thesis committee
Researcher / Hatem Mohammed Noaman Erfan
Supervisor / Mohsen Abdelrazek Rashwan
Supervisor / Shahenda Salah Eldin Sarhan
Examiner / Sherif Mahdy Abdou
Examiner / Samir Eldesouki Elmougy
Subject
Language and languages - Computer-assisted instruction. Language and languages - Technological innovations.
Publication date
2019.
Number of pages
online resource (132 pages)
Language
English
Degree
Doctorate (PhD)
Specialization
Computer Science
Award date
1/8/2019
Place of award
Mansoura University - Faculty of Computers and Information - Computer Science
Table of contents
Only 14 of the 132 pages are available for public view.

Abstract

This thesis presents a novel recurrent neural network language model based on tokenizing each word into three parts: a prefix, a stem, and a suffix. The proposed model is evaluated on the English AMI speech recognition dataset and the Online Open Source Arabic (OOSA) language corpus.

The thesis also proposes a novel hybrid approach for automatically detecting and correcting Arabic spelling errors. The approach combines a confusion-matrix-based noisy channel spelling correction model with the proposed modified recurrent neural network language model. The confusion matrix was constructed from 163,452 pairs of spelling errors and their corrections extracted from the Qatar Arabic Language Bank (QALB). Based on the reported results, automatic spelling correction accuracy improved by about 3.5% on the Arabic misspelling dataset.

Finally, the thesis presents a novel approach to automatic Arabic text diacritization using deep encoder-decoder recurrent neural networks followed by several text correction steps that improve the overall output accuracy. The proposed model achieves a morphological diacritization word error rate (WER) of 3.85% and a diacritic error rate (DER) of 1.12%.
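
The first contribution summarized above is a recurrent neural network language model trained over prefix/stem/suffix sub-tokens rather than whole words. The following minimal Python sketch illustrates only that tokenization step, using tiny hypothetical English affix lists; PREFIXES, SUFFIXES, split_word, and tokenize are illustrative names and not the segmenter actually used in the thesis.

PREFIXES = ["un", "re", "pre"]   # hypothetical prefix inventory
SUFFIXES = ["ing", "ed", "s"]    # hypothetical suffix inventory

def split_word(word):
    """Split a word into (prefix, stem, suffix); missing affixes become ""."""
    prefix = next((p for p in PREFIXES if word.startswith(p) and len(word) > len(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), "")
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix

def tokenize(sentence):
    """Expand each word into its sub-tokens; the language model is trained on this stream."""
    tokens = []
    for word in sentence.split():
        tokens.extend(t for t in split_word(word) if t)
    return tokens

print(tokenize("replaying the unlocked files"))
# ['re', 'play', 'ing', 'the', 'un', 'lock', 'ed', 'file', 's']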
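
The second contribution ranks candidate corrections with a noisy-channel score, log P(observed | candidate) + log P(candidate | context), combining the confusion-matrix channel model with the language model. The sketch below assumes a toy hand-filled confusion matrix (the thesis builds its matrix from the QALB error/correction pairs) and a placeholder language-model score where the trained RNN LM would actually be queried.

import math

# Hypothetical per-character confusion-matrix entries P(typed | intended);
# in the thesis these probabilities come from the 163,452 QALB error pairs.
CHANNEL = {
    ("t", "t"): 0.95, ("e", "e"): 0.95, ("h", "h"): 0.95,
    ("e", "h"): 0.03, ("h", "e"): 0.03,
}

def channel_logprob(observed, candidate):
    """Sum per-character log-probabilities from the confusion matrix."""
    return sum(math.log(CHANNEL.get((o, c), 1e-6))
               for o, c in zip(observed, candidate))

def lm_logprob(candidate, context):
    """Placeholder: a real system would score the candidate with the RNN language model."""
    return -5.0

def best_correction(observed, candidates, context):
    """Return the candidate maximizing channel score plus language-model score."""
    return max(candidates,
               key=lambda c: channel_logprob(observed, c) + lm_logprob(c, context))

print(best_correction("teh", ["the", "ten", "tea"], ["see"]))  # -> 'the'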
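
The third contribution predicts diacritics with a deep encoder-decoder recurrent network over characters. The PyTorch sketch below is a deliberately simplified variant that labels each input character with a diacritic class from bidirectional GRU encodings (no attention-based decoding, assumed layer types and toy sizes); the thesis's actual architecture and its follow-up text correction steps are not reproduced here.

import torch
import torch.nn as nn

class DiacritizerSketch(nn.Module):
    """Character-level recurrent model: undiacritized characters in, diacritic classes out."""
    def __init__(self, n_chars, n_diacritics, embed=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, embed)
        self.encoder = nn.GRU(embed, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_diacritics)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len) indices of undiacritized characters
        enc_out, _ = self.encoder(self.embed(char_ids))
        dec_out, _ = self.decoder(enc_out)
        return self.out(dec_out)  # (batch, seq_len, n_diacritics) scores per character

model = DiacritizerSketch(n_chars=40, n_diacritics=9)   # toy vocabulary sizes
logits = model(torch.randint(0, 40, (2, 12)))           # two dummy sequences of length 12
print(logits.shape)                                     # torch.Size([2, 12, 9])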