Title
An enhanced hybrid approach for word segmentation
Publisher
Mohamed Karam Ali Farag Allah
Author
Mohamed Karam Ali Farag Allah
Preparation committee
Researcher / Mohamed Karam Ali Farag Allah
Supervisor / Hesham Ahmed Hefny
Publication date
2018
Number of pages
91 leaves
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Approval date
21/10/2018
Place of approval
Cairo University - Faculty of Computers and Information - Computer and Information Science
Index
Only 14 pages are available for public view


Abstract

Word segmentation is the process of finding the most likely sequence of words in a sequence of characters that has no clear delimiters. The main problems of word segmentation methods are ambiguity and the need for a large dataset. Several researchers have proposed solutions to word segmentation using heuristic techniques. The task of these techniques is to find the best segmentation without searching the entire state space. The performance of a word segmentation method can be measured using quantitative measures such as recall, precision, and F-measure. There are two main contributions in this research. The first is a hybrid approach for word segmentation. The second is a GA-based parameter optimization for the word segmentation method. The proposed word segmentation method without optimization is compared to related work, and it was found to perform as well as or better than the other methods. Additionally, the results of the method before and after optimization are compared, and the method after parameter optimization was found to achieve better results. To show that the presented approach is language independent, it is further evaluated on the Chinese and Arabic languages. For the Arabic language, a dataset of 10 million words is used. The F-measure result before the optimization is 89.1%
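
The abstract names recall, precision, and F-measure as the evaluation measures but does not say how matches are counted. The following is a minimal sketch under one common convention, which is an assumption here and not necessarily the thesis's procedure: a predicted word counts as correct only if both its start and end character positions match a word in the reference segmentation. The function names `word_spans` and `segmentation_scores` and the sample sentence are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_scores(gold_words, predicted_words):
    """Return (precision, recall, f_measure) for one segmented sentence.

    Assumed convention: a predicted word is correct only when its exact
    character span also appears in the gold segmentation.
    """
    if "".join(gold_words) != "".join(predicted_words):
        raise ValueError("both segmentations must cover the same characters")

    gold = word_spans(gold_words)
    pred = word_spans(predicted_words)
    correct = len(gold & pred)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: the input characters are "thetabledownthere".
gold = ["the", "table", "down", "there"]
pred = ["the", "table", "down", "the", "re"]

p, r, f = segmentation_scores(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# prints "precision=0.60 recall=0.75 F=0.67"
```

In this example three of the five predicted words match gold spans, so precision is 3/5, recall is 3/4, and the F-measure is their harmonic mean. Corpus-level scores are usually obtained by summing the correct, predicted, and gold counts over all sentences before dividing, which this per-sentence sketch does not do.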