Author: Gafar, Mona Gamal El-Sayed./ Title: Intelligent hybrid machine learning algorithms in text mining applications /

Title

Intelligent hybrid machine learning algorithms in text mining applications /

Author

Gafar, Mona Gamal El-Sayed.

Committee

Researcher / Mona Gamal EL-Sayed Gafar

Supervisor / Ahmed Abo EL-Fetouh Saleh

Supervisor / EL-Sayed Fouad Hasan Radwan

Supervisor / Aziza Saad Asem

Subject

text mining. text classification. hybrid intelligent systems.

Publication date

2010.

Number of pages

78 p.

Language

English

Degree

Master's

Specialization

Information Systems

Date of approval

1/1/2010

Place of approval

Mansoura University - Faculty of Computers and Information - Department of Information Systems

Abstract

The popularity of the Internet and the World Wide Web increases the need for information management of electronic texts. Textual documents are the easiest way to store information of all kinds on a computer, in spite of the difficulty of making use of that information. Text mining is the discipline of retrieving meaningful information from natural language text. The main problems facing text mining are feature reduction, high dimensionality, and accurate and fast classification. This thesis introduces and implements new intelligent hybrid models that address these problems. Transformation systems of evolutionary computing algorithms and machine learning algorithms are used to classify PLSNL (Partially Structured, Largely Natural) documents based on their structuring conventions. Genetic algorithms, as an evolutionary computing algorithm, are used in the feature reduction process to find the most significant (informative) words based on the line structuring conventions. Thus, the most informative features (synopses) are extracted, and a succinct feature vector is prepared to represent each document. Based on this succinct feature vector, a machine learning algorithm is needed to mine the associated categories. Two machine learning algorithms, C4.5 and Classification based on Multiple Association Rules (CMAR), are used to classify the documents. The new hybrid models, Hybrid Genetic and C4.5 Algorithm for Textual Document Classification and Hybrid Intelligent Model of Genetic Algorithms and Association Rules in Text Mining, help decision makers derive a set of rules with the highest classification accuracy for documents. A comparison with previous approaches is presented. The comparative study shows the efficiency of the new hybrid models in increasing classification accuracy and reducing the time consumed in the classification process.
The details and limitations of the new approaches are discussed, and future work is suggested.
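
As a rough illustration of the feature reduction idea described in the abstract, the following sketch uses a genetic algorithm to pick a small set of informative words. The abstract does not give the chromosome encoding, fitness function, or parameters used in the thesis, so everything here is an assumption: the toy corpus, the 0/1 word mask, the length penalty, and all function names are hypothetical. A simple 1-nearest-neighbour classifier stands in for C4.5 and CMAR, which are not implemented here.

```python
import random

# Hypothetical toy corpus: (set of words in the document, category label).
DOCS = [
    ({"buy", "offer", "price", "the"}, "ad"),
    ({"buy", "discount", "offer", "a"}, "ad"),
    ({"agenda", "meeting", "minutes", "the"}, "memo"),
    ({"agenda", "minutes", "room", "a"}, "memo"),
]
VOCAB = sorted(set().union(*(words for words, _ in DOCS)))


def selected_words(mask):
    """Map a 0/1 chromosome onto the words it keeps."""
    return {word for word, bit in zip(VOCAB, mask) if bit}


def accuracy(mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour stand-in classifier
    that compares documents only on the selected words."""
    keep = selected_words(mask)
    correct = 0
    for i, (words, label) in enumerate(DOCS):
        best_overlap, predicted = 0, None
        for j, (other_words, other_label) in enumerate(DOCS):
            if i == j:
                continue
            overlap = len(words & other_words & keep)
            if overlap > best_overlap:
                best_overlap, predicted = overlap, other_label
        correct += predicted == label
    return correct / len(DOCS)


def fitness(mask, penalty=0.01):
    """Reward accuracy and lightly penalise long feature vectors (assumed form)."""
    return accuracy(mask) - penalty * sum(mask)


def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve word masks with elitism, tournament selection,
    one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    n = len(VOCAB)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best masks
        while len(next_gen) < pop_size:
            parent1 = max(rng.sample(population, 3), key=fitness)
            parent2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = parent1[:cut] + parent2[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("selected words:", sorted(selected_words(best)))
    print("accuracy:", accuracy(best))
```

The length penalty is one simple way to push the search toward a succinct feature vector. In a setup closer to the thesis, the fitness would presumably be driven by the accuracy of the actual classifier on a held-out set of documents.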