Author: Yasen, Alaa Eisa Mohammed Eisa. / Title: Improving performance of big data mining techniques

Title

Improving performance of big data mining techniques

Author

Yasen, Alaa Eisa Mohammed Eisa.

Committee

Researcher / Alaa Eisa Mohammed Eisa Yasen

Supervisor / حازم مختار البكرى

Supervisor / سمير محمد عبدالرازق

Examiner / محمد حسن حجاج

Examiner / سماء محمد صبري شهاب

Subject

Data mining. Big data.

Publication date

2022.

Number of pages

Online resource (109 pages)

Language

English

Degree

Doctorate

Specialization

Information Systems

Date of approval

1/1/2022

Granting institution

Mansoura University - Faculty of Computers and Information - Department of Information Systems

Abstract

Data stream classification currently plays a key role in big data analytics because of the enormous growth of streaming data. Most existing classification methods rely on ensemble learning, which is trustworthy but not effective at learning from imbalanced big data; these methods also assume that all data are pre-classified. Another weakness of current methods is their long evaluation time when the target data stream contains a high number of features. In this thesis, we provide an overview of big data mining techniques. The main objective of the thesis is to develop a new model for incremental learning based on the proposed ant lion fuzzy-generative adversarial network. The proposed model is implemented in the Spark architecture. For each data stream, the class output is computed at the slave nodes by training a generative adversarial network with the backpropagation error based on fuzzy bound computation. The model is implemented in Python. The software required for the implementation is Python version 3.7, PyCharm version 2020.3.2, Anaconda version 3, and Microsoft Visual Studio Redistributable 2019. The model is implemented and tested on the WebKB dataset, the 20 Newsgroups dataset, and the Reuters dataset. The results indicate that the model overcomes the limitations of existing models, as it can classify data streams that are partially or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art methods, achieving an accuracy of 0.861, a precision of 0.9328, and a minimal Mean Square Error of 0.0416.
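
For readers unfamiliar with the three reported metrics, the following is a minimal Python sketch of how accuracy, precision, and mean square error are conventionally computed. It is not taken from the thesis: the function names, the binary-label setting, and the toy data are all assumptions made for illustration, and the abstract does not state how the thesis averages these metrics over multiple classes.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """True positives divided by all predicted positives (binary case)."""
    # True labels of the samples the classifier marked as positive.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # convention chosen here when nothing is predicted positive
    return sum(1 for t in predicted_pos if t == positive) / len(predicted_pos)


def mean_square_error(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    """Average squared difference between targets and predicted scores."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


# Hypothetical toy data, not results from the thesis.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]

print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("MSE:", mean_square_error(y_true, y_score))
```

Higher accuracy and precision are better, while a lower mean square error is better, which is why the abstract describes the reported error as minimal.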