Author: Mostafa, Mostafa Mohamed Yacoub. / Title: Scalability in adaptive data stream mining


Title

Scalability in adaptive data stream mining

Author

Mostafa, Mostafa Mohamed Yacoub.

Thesis committee

Researcher / Mostafa Mohamed Yacoub Mostafa

Supervisor / Mohamed Badr Senousy

Supervisor / Amira Rezk Abdo

Examiner / Hazem Mokhtar El-Bakry

Examiner / Atef Zaki Ghalwash

Subject

Data mining. Data mining - Mathematical models. Streaming technology - Telecommunications.

Publication date

2022.

Number of pages

Online resource (73 pages)

Language

English

Degree

Doctorate (Ph.D.)

Specialization

Information Systems

Degree approval date

1/1/2022

Awarding institution

Mansoura University - Faculty of Computers and Information - Department of Information Systems


Abstract

Data streams have attracted considerable research attention for years. Mining this type of data is challenging because of its special nature. Because of their high accuracy and greedy construction, decision trees have been among the most widely used techniques for classifying data streams. This dissertation reviews classification techniques in adaptive data stream mining, focusing on two challenges, concept drift and dimensionality reduction, and dividing these techniques into incremental and ensemble classifiers. Incremental classifiers such as Very Fast Decision Trees (VFDT) and Concept-adapting Very Fast Decision Trees (CVFDT) are tested, and Adaptive Random Forests (ARF) is taken as an example of an adaptive ensemble classifier. Furthermore, an experimental comparison of VFDT, CVFDT, and ARF is conducted in terms of accuracy, processing speed, and tree size. Accuracy did not vary much among the three algorithms, while ARF was considerably faster and produced the smallest number of tree nodes. Next, we examine VFDT in more detail as one of the most widely used decision tree algorithms. We then present VFDT-S1.0, an extension of VFDT that uses bagging and sampling techniques. Finally, we simulate the two algorithms and compare them in terms of accuracy and processing time. The experimental results show that the proposed modification reduces classification time by more than 20% on more than one dataset, while the effect on accuracy was less than 1% on some datasets. These timing results indicate that the algorithm is suitable for fast data stream mining.
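VFDT, the base learner that VFDT-S1.0 extends, decides when a leaf has seen enough stream examples to split by using the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split criterion, delta is the allowed error probability, and n is the number of examples seen at the leaf. The following is a minimal Python sketch of that split test only, under stated assumptions: information gain is the criterion (so R = log2 of the number of classes), the caller supplies the best and second-best gains, and the tie threshold of 0.05 is an illustrative default. The names `hoeffding_bound` and `should_split` are hypothetical; this is not the thesis's implementation of VFDT or VFDT-S1.0.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n independent samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test for a single leaf.

    Assumes information gain as the split criterion, whose range is
    log2(n_classes). tie_threshold is an assumed illustrative value.
    """
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is clearly better than the runner-up,
    # or if epsilon is so small that the two are effectively tied.
    return (best_gain - second_gain > eps) or (eps < tie_threshold)


if __name__ == "__main__":
    for n in (50, 200):
        eps = hoeffding_bound(1.0, 1e-7, n)
        decision = should_split(0.45, 0.20, 2, 1e-7, n)
        print(f"n={n}: epsilon={eps:.4f}, split={decision}")
```

With two classes, delta = 1e-7, and a gain gap of 0.25, epsilon is about 0.40 after 50 examples, so the leaf keeps waiting; after 200 examples epsilon falls to about 0.20 and the split is made. This is why VFDT can grow a tree from a single pass over the stream while bounding the chance of choosing a different attribute than a batch learner would.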