Title
Streaming Data Analytics Using Machine Learning on Large Scale Systems
Author
Hassan, Fawzya Ramadan Sayed.
Thesis Committee
Researcher / Fawzya Ramadan Sayed Hassan
Supervisor / عبدالمجيد أمين على
Supervisor / مسعود إسماعيل مسعود شاهين
Examiner / محمد حسن ابراهيم
Subject
qrmak
Publication Date
2020
Number of Pages
124 p.
Language
English
Degree
Doctorate (Ph.D.)
Specialization
Computer Science (miscellaneous)
Approval Date
8/5/2020
Awarding Institution
Fayoum University - Faculty of Computers and Information - Computer Science
Abstract

Streaming data analytics in healthcare has become a promising research direction owing to the
popularity of real-time monitoring and tracking systems. Because healthcare streaming data
arrive in enormous volumes and at high speed, traditional methods struggle to ingest, process,
and analyze them quickly enough to trigger real-time actions in emergencies. This dissertation
therefore addresses how to build a real-time system that handles streaming data from
health-related social media streams or wearable medical sensors and indicates the patient's
current health status. To this end, it introduces two systems that operate on real-time data
to improve streaming data analytics.
The first contribution, the Real-time Diabetes Disease Prediction System, predicts diabetes
from health-related social streaming data to indicate a patient's health status. The system
first identifies the machine learning model with the highest diabetes-prediction accuracy;
in the experiments, the Random Forest (RF) model achieved the highest accuracy among the
models compared, at 84.11%. For online prediction through social media, the system processes
streaming Twitter data about patients' health. To do so, Kafka and Spark Streaming are
integrated into its backend, and the RF classifier then predicts each patient's current
health status in real time.
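The real-time scoring step described above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the thesis implementation: an iterable of JSON strings stands in for the Kafka topic consumed by Spark Streaming, the record schema (`user`, `glucose`, `bmi`) and the training data are hypothetical, and the classifier is a toy random forest whose trees are depth-1 stumps, each fit on a bootstrap sample with a random feature subset and combined by majority vote. A production pipeline would more likely train a full forest with a library such as Spark MLlib's `RandomForestClassifier` and apply it inside the streaming job.

```python
import json
import random
from collections import Counter

# Hypothetical record schema: each message carries a user id plus these features.
FEATURES = ("glucose", "bmi")


def majority(labels, default=0):
    """Most common label, or `default` when there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: choose the (feature, threshold) split with the fewest errors."""
    fallback = majority(y)
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            lo, hi = majority(left, fallback), majority(right, fallback)
            errors = sum(lab != lo for lab in left) + sum(lab != hi for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lo, hi)
    return best[1:]  # (feature, threshold, left_label, right_label)


def apply_stump(stump, row):
    f, t, lo, hi = stump
    return lo if row[f] <= t else hi


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Toy random forest: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), max_features)   # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote across all stumps."""
    return majority([apply_stump(s, row) for s in forest])


def score_stream(forest, messages):
    """Score each JSON message as it arrives (stand-in for a Kafka/Spark consumer)."""
    for raw in messages:
        rec = json.loads(raw)
        row = tuple(rec[name] for name in FEATURES)
        yield rec["user"], "at risk" if predict(forest, row) == 1 else "normal"


if __name__ == "__main__":
    # Hypothetical, hand-made training data: label 1 = diabetic, 0 = non-diabetic.
    X = [(150 + 5 * i, 32 + 0.8 * i) for i in range(10)] + \
        [(80 + 3 * i, 20 + 0.6 * i) for i in range(10)]
    y = [1] * 10 + [0] * 10
    forest = fit_forest(X, y)
    stream = ['{"user": "a", "glucose": 220, "bmi": 45}',
              '{"user": "b", "glucose": 70, "bmi": 18}']
    for user, status in score_stream(forest, stream):
        print(user, status)
```

In this sketch the forest is trained once and then only read: each incoming record is scored independently as it arrives, which mirrors the split between offline model selection and real-time prediction described above.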
The second contribution, the Online Prediction System, applies streaming machine learning
models to health data events ingested into Spark Streaming through Kafka topics. Experiments
were conducted on historical medical datasets and on simulated wearable medical sensor data.
The results show that the system can learn online, updating its model as new data arrive and
according to the window size.
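The online-learning loop can likewise be sketched in plain Python. The abstract does not name the streaming models, so this sketch assumes an online logistic regression updated by stochastic gradient descent, with a list of `(features, label)` pairs standing in for the Kafka topic; all names are hypothetical. Events are grouped into tumbling windows of `window_size`; each window is first scored with the current model (test-then-train) and then used to update it, so the model changes with every new window of data. Spark MLlib offers a comparable learner, `StreamingLogisticRegressionWithSGD`, which trains on each micro-batch of a DStream.

```python
import math
from itertools import islice


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Binary logistic regression updated incrementally with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def update(self, window):
        """One SGD step per (features, label) pair in the window."""
        for x, y in window:
            err = self.predict_proba(x) - y  # gradient of log-loss w.r.t. the logit
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err


def tumbling_windows(events, size):
    """Split an event iterator into consecutive, non-overlapping windows."""
    it = iter(events)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def run_online(events, window_size, n_features, lr=0.1):
    """Test-then-train: score each window with the current model, then learn from it."""
    model = OnlineLogisticRegression(n_features, lr)
    accuracies = []
    for window in tumbling_windows(events, window_size):
        correct = sum(model.predict(x) == y for x, y in window)
        accuracies.append(correct / len(window))
        model.update(window)
    return model, accuracies


if __name__ == "__main__":
    # Hypothetical simulated sensor stream: one scaled reading, label 1 if above 0.
    readings = (((i * 37) % 21 - 10) / 10 for i in range(400))
    events = [((v,), int(v > 0)) for v in readings if v != 0]
    model, accs = run_online(events, window_size=25, n_features=1, lr=0.5)
    print([round(a, 2) for a in accs])
```

Because the model is updated only after a whole window has been scored, a smaller `window_size` lets new data influence predictions sooner, while a larger one updates less often.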