Author: El-Zeheiry, Heba Aly Ibrahim./ Title: Knowledge discovery technique for medical big data /

Title

Knowledge discovery technique for medical big data /

Author

El-Zeheiry, Heba Aly Ibrahim.

Preparation committee

Researcher / هبه على إبراهيم أحمد الزهيري

Supervisor / شريف ابراهيم بركات

Supervisor / أميرة رزق عبده

Examiner / هيثم عبد المنعم الغريب

Subject

Medical big data.

Publication date

2022.

Number of pages

online resource (128 pages) :

Language

English

Degree

Doctorate

Specialization

Education

Approval date

1/1/2022

Place of approval

Mansoura University - Faculty of Computers and Information - Department of Information Systems

Index

Only 14 pages are available for public view

Abstract

Electronic Health Records (EHRs) are the digital form of patients’ medical reports or records. EHRs facilitate advanced analytics and support better clinical decision-making. The massive and varied data in EHRs pose major challenges, including high volume, redundancy, and incompleteness. Over the past two decades, the expanding use of EHRs has significantly increased the flow of data. This vast flow is referred to as “medical big data”: data that cannot be managed with ordinary techniques or tools. Handled correctly, it yields useful information, such as patient survival and medication decisions. There are many methods for analyzing medical big data. They compile massive volumes of health and medical data in order to compare treatment efficiency, identify medicine and device safety issues, speed up medical research, and study shifting trends in patient characteristics and diseases. Machine learning can help automate the correction of data inconsistencies and the extraction of data from numeric and textual sources, for example by reading text and extracting quality metrics or problems that were not previously on a patient’s problem list. Medical data are very complex, and it is difficult to reach good results with a single classification algorithm. For this reason, classification techniques are combined to obtain an efficient and accurate classification model; such a combination is called an ensemble model. There is a strong need to predict new medical data with high accuracy. A new ensemble model called MDRL is proposed, which may be efficient across different datasets. MDRL gives the highest accuracy value. It also saves processing time: instead of running four different algorithms sequentially, it executes them in parallel.
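The idea of running several base classifiers in parallel and merging their outputs can be sketched as follows. This is only an illustration, not the thesis implementation: the abstract does not say how MDRL combines its members, so hard majority voting is an assumption here, and the four `*_stub` functions are hypothetical stand-ins for trained MLP, DT, RF, and LR models.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for four trained base models (MLP, DT, RF, LR).
# Each maps a feature vector to a class label; real models would replace them.
def mlp_stub(x):
    return 1 if 0.6 * x[0] + 0.4 * x[1] > 0.5 else 0

def dt_stub(x):
    return 1 if x[0] > 0.5 else 0

def rf_stub(x):
    # Three tiny "trees" voting internally.
    return 1 if (x[0] > 0.4) + (x[1] > 0.4) + (x[0] + x[1] > 1.0) >= 2 else 0

def lr_stub(x):
    return 1 if x[0] + x[1] - 0.9 > 0 else 0

BASE_MODELS = [mlp_stub, dt_stub, rf_stub, lr_stub]

def majority_vote(votes):
    """Return the most frequent label; ties go to the earliest model's vote."""
    counts = Counter(votes)
    best = max(counts.values())
    return next(v for v in votes if counts[v] == best)

def ensemble_predict(samples, models=BASE_MODELS):
    """Run every base model concurrently, then combine the votes per sample."""
    def run(model):
        return [model(x) for x in samples]

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        per_model = list(pool.map(run, models))  # one prediction list per model
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    return [majority_vote(votes) for votes in zip(*per_model)]
```

Calling `ensemble_predict([(0.9, 0.8), (0.1, 0.2)])` returns one label per sample. Threads are used only to keep the sketch short; for CPU-bound model code in CPython, a process pool would be the more likely route to an actual speed-up.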
To evaluate the model, five algorithms were implemented on five different datasets: Heart Disease, Health General, Diabetes, Heart Attack, and Covid-19. Four of them are existing algorithms, namely Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Multi-layer Perceptron (MLP); the fifth is MDRL, the proposed ensemble model, which combines MLP, DT, RF, and LR. The experiments show that the ensemble model has the best accuracy value for most datasets. In this thesis, the proposed model is implemented to classify medical data of different scales and to predict new data. The results of the four existing classification algorithms are compared with those of the proposed ensemble model on the five datasets to verify the efficiency of the proposed approach. The results show that the proposed approach achieves higher accuracy than the traditional classification algorithms.
In addition, the results of the model combined with PCA, PSO, and CFS are compared separately, and then compared with the accuracy reported in previous related work, again to verify the efficiency of the proposed approach. The combination of MDRL with CFS gives the best performance values across the datasets. The accuracy values for the ensemble model are 98.86, 97.96, 100, 99.33, and 99.37 for the five datasets. For the Heart Attack dataset, however, MDRL combined with PSO gives a higher accuracy value than MDRL with PCA or CFS. Combining a feature selection algorithm with the ensemble model gives the highest accuracy value. Additionally, CFS performs best on high volumes of data compared with PCA and PSO. PSO combined with the different algorithms reduces the running time, but yields lower accuracy values than CFS. The key points of the proposed approach are as follows: preprocessing helps improve accuracy, since good-quality data result in good performance; applying the CFS data reduction algorithm decreases the processing time of classification; and the ensemble model increases accuracy on most datasets.
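As a rough illustration of how a CFS-style reduction step could work before classification, the sketch below assumes that CFS stands for correlation-based feature selection, which the abstract does not spell out. It uses the usual merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), with absolute Pearson correlation and a greedy forward search. These choices, the function names, and the toy columns are all assumptions made for the example, not details taken from the thesis.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def cfs_merit(subset, columns, target):
    """Merit of a feature subset: high class correlation, low redundancy."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[f], columns[g])) for f, g in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(columns, target):
    """Greedy forward search: add the best feature until merit stops improving."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        score, feat = max((cfs_merit(selected + [f], columns, target), f)
                          for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Toy data (invented for illustration): one informative column,
# a noisy near-duplicate of it, and an uninformative column.
toy_columns = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f1_noisy": [1, 2, 4, 3, 5, 6],
    "noise": [3, 1, 2, 3, 1, 2],
}
toy_target = [0, 0, 0, 1, 1, 1]
chosen, merit = cfs_forward_select(toy_columns, toy_target)
```

On this toy data the search keeps only `f1`: the near-duplicate adds redundancy without adding class correlation, so the merit drops when it is included. Fewer retained columns is also why a reduction step of this kind can shorten classification time.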