Title
Predicting genes related to Parkinson’s disease
Author
El-Ghanam, Marwa Helmy Mohammed.
Preparation committee
Researcher / Marwa Helmy Mohammed El-Ghanam
Supervisor / ايمان محمد الديداموني
Supervisor / حسن حسين سليمان
Supervisor / نغم السيد مكي
Examiner / محمد معوض عبدالسلام
Subject
Information technology. Computers. Parkinson’s disease. Information.
Publication date
2022.
Number of pages
Online resource (120 pages)
Language
English
Degree
Master's
Specialization
Computer Science
Approval date
1/1/2022
Place of approval
Mansoura University - Faculty of Computers and Information - Information Technology
Index
Only 14 of 120 pages are available for public view.

Abstract

Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most PD detection techniques identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This thesis proposes a novel prediction system to identify protein and lncRNA genes related to PD, which can aid early diagnosis. First, the genes were represented as DNA FASTA sequences obtained from the University of California Santa Cruz (UCSC) Genome Browser, and redundant sequences were removed. Second, the most significant features of the DNA FASTA sequences were extracted using the PyFeat method with AdaBoost for feature selection. These selected features achieved promising results compared with features extracted by several state-of-the-art feature extraction techniques: Fourier transform with five numerical representations, Representations Features Fusion (RFF), Pse-in-One 2.0, iLearn, and SubFeat. Finally, the features were fed to a gradient-boosted decision tree (GBDT) classifier to diagnose different tested cases. The proposed GBDT achieved the best results compared with other classification algorithms: Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), bagging, Random Forest (RF), AdaBoost (AB), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA). Seven performance metrics were used to evaluate the proposed system: accuracy (ACC), the area under the curve (AUC), the area under the precision-recall curve (AUPR), F1-score, Matthews correlation coefficient (MCC), sensitivity (SEN), and specificity (SPC).
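As a rough illustration of the first two steps (reading FASTA sequences, removing redundant ones, and turning each sequence into a numeric feature vector), a minimal Python sketch follows. It is not the thesis's implementation: normalized k-mer frequencies are used here only as a simple stand-in for the PyFeat feature set, and the function names (`parse_fasta`, `remove_redundant`, `kmer_frequencies`) and the toy records (`gene_a`, `gene_b`, `gene_c`) are hypothetical.

```python
from collections import Counter
from itertools import product


def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dict (sequences upper-cased)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines of one record may be wrapped; join them.
            records[header] += line.upper()
    return records


def remove_redundant(records):
    """Keep only the first record for each distinct sequence."""
    seen = set()
    unique = {}
    for header, seq in records.items():
        if seq not in seen:
            seen.add(seq)
            unique[header] = seq
    return unique


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies over the ACGT alphabet, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy input (hypothetical): gene_b duplicates gene_a's sequence.
fasta = """>gene_a
ACGTAC
GT
>gene_b
ACGTACGT
>gene_c
TTTTAAAA
"""
unique = remove_redundant(parse_fasta(fasta))
features = {name: kmer_frequencies(seq, k=2) for name, seq in unique.items()}
```

The resulting fixed-length vectors are the kind of input that a feature selector and a classifier such as GBDT could then consume; the selection and classification stages themselves are not shown here.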
The proposed system achieved the best results: ACC of 78.6%, AUC of 84.5%, AUPR of 85.3%, F1-score of 78.3%, MCC of 0.575, SEN of 77.1%, and SPC of 80.2%. The experiments demonstrate promising results compared with other systems. To validate the proposed system, based on PyFeat with AB and GBDT, it was compared with state-of-the-art systems that used FASTA sequence datasets. The proposed prediction system was also compared with state-of-the-art studies that used the same datasets. Finally, the proposed prediction system was used to predict new protein and lncRNA genes related to PD. The predicted top-ranked protein and lncRNA genes were verified through a literature review.
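For readers unfamiliar with the threshold-based metrics reported above, the sketch below shows how ACC, SEN, SPC, F1-score, and MCC follow from the counts of a binary confusion matrix. It uses the standard textbook definitions and a small made-up label set; the function name `binary_metrics` is hypothetical, and the numbers are not the thesis's data. AUC and AUPR are omitted because they require ranked prediction scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute ACC, SEN, SPC, F1, and MCC from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(pairs) if pairs else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    spc = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sen / (precision + sen)) if (precision + sen) else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "SEN": sen, "SPC": spc, "F1": f1, "MCC": mcc}


# Made-up example: 4 positives and 4 negatives, with one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Note that MCC ranges from -1 to 1 rather than 0 to 1, which is why it is reported above as 0.575 rather than as a percentage.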