Author: Abu-Hashim, Hanaa Fathi./ Title: Gene Expression Data Analytics Using Classification Algorithms for Cancer Diagnosis/

Title

Gene Expression Data Analytics Using
Classification Algorithms for Cancer Diagnosis/

Publisher

Faculty of Science.

Author

Abu-Hashim, Hanaa Fathi.

Thesis committee

Researcher / Hanaa Fathi Abu-Hashim

Examiner / بسنت محمد الكفراوي

Supervisor / هالة حلمي زايد

Supervisor / سعيد فتحي الزغدي

Subject

Microarray data analysis. Classification techniques.

Publication date

2022

Number of pages

109 p.

Language

English

Degree

Doctorate

Specialization

Computational Mathematics

Date of approval

13/3/2022

Place of approval

Menoufia University - Faculty of Science - Mathematics

Abstract

One of the leading causes of death worldwide is cancer. Microarray-based
gene expression profiling has proven to be an effective technique for cancer
diagnosis, prognosis, and treatment. DNA microarray technology is a significant
tool that enables researchers to track the level of gene expression in an organism.
Microarrays are used to measure the interactions of thousands of genes at the same
time and create a global picture of cellular function. However, analyzing DNA
microarray data is difficult for a variety of reasons. First, DNA microarray
experiments usually produce many features for a small number of patients,
resulting in a high-dimensional dataset: several hundred or even thousands of
genes are measured for only a small number of samples. Second, gene expression
data is highly complex; genes are directly or indirectly correlated with one another,
making classification a difficult task that typically necessitates the use of a
powerful and accurate feature selection technique. Consequently, the selection of
relevant and informative genes remains a challenge in gene expression data
analysis. Hybrid methods show superior performance, achieving high accuracy
with a small number of selected genes. This is because a hybrid algorithm handles
high dimensionality and over-fitting by first applying a filter approach as a
preprocessing step to reduce the dimensionality of the microarray gene
expression profile. This thesis therefore presents three hybrid models for cancer
microarray data. The proposed models combine different machine learning
techniques: feature selection, optimization, and classification. The first model
uses a decision tree (DT) classifier, the Pearson correlation coefficient (PCC) as
the feature selection method, and GridSearchCV cross-validation for tuning the DT
hyperparameters. The second model uses a support vector machine (SVM) as the
classification method and ensemble minimum redundancy maximum relevance
(mRMRe) as the feature selection method. The third model uses two ensemble
classifiers, XGBoost and CatBoost, with mRMRe for feature selection and Hyperopt
for optimization. The experimental results show the effectiveness of these models in:
- reducing the dimensionality of high-dimensional data (microarray data);
- selecting the most informative and relevant genes, which are useful for cancer
  diagnosis;
- enhancing cancer classification performance.
Keywords: Machine Learning, DNA Microarrays, Data Analysis, Gene
Selection, Feature Selection, Cancer Classification
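
The filter step described in the abstract (ranking genes before any classifier is trained) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes binary class labels coded as 0/1, ranks genes by the absolute Pearson correlation between each gene's expression and the label, and keeps the top k. The function names `pearson` and `select_top_genes` and the toy expression matrix are hypothetical and exist only for illustration. The classifier and hyperparameter tuning stages are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant gene (or constant label) has no linear relation to score.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_genes(expression, labels, k):
    """Return the indices of the k genes most correlated with the labels.

    expression: list of samples, each a list of gene expression values
    labels:     list of 0/1 class labels, one per sample (an assumption here)
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((abs(pearson(column, labels)), g))
    # Highest absolute correlation first; ties broken by gene index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:k]]


# Toy data, made up for illustration: 6 samples x 4 genes.
expression = [
    [2.1, 0.5, 7.0, 3.3],
    [1.9, 0.7, 6.8, 3.1],
    [2.0, 0.6, 7.1, 3.4],
    [5.2, 0.6, 6.9, 1.0],
    [5.0, 0.5, 7.0, 1.2],
    [5.1, 0.7, 7.2, 0.9],
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_genes(expression, labels, k=2)
# Reduced matrix containing only the selected genes, ready for a classifier.
reduced = [[sample[g] for g in selected] for sample in expression]
print(selected)
```

In a hybrid pipeline of the kind the abstract describes, the reduced matrix would then be passed to a classifier whose hyperparameters are tuned by cross-validation, so the expensive search runs on a few genes rather than thousands.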