Title
Applying Computational Intelligence Techniques For Diagnoses of Breast Cancer
Author
Muhammed, Muhammed Abd-elnaby Sadek.
Supervisory Committee
Supervisor / Muhammed Abd-elnaby Sadek Muhammed
Supervisor / Mohamed Ismail Roushdy
Supervisor / Marco Alfonse
Publication date
2021.
Number of pages
xiii, 85 p.
Language
English
Degree
Master's
Specialization
Computer Science
Approval date
1/1/2021
Place of approval
Ain Shams University - Faculty of Computers and Information - Department of Computers

Abstract

Cancer, and breast cancer in particular, is one of the most common causes of death worldwide, according to the World Health Organization. For this reason, extensive research effort has gone into accurate and early diagnosis of cancer in order to increase the likelihood of cure. Among the available diagnostic tools, microarray technology is commonly used in biological and medical science to study gene expression in cells. When healthy tissue becomes cancerous, gene expression levels change, so tissues can be classified by looking for changes in gene expression. Although the huge number of features (genes) in microarray data may seem advantageous, many of these features are irrelevant or redundant, which degrades classification accuracy. To overcome this challenge, feature selection is a mandatory preprocessing step before classification. Two hybrid feature selection approaches are proposed and applied to five breast cancer datasets, namely Van’t Veer, Chin, Chowdary, Gravier, and West. The first approach combines mutual information, the Least Absolute Shrinkage and Selection Operator, and a genetic algorithm (MI-LASSO-GA). The second approach combines mutual information, the Least Absolute Shrinkage and Selection Operator, and particle swarm optimization (MI-LASSO-PSO). The following classifiers, with 5-fold cross-validation, are used to assess the proposed approaches: Support Vector Machine (SVM), k-Nearest Neighbor (KNN), Random Forest (RF), Logistic Regression (LR), and Extreme Gradient Boosting (XGBoost). Accuracy, precision, recall, and F1 score are used for performance analysis. MI-LASSO-PSO produced subsets with fewer features than MI-LASSO-GA for all datasets except Chin. The average performance of MI-LASSO-GA on Van’t Veer and West is better than that of MI-LASSO-PSO, at 96.4% and 100%, respectively. On the other hand, the average performance of MI-LASSO-PSO is better on Gravier, Chowdary, and Chin, at 91.5%, 99.2%, and 96.8%, respectively. The proposed approaches outperformed state-of-the-art techniques on all datasets.
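
The hybrid idea described in the abstract (a cheap filter first, then an evolutionary search scored by cross-validation) can be illustrated with a small sketch. This is not the thesis's implementation, and several things in it are assumptions made only for illustration: the data are synthetic, the LASSO stage is omitted for brevity, a nearest-centroid classifier stands in for SVM, KNN, RF, LR, and XGBoost, and all function names and parameter values (`mutual_information`, `cv_accuracy`, `genetic_search`, the population size, the per-feature penalty) are hypothetical.

```python
import math
import random


def mutual_information(xs, ys):
    """MI (in nats) between a feature binarised at its median and binary labels."""
    median = sorted(xs)[len(xs) // 2]
    bits = [int(x > median) for x in xs]
    n = len(xs)
    mi = 0.0
    for b in (0, 1):
        for label in (0, 1):
            joint = sum(1 for bi, yi in zip(bits, ys) if bi == b and yi == label) / n
            if joint > 0:
                mi += joint * math.log(joint / ((bits.count(b) / n) * (ys.count(label) / n)))
    return mi


def cv_accuracy(X, y, features, folds=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier (a stand-in)."""
    if not features:
        return 0.0
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(y)) if i % folds != f]
        test = [i for i in range(len(y)) if i % folds == f]
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
        for i in test:
            dist = {label: sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
                    for label, centroid in centroids.items()}
            correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(y)


def genetic_search(X, y, candidates, rng, pop_size=20, generations=15, penalty=0.01):
    """Search bit masks over `candidates`; fitness = CV accuracy minus a size penalty."""
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            chosen = [c for c, bit in zip(candidates, mask) if bit]
            cache[key] = cv_accuracy(X, y, chosen) - penalty * len(chosen)
        return cache[key]

    # Seed with the all-ones mask so the result is never worse than keeping everything.
    population = [[1] * len(candidates)]
    population += [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            a = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            b = max(rng.sample(ranked, 3), key=fitness)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [bit ^ (rng.random() < 0.05) for bit in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit], fitness(best)


# Synthetic stand-in for microarray data: features 0-2 are informative, the rest noise.
rng = random.Random(0)
n_samples, n_features = 60, 30
y = [i % 2 for i in range(n_samples)]
X = [[rng.gauss(2.0 * y[i] if j < 3 else 0.0, 1.0) for j in range(n_features)]
     for i in range(n_samples)]

# Stage 1 (filter): keep the 10 features with the highest mutual information.
scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:10]

# Stage 2 (wrapper): genetic search over subsets of the filtered features.
selected, best_fitness = genetic_search(X, y, top, rng)
baseline = cv_accuracy(X, y, top) - 0.01 * len(top)
print("filter kept:", sorted(top))
print("GA kept:", sorted(selected), "fitness:", round(best_fitness, 3))
```

The size penalty in the fitness function is one simple way to favour smaller subsets, which mirrors the abstract's interest in how many features each approach retains. A particle swarm variant would replace only `genetic_search`, keeping the filter stage and the cross-validated fitness unchanged.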