Abstract

Cancer is one of the leading causes of death worldwide. Microarray-based gene expression profiling has proven to be an effective technique for cancer diagnosis, prognosis, and treatment. DNA microarray technology enables researchers to track the level of gene expression in an organism: it measures the expression of thousands of genes at the same time and so gives a global picture of cellular function.

However, analyzing DNA microarray data is difficult for several reasons. First, DNA microarray experiments usually produce many features for a small number of patients, so the resulting dataset is high-dimensional: it contains several hundred or even thousands of genes but only a small number of samples. Second, gene expression data is highly complex; genes are directly or indirectly correlated with one another, which makes classification a difficult task that typically requires a powerful and accurate feature selection technique. The selection of relevant and informative genes therefore remains a challenge in gene expression data analysis.

Hybrid methods show superior performance in terms of high accuracy and a small number of selected genes. This is because a hybrid algorithm addresses the high dimensionality and over-fitting problems by first applying a filter approach as a preprocessing step to reduce the dimensionality of the microarray gene expression profile. This thesis therefore presents three hybrid models for cancer microarray data. The proposed models combine different machine learning techniques: feature selection, optimization, and classification. The first model uses a decision tree (DT) classifier, the Pearson correlation coefficient (PCC) as the feature selection method, and GridSearchCV (grid search with cross-validation) to tune the DT hyperparameters. The second model uses a support vector machine (SVM) as the classification method and ensemble minimum redundancy maximum relevance (mRMRe) as the feature selection method.
The third model uses two ensemble classifiers, XGBoost and CatBoost, with mRMRe as the feature selection method and Hyperopt as the optimization method. The experimental results show the effectiveness of these models in:
- Reducing the dimensionality of high-dimensional data (microarray data).
- Selecting the most informative and relevant genes, which is efficient for cancer diagnosis.
- Enhancing cancer classification performance.

Key Words: Machine Learning, DNA microarrays, Data Analysis, Gene Selection, Feature selection, Cancer Classification
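
The filter-first idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a binary class label, a tiny made-up expression matrix, and hypothetical helper names (`pearson`, `select_top_genes`). It ranks genes by the absolute Pearson correlation between their expression values and the class label, then keeps the top k genes before any classifier is trained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant gene (or label) carries no linear signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_genes(expression, labels, k):
    """Rank genes by |PCC| with the class label and keep the k best.

    expression: list of samples, each a list of gene expression values.
    labels:     list of 0/1 class labels, one per sample (an assumption here).
    Returns the indices of the selected genes, strongest first.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((abs(pearson(column, labels)), g))
    # Sort by descending score; break ties by gene index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:k]]


# Toy data (made up): 4 samples x 4 genes, two classes.
expression = [
    [1.0, 5.0, 0.3, 2.0],
    [1.2, 5.1, 0.9, 2.0],
    [3.9, 4.9, 0.4, 2.0],
    [4.1, 5.0, 0.8, 2.0],
]
labels = [0, 0, 1, 1]

selected = select_top_genes(expression, labels, k=2)
# Reduced matrix containing only the selected genes.
reduced = [[sample[g] for g in selected] for sample in expression]
print(selected)
```

In a full pipeline of the kind the first model describes, the reduced matrix would then be passed to a classifier such as a decision tree, with its hyperparameters tuned by cross-validated grid search.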