Author: Mohamed, Wael Zakaria Abd Allah / Title: ON DATA MINING IN BIOINFORMATICS

Title

ON DATA MINING IN BIOINFORMATICS

Author

Mohamed, Wael Zakaria Abd Allah.

Preparation committee

Supervisor / Wael Zakaria Abdallah Mohamed

Supervisor / Fayed Faek Mohamed Ghaleb

Supervisor / Yasser Kotb El-Sayed Kotb

Publication date

2015.

Number of pages

186 p.

Language

English

Degree

Master's

Specialization

Mathematics

Date of approval

1/1/2015

Place of approval

Ain Shams University - Faculty of Science - Department of Mathematics

Abstract

Applying data mining techniques to bioinformatics is an active research
area. This thesis is devoted to extracting association rules (one of
the basic data mining tasks) from DNA microarray datasets. It surveys
row-enumeration and column-enumeration based algorithms and
concentrates on analysing the MAXCONF algorithm. Biological evaluations
show that MAXCONF provides excellent results compared with other
algorithms applied to DNA microarray data. It mines maximal
high-confidence association rules (MHCR) from DNA microarray datasets.
However, it has two drawbacks: its computations are expensive, and it
produces many useless rules. A slight modification of the algorithm is
therefore proposed to address these drawbacks. Moreover, two versions
of the MAXCONF algorithm have been implemented: the first (MAXCONF1)
mines MHCR from datasets of up-expressed genes, and the second
(MAXCONF2) mines MHCR from datasets of up/down-expressed genes.
Two new sequential algorithms (MMHCR and MCR-Miner) are proposed
for mining maximal high-confidence association rules from datasets of
up-expressed and up/down-expressed genes, respectively. In effect,
these algorithms are the column-enumeration based analogs of the
MAXCONF1 and MAXCONF2 algorithms. To build them, a binary
representation (BR) of genes is introduced and a novel data structure
(MAR-Tree) is proposed. MMHCR and MCR-Miner differ in the
discretization methods they use.
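
As a rough illustration of how a binary representation can support
confidence computation, the sketch below packs, for each gene, the
samples in which it is up-expressed into an integer bitmask. This
encoding, the helper names (binary_representation, support, confidence)
and the toy data are assumptions made for illustration; the thesis's own
BR and MAR-Tree definitions are not given in this abstract.

```python
from typing import Dict, List


def binary_representation(up_samples: List[int]) -> int:
    """Pack the sample indices where a gene is up-expressed into a bitmask."""
    mask = 0
    for s in up_samples:
        mask |= 1 << s
    return mask


def support(genes: List[str], br: Dict[str, int], n_samples: int) -> int:
    """Count the samples in which every gene in `genes` is up-expressed."""
    mask = (1 << n_samples) - 1  # start with all samples set
    for g in genes:
        mask &= br[g]            # bitwise AND intersects the sample sets
    return bin(mask).count("1")


def confidence(antecedent: List[str], consequent: List[str],
               br: Dict[str, int], n_samples: int) -> float:
    """conf(A -> C) = support(A and C) / support(A); 0.0 if A never occurs."""
    denom = support(antecedent, br, n_samples)
    if denom == 0:
        return 0.0
    return support(antecedent + consequent, br, n_samples) / denom


# Hypothetical toy dataset with 5 samples (not taken from the thesis).
br = {
    "g1": binary_representation([0, 1, 2, 4]),
    "g2": binary_representation([0, 1, 2]),
    "g3": binary_representation([1, 2, 3]),
}
conf_g2_g1 = confidence(["g2"], ["g1"], br, 5)  # 3 of 3 samples -> 1.0
conf_g1_g3 = confidence(["g1"], ["g3"], br, 5)  # 2 of 4 samples -> 0.5
```

With this encoding, intersecting the sample sets of several genes is a
single bitwise AND per gene, which is one plausible reason a bit-level
representation is attractive for column-enumeration methods.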
In addition, a third sequential algorithm (IMCR-Miner) is suggested, which
stores the BR of the genes and prevents repeated comparisons. In this
respect, comparative studies between the proposed algorithms and the
MAXCONF algorithm are carried out.
Moreover, the PMCR-Miner algorithm, a parallel version of the IMCR-Miner
algorithm, is constructed based on the independence of the IMCR-Miner
tasks. PMCR-Miner targets shared-memory systems and uses task
parallelism. It spends no time sharing or combining data between
processors. Several experiments are carried out on real DNA microarray
datasets in order to study the performance, speedup, efficiency, and
memory consumption of the proposed algorithm.
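
For reference, speedup and efficiency are conventionally defined as
S = T_serial / T_parallel and E = S / p for p processors. The sketch
below assumes these standard definitions; the abstract does not state
which definitions the thesis uses, and the timings shown are made-up
placeholders, not results from the thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial running time to parallel running time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Speedup per processor; 1.0 would mean ideal linear scaling."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings in seconds (illustrative only).
s = speedup(120.0, 40.0)        # 3.0
e = efficiency(120.0, 40.0, 4)  # 0.75
```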
Finally, the classification of cancer gene expression microarray datasets
is another problem addressed in this thesis. In this regard, an
associative classifier framework (namely MinCAR-Classifier) is built.
This classifier mines only minimal high-confidence class association
rules (MinCAR) from cancer gene expression datasets. A comprehensive
study compares the MinCAR-Classifier with related classifiers. Moreover,
the proposed framework gives a general description of the relationships
between gene expression and disease.
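
To make the idea of an associative classifier concrete, the following
minimal sketch labels a sample with the class of the highest-confidence
rule whose antecedent it satisfies. The Rule type, the classify function,
the tie-breaking choice (prefer shorter antecedents) and the toy rules
are all hypothetical; the abstract does not describe how
MinCAR-Classifier mines or applies its rules.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # discretized gene states, e.g. "g1_up"
    label: str                  # predicted class
    confidence: float


def classify(sample_items: Set[str], rules: List[Rule],
             default: Optional[str] = None) -> Optional[str]:
    """Return the label of the best matching rule, or `default` if none match."""
    matching = [r for r in rules if r.antecedent <= sample_items]
    if not matching:
        return default
    # Highest confidence wins; ties go to the shorter (more general) rule.
    best = max(matching, key=lambda r: (r.confidence, -len(r.antecedent)))
    return best.label


# Hypothetical rules, for illustration only.
rules = [
    Rule(frozenset({"g1_up"}), "tumor", 0.95),
    Rule(frozenset({"g2_down", "g3_up"}), "normal", 0.90),
]
label_a = classify({"g1_up", "g4_up"}, rules)            # "tumor"
label_b = classify({"g2_down", "g3_up"}, rules)          # "normal"
label_c = classify({"g5_up"}, rules, default="unknown")  # "unknown"
```

Because each prediction traces back to an explicit rule, a classifier of
this kind also documents which gene expression patterns are associated
with which class.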