Author: Mohamed, Nada Hassan Osman. / Title: Efficient processing of big data using clouds /

Title

Efficient processing of big data using clouds /

Author

Mohamed, Nada Hassan Osman.

Contributors

Researcher / Nada Hassan Osman Mohamed

Supervisor / رانيا أحمد عبد العظيم ابو السعود

Supervisor / احمد حسين مدين

Examiner / احمد علي نشات

Subject

qrmak

Publication date

2021

Number of pages

73 p.

Language

English

Degree

Master's

Specialization

Electrical and Electronic Engineering

Approval date

5/2/2021

Place of approval

Fayoum University - Faculty of Engineering - Electrical Engineering

Abstract

Correlating gene expression profiles across multiple samples to identify
inter-gene interactions is a critical technique for gene co-expression network
analysis, which usually relies on all-pairs correlation (or a similar measure). It
helps to understand the molecular basis of complex disease traits and to predict
the treatment responses of individual subjects, and it is extremely useful in
biological analyses today. The data set is for liver hepatocellular carcinoma
(HCC), a complication of hepatitis C virus (HCV) infection. It consists of 35
microarray samples (16 samples from subjects with HCC and the remaining samples
from normal subjects).
Because calculating the Pearson’s Correlation Coefficient (PCC) matrix is highly
compute-intensive, it often takes too much processing time.
Therefore, in this work, Big Data techniques including MapReduce and Spark have
been employed to calculate the PCC matrix and find the dependencies among the
large number of genes measured in our high-throughput microarray.
A multithreaded programming model is employed in both techniques
to achieve efficient performance.
To meet this need, IBM Analytics Engine (IAE) has been used as a flexible
framework to deploy analytics applications in a private cloud as a service. The
running times of each phase in the MapReduce and Spark approaches have been
compared. Spark yielded an 80-fold speedup for calculating the PCC of 22777
genes, whereas MapReduce attained barely an 8-fold speedup.
Keywords: Pearson's Correlation; Hadoop; MapReduce; Spark; Gene Co-expression
Networks; GCN; Affymetrix Microarrays.
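
The all-pairs Pearson correlation that the abstract refers to can be illustrated with a small sketch. This is not the thesis's MapReduce or Spark implementation, which is not shown in this record; it is a single-machine, pure-Python illustration under stated assumptions. The function names `standardize` and `pcc_matrix` and the toy data are hypothetical. The sketch shows why the task parallelizes well: after each gene profile is centered and scaled to unit length, every gene pair reduces to an independent dot product.

```python
from itertools import combinations
from math import sqrt


def standardize(profile):
    """Center one gene's expression profile and scale it to unit length."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant profile: correlation is undefined")
    return [c / norm for c in centered]


def pcc_matrix(expression):
    """Return the symmetric all-pairs PCC matrix for a list of gene profiles."""
    z = [standardize(p) for p in expression]
    g = len(z)
    r = [[1.0] * g for _ in range(g)]
    # Each unordered gene pair is an independent unit of work, which is
    # what a distributed framework could spread across workers.
    for i, j in combinations(range(g), 2):
        r[i][j] = r[j][i] = sum(a * b for a, b in zip(z[i], z[j]))
    return r


# Toy data, made up for illustration: 3 genes measured in 4 samples.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with gene 0
]
matrix = pcc_matrix(toy)
```

The number of unordered pairs grows quadratically with the number of genes: for the 22777 genes mentioned in the abstract, that is roughly 259 million pairs, which is consistent with the abstract's point that the computation is processing-intensive.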