Author: Radad, Marwa Abbas Mohamed. / Title: The Utilization of Distributed Computing Systems in Bioinformatics


Title

The Utilization of Distributed Computing Systems in Bioinformatics /

Author

Radad, Marwa Abbas Mohamed.

Committee

Researcher / مروة عباس محمد رداد

Supervisor / نوال أحمد الفيشاوي

Examiner / أيمن محمد حسن وهبة

Examiner / أيمن السيد أحمد

Subject

Parallel programs (Computer programs). Parallel computers. Bioinformatics. Parallel processing (Electronic computers). Electronic data processing - Distributed processing.

Publication date

2014.

Number of pages

101 p.

Language

English

Degree

Doctorate

Specialization

Electrical and Electronic Engineering

Publisher

Approval date

12/6/2014

Place of approval

Menoufia University - Faculty of Electronic Engineering - Computer Science and Engineering


Abstract

Understanding gene regulatory networks is a very important problem in bioinformatics. It involves how genes cooperate to perform functions, how a species responds to disease or environmental insults, and how organisms are affected by gene disorders. One of the most important tasks in this challenge is finding regulatory elements, especially the binding sites for transcription factors. The binding sites for expressed genes are called motifs. In a DNA sequence, a motif is usually a short segment that occurs frequently, but each occurrence is not required to be an exact copy. This property makes motif finding very difficult. Despite considerable efforts to date, finding these motifs remains a complex challenge for biologists and computer scientists.
This thesis presents a new version of a well-known exact algorithm, Brute Force, which never misses a motif. The new algorithm is called Recursive Brute Force (R-BF). It combines several ideas to reduce the running time of the exhaustive-search Brute Force algorithm. The average-case time complexity of R-BF is proven to be exponential in the number of allowed mutations rather than in the motif length. An improved version, R-BF2, is introduced to show the flexibility and simplicity of the R-BF algorithm.
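The record does not reproduce the R-BF pseudocode, so the following is only a minimal sketch of why an exact search can scale with the number of allowed mutations d instead of the motif length l. It assumes the common planted (l, d) motif formulation with Hamming distance, and it contrasts exhaustive enumeration of all 4^l candidates with a recursive enumeration of the d-neighborhoods of the l-mers in the first sequence. All function names are hypothetical, and the neighborhood approach is a generic technique, not necessarily the one used in the thesis.

```python
from itertools import product

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(candidate, sequence, d):
    """True if some window of the sequence is within d mismatches of candidate."""
    l = len(candidate)
    return any(hamming(candidate, sequence[i:i + l]) <= d
               for i in range(len(sequence) - l + 1))

def brute_force_motifs(sequences, l, d):
    """Exhaustive search: test every one of the 4**l possible candidates."""
    result = set()
    for letters in product(ALPHABET, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            result.add(candidate)
    return result

def neighbors(word, d, start=0):
    """Recursively yield each string within Hamming distance d of word once.

    Only positions >= start are mutated, so every set of mutated positions
    is visited in increasing order and no string is produced twice.
    """
    yield word
    if d == 0:
        return
    for i in range(start, len(word)):
        for ch in ALPHABET:
            if ch != word[i]:
                mutated = word[:i] + ch + word[i + 1:]
                yield from neighbors(mutated, d - 1, i + 1)

def neighborhood_motifs(sequences, l, d):
    """Exact search restricted to d-neighborhoods of l-mers in the first sequence.

    Any motif must lie within d mismatches of some l-mer of the first
    sequence, so no valid motif is missed.
    """
    first = sequences[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates.update(neighbors(first[i:i + l], d))
    return {c for c in candidates
            if all(occurs_within(c, s, d) for s in sequences[1:])}
```

Each l-mer contributes a neighborhood of size sum over k <= d of C(l, k) * 3^k, which grows exponentially in d but only polynomially in l for fixed d. For example, `neighbors("ACGT", 1)` yields 13 strings, while exhaustive search over l = 4 tests 256.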
The tremendous volume of DNA sequences available in distributed, dynamically changing databases has brought High Performance Computing (HPC) technologies into the bioinformatics domain. In this thesis, the R-BF algorithm is parallelized and implemented using two different approaches. A multi-threaded version (OMP-RBF) is implemented using Open Multi-Processing (OpenMP). OMP-RBF runs on a shared-memory multicore architecture. It suffers from performance degradation due to the heap contention problem, and different solutions to this problem have been investigated. The second implementation, MPI-RBF, is based on the Message Passing Interface (MPI). MPI-RBF has been implemented on a shared-memory multicore architecture to compare it with OMP-RBF. Efficient handling of data locality boosts the scalability of MPI-RBF. The MPI approach outperforms OpenMP in such a computationally intensive, memory-intensive, and communication-less problem. MPI-RBF is also implemented on a cluster of workstations and on a cluster of symmetric multi-processor (SMP) nodes in order to exploit multi-level parallelism. Parallel versions of R-BF2 are also implemented and compared with the previously mentioned implementations. Practical experiments show that the high computational power of distributed computing systems can greatly enhance the solutions to computationally intensive problems in bioinformatics research.
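The abstract characterizes the search as computationally intensive and communication-less, which suggests that the candidate space can be cut into independent slices. The sketch below illustrates that idea under stated assumptions: a static round-robin split of the 4^l candidates across workers, with `rank` and `size` borrowed from MPI naming conventions. The workers are simulated one after another in plain Python; in a real MPI or OpenMP program each slice would run in its own process or thread, and only the final sets would be gathered. The decomposition actually used by OMP-RBF and MPI-RBF is not given in this record, and all function names are hypothetical.

```python
from itertools import islice, product

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_motif(candidate, sequences, d):
    """True if candidate appears within d mismatches in every sequence."""
    l = len(candidate)
    return all(
        any(hamming(candidate, s[i:i + l]) <= d
            for i in range(len(s) - l + 1))
        for s in sequences
    )

def worker_candidates(rank, size, l):
    """Candidates assigned to one worker: indices rank, rank + size, ...

    islice skips through the full enumeration for simplicity; a real
    implementation would compute each worker's range directly.
    """
    for letters in islice(product(ALPHABET, repeat=l), rank, None, size):
        yield "".join(letters)

def worker_search(rank, size, sequences, l, d):
    """Work done by a single worker; needs no data from any other worker."""
    return {c for c in worker_candidates(rank, size, l)
            if is_motif(c, sequences, d)}

def gather(size, sequences, l, d):
    """Simulate `size` workers sequentially and merge their partial results."""
    partial = [worker_search(r, size, sequences, l, d) for r in range(size)]
    return set().union(*partial)
```

Because the slices are disjoint and cover the whole candidate space, `gather(4, seqs, l, d)` returns the same set as `gather(1, seqs, l, d)` for any list of sequences `seqs`. The only communication a parallel version of this sketch would need is the final merge.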