Title
Initial data reordering in MapReduce technique for specific data categories /
Publisher
Ahmed Abdelrahim Ali Eldouh
Author
Ahmed Abdelrahim Ali Eldouh
Preparation committee
Researcher / Ahmed Abdelrahim Ali Eldouh
Supervisor / Hatem Elkadi
Supervisor / Mohamed Helmy Khafagy
Publication date
2018
Number of pages
87 leaves
Language
English
Degree
Master's
Specialization
Information Systems
Approval date
26/5/2019
Place of approval
Cairo University - Faculty of Computers and Information - Information System
Index
Only 14 pages are available for public view

Abstract

The rapid growth of big data sets creates an urgent need to address the difficulty of storing and processing them. MapReduce is a recent programming model introduced by Google's team to process and store big data sets. Hadoop is open-source software from Apache that includes an implementation of MapReduce. MapReduce requires a shuffling phase to globally exchange the intermediate data generated by the mapping phase, but this shuffling phase adds performance overhead. In this thesis, we survey the literature on shuffling and discuss previous techniques adopted to enhance the performance of MapReduce. We also focus on an approach that improves the performance of MapReduce by reducing the overhead caused by the shuffling phase. Improving data locality helps eliminate the network overhead in the shuffling phase of MapReduce. We achieve this by pre-partitioning data based on query similarity, computed with the TF-IDF and cosine similarity algorithms, and by grouping related queries with the K-means clustering algorithm. To this end, we supply HDFS with the related data and control where data are stored, so that related data files are collocated on the same nodes.
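
The query-grouping step named in the abstract (TF-IDF weighting, cosine similarity, K-means clustering) can be illustrated with a minimal sketch. This is not the thesis implementation: the sample queries, the whitespace tokenizer, the smoothed IDF formula, the farthest-first initialisation, and the function names `tfidf_vectors`, `cosine`, `normalize`, and `kmeans_cosine` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(queries):
    """Build one TF-IDF vector per query (assumed whitespace tokenizer)."""
    tokenized = [q.lower().split() for q in queries]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of queries that contain each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (an assumed variant; the thesis may use another formula).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty queries
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def kmeans_cosine(vectors, k, max_iter=20):
    """K-means on unit vectors, assigning each point by highest cosine similarity."""
    points = [normalize(v) for v in vectors]
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(farthest)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Hypothetical queries: two touch a customers table, two touch an orders table.
queries = [
    "select name from customers where city = cairo",
    "select name city from customers",
    "select total from orders where date > 2018",
    "select sum total from orders",
]
labels = kmeans_cosine(tfidf_vectors(queries), k=2)
```

In this sketch, queries that share a cluster label would be candidates for having their underlying data files placed on the same nodes; how that placement is actually enforced in HDFS is outside what the example shows.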