Title
Cross-language Record Linkage for Big Data
Author
El-Mandouh, Doaa Medhat Mohamed El-Saeed
Thesis Committee
Researcher / Doaa Medhat Mohamed El-Saeed El-Mandouh
Supervisor / Ahmed Hassan Mohamed Youssef
Supervisor / Sherif Ramzy Salama Salama
Examiner / Mohamed Gamal El-Din Darwish
Publication Date
2016
Number of Pages
126 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronics Engineering
Approval Date
1/1/2016
Place of Approval
Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering
Abstract

This thesis demonstrates the pressing need for a robust record linkage process that efficiently correlates data from different sources. It begins with an introduction to the record linkage process and a survey of the techniques proposed in this area. It then shows how the problem becomes more complex when the goal is to process big data. Subsequently, it presents two aspects of the problem: effectiveness and efficiency. Effectiveness is needed to achieve high-quality matching of records written in different languages, while efficiency is needed to achieve a scalable, load-balanced record linkage process over large-scale multilingual data sources.
Afterward, the thesis introduces a novel technique, built on existing pattern-based and phonetic matching techniques, that effectively matches names written in different scripts. Next, the thesis introduces a new cost-aware load balancing technique that achieves better load balancing when matching large-scale multilingual data sources by taking into account the different costs of matching cross-language records and mono-language records. Finally, it applies the proposed techniques to several case studies, in which they produced more effective and efficient results than existing techniques.
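The idea behind cost-aware load balancing can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes records have already been grouped into blocks, assigns hypothetical relative costs (`MONO_COST`, `CROSS_COST`) to mono-language and cross-language comparisons, and then uses greedy longest-processing-time scheduling to spread the weighted work across workers. The block keys, language tags, and cost values are illustrative assumptions only.

```python
import heapq
from collections import Counter

# Hypothetical relative costs. Cross-language comparisons are assumed to be
# more expensive than mono-language ones (e.g. because names must first be
# transliterated or phonetically encoded). The values are illustrative only.
MONO_COST = 1.0
CROSS_COST = 3.0


def block_cost(languages):
    """Estimate the cost of all pairwise comparisons inside one block.

    `languages` holds one language tag per record in the block.
    """
    counts = Counter(languages)
    n = len(languages)
    total_pairs = n * (n - 1) // 2
    # Pairs whose two records share the same language.
    mono_pairs = sum(k * (k - 1) // 2 for k in counts.values())
    cross_pairs = total_pairs - mono_pairs
    return mono_pairs * MONO_COST + cross_pairs * CROSS_COST


def assign_blocks(blocks, num_workers):
    """Greedily assign blocks to workers, most expensive block first.

    `blocks` maps a block key to the list of language tags of its records.
    Returns (assignment: block key -> worker id, loads: cost per worker).
    """
    costs = {key: block_cost(langs) for key, langs in blocks.items()}
    heap = [(0.0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = {}
    loads = [0.0] * num_workers
    for key in sorted(costs, key=costs.get, reverse=True):
        # Give the next block to the currently least-loaded worker.
        load, worker = heapq.heappop(heap)
        assignment[key] = worker
        loads[worker] = load + costs[key]
        heapq.heappush(heap, (loads[worker], worker))
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical blocks keyed by a name-based blocking key.
    example_blocks = {
        "ahmed": ["ar", "ar", "en", "en"],  # 2 mono + 4 cross pairs
        "sara": ["en", "en", "en"],         # 3 mono pairs
        "omar": ["ar", "en"],               # 1 cross pair
        "mona": ["ar", "ar", "ar", "ar"],   # 6 mono pairs
    }
    assignment, loads = assign_blocks(example_blocks, num_workers=2)
    print(assignment)
    print(loads)  # [14.0, 12.0]
```

Counting only comparisons, the "ahmed" and "mona" blocks look equally heavy (6 pairs each); weighting cross-language pairs more heavily makes "ahmed" the most expensive block, so the scheduler gives it a worker of its own. A real system would calibrate the cost ratio empirically rather than fix it by hand.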