Title
Towards a novel data warehouses architecture
Publisher
Emad Saddad Abdelhakiem Hussain
Author
Emad Saddad Abdelhakiem Hussain
Thesis committee
Researcher / Emad Saddad Abdelhakiem Hussain
Supervisor / Hoda Mokhtar Omar Mokhtar
Supervisor / Osman Hegazy
Supervisor / Ali Hamed Elbastawesy
Publication date
2021
Number of pages
74 leaves
Language
English
Degree
Doctorate (Ph.D.)
Specialization
Information Systems
Approval date
14/11/2021
Place of approval
Cairo University - Faculty of Computers and Information - Information Systems

Abstract

A traditional Data Warehouse (DW) is a centralized repository of non-volatile, subject-oriented, non-operational, integrated, and time-variant data that integrates data from different heterogeneous data sources. DWs are specifically designed to support decision making, analysis, data mining, and ad hoc queries. Recently, the structural complexity and the volume of data stored on computer systems have been growing at an accelerating rate. Traditional DWs struggle to cope with such environments because of several problems: an architecture based on relational Database Management Systems (DBMSs), ever-increasing data volumes, high disk space usage, slow query response times, and complicated administration. Furthermore, DWs depend on a fixed set of external data sources that may be incomplete, may not use the same definitions, and are not always available. Therefore, the traditional DW architecture must be adapted to meet the modern challenges imposed by massive data volumes and other big data characteristics. A new architecture must also address existing shortcomings in availability, scalability, and query efficiency.

This thesis introduces a novel DW architecture, called the Lake Data Warehouse Architecture, designed to resolve the challenges of traditional DWs outlined above. The Lake Data Warehouse Architecture integrates the existing DW architecture with advanced technologies, such as the Hadoop framework and Apache Spark, in a novel and efficient hybrid solution. Its main advantage is that it combines the existing features of traditional DWs with big data features by joining the traditional DW with the Hadoop and Spark ecosystems. In addition, it is suited to handling massive amounts of data while maintaining reliability, scalability, and availability.