Abstract Analyzing Big Data has become a significant activity for many organizations. This analysis is simplified by the MapReduce framework and its execution environments, such as Hadoop, and by parallel systems such as Hive. On the other hand, most MapReduce users have complex analytical queries that are expressed as individual MapReduce jobs. High-level query languages such as Pig, Hive, and Jaql translate a user's complex query into workflows of MapReduce jobs. This thesis addresses how to reuse previous results stored in Hive output files, within the same session or across different sessions, to improve Hive performance. Two algorithms are introduced for this purpose. The first is called HOME (HiveQL Optimization in Multi-Session Environment). To evaluate it, HOME was implemented and tested with 19 different SQL statements, with the aim of reducing I/O in MapReduce jobs. Building on the HOME algorithm, a new HiveQL execution architecture based on materialized previous results is proposed.
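The general idea of reusing previous results can be illustrated with a minimal sketch. This is not the HOME algorithm, whose details are not given in the abstract. It only shows the underlying principle: if a query with the same text has already been executed, its materialized result is returned instead of launching the work again. The names `normalize_query`, `ResultStore`, and the `execute` callback are hypothetical and stand in for whatever actually submits a HiveQL statement.

```python
import hashlib
import re


def normalize_query(sql: str) -> str:
    """Collapse whitespace runs and drop a trailing semicolon.

    Simplification: whitespace inside string literals is collapsed too,
    so this is purely textual matching. A real system would more likely
    compare parsed query plans.
    """
    collapsed = re.sub(r"\s+", " ", sql).strip()
    return collapsed.rstrip(";").strip()


class ResultStore:
    """Hypothetical store of materialized query results, keyed by query text."""

    def __init__(self):
        self._results = {}  # query hash -> previously computed rows
        self.hits = 0
        self.misses = 0

    def run(self, sql, execute):
        """Return a stored result if available, otherwise call execute(sql)."""
        key = hashlib.sha256(normalize_query(sql).encode("utf-8")).hexdigest()
        if key in self._results:
            self.hits += 1  # reuse: no new job is launched
            return self._results[key]
        self.misses += 1
        rows = execute(sql)  # assumed to run the query and return its rows
        self._results[key] = rows
        return rows
```

Because the store lives outside any single call, it can be shared by several sessions, which is the multi-session aspect the abstract refers to. The sketch ignores invalidation: a stored result becomes stale once the underlying tables change.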