Title
Deploying the cloud based architecture for web searching services /
Author
Mosa, Mohammed El-Sayed El-Araby Mohamady.
Preparation Committee
Researcher / Magdy El-Sayed El-Araby Mohamady
Supervisor / Magdy Zakaria Rashad
Supervisor / Sherihan Mohamed Aboelenin
Supervisor / Hossam Mohamed Moftah
Examiner / Sherihan Mohamed Aboelenin
Subject
Cloud computing. Virtual Machine. Neural networks (Computer science).
Publication Date
2019.
Number of Pages
online resource (119 pages)
Language
English
Degree
Doctorate (PhD)
Specialization
Computer Science (miscellaneous)
Date of Award
1/1/2019
Place of Award
Mansoura University - Faculty of Computers and Information - Department of Computer Science
Table of Contents
Only 14 pages (from 119) are available for public view.
Abstract

Web content is characterized by its enormous size and dynamic updates. Web search is an attractive topic for research and development from several perspectives, including architecture, web coverage, topic-focused crawling, and scalability. This thesis focuses on the architecture of the web crawler and on adapting that architecture to the features of cloud computing, including scalability, geo-distribution, loose coupling of system components, and the provision of services online as a third party. The thesis also addresses the crawler's ability to crawl a specific topic with high accuracy, using advanced techniques built on a well-designed architecture over cloud computing.

The thesis treats web crawling as a sequence of successive phases and proposes an architecture for web crawling over cloud computing that exploits its features. It introduces the concept of Crawler as a Service (CaaS), analogous to web services that are available in any region in a collaboratively distributed fashion over the World Wide Web, with APIs that offload scaling to another layer of abstraction. The crawler stages are designed as separate services and loosely coupled components; each service can be configured and provided separately, and scaled across different remote regions. This crawler architecture can be customized to crawl a specific field, region, or language. Because the proposed architecture is based on adaptive customization and standardization of cloud services, it is more usable and better suited to the many communities and cloud customers who wish to create their own search engines. The proposed web crawler architecture is compared with one designed for grid computing, a computing paradigm that competes with cloud computing; the comparison showed that the cloud-based design outperforms the grid-based one. Finally, this thesis proposes a new Focused Crawler (FC) architecture that can be offered as a service over cloud computing.
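The loosely coupled, stage-per-service design described above can be sketched in a few lines. This is a minimal illustration only, not code from the thesis: the class names (FetchService, FilterService, IndexService), the in-memory page store, and the queue-based wiring are all assumptions standing in for real cloud services communicating over APIs.

```python
from queue import Queue

class FetchService:
    """Pulls URLs from the frontier and emits (url, content) pairs.

    A dict of pages stands in for real HTTP fetching in this sketch.
    """
    def __init__(self, pages):
        self.pages = pages

    def run(self, frontier, fetched):
        while not frontier.empty():
            url = frontier.get()
            fetched.put((url, self.pages.get(url, "")))

class FilterService:
    """Keeps only pages whose content mentions the target topic."""
    def __init__(self, topic):
        self.topic = topic

    def run(self, fetched, accepted):
        while not fetched.empty():
            url, content = fetched.get()
            if self.topic in content:
                accepted.put((url, content))

class IndexService:
    """Builds a tiny in-memory index of the accepted pages."""
    def run(self, accepted):
        index = {}
        while not accepted.empty():
            url, content = accepted.get()
            index[url] = content
        return index

# Usage: the stages share nothing except the queues between them,
# so each one could be deployed and scaled as a separate service.
pages = {
    "http://a.example": "cloud computing tutorial",
    "http://b.example": "cooking recipes",
}
frontier, fetched, accepted = Queue(), Queue(), Queue()
for url in pages:
    frontier.put(url)

FetchService(pages).run(frontier, fetched)
FilterService("cloud").run(fetched, accepted)
index = IndexService().run(accepted)
```

Because each stage reads from one queue and writes to another, replacing the in-process queues with a cloud message broker would let each stage scale independently, which is the property the CaaS design aims for.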
The proposed FC includes a service called the Topic Filter Service (TFS), which is responsible for filtering retrieved pages before indexing and for extracting links to add to the crawling queue. TFS relies on a Deep Neural Network (DNN) classifier and is trained on a dataset preprocessed with outlier rejection using a Support Vector Machine (SVM) classifier. The proposed FC also includes a further service called the Concept Weight Handler (CWH), which is responsible for handling keywords as concepts based on their meanings and for calculating the weight of each concept.
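The concept-weighting idea behind CWH can be illustrated with a small sketch. The synonym map and the relative-frequency weighting formula below are illustrative assumptions, not taken from the thesis; a real CWH would draw concept groupings from a lexical resource rather than a hand-written dict.

```python
from collections import Counter

# Hypothetical concept map: each concept groups surface keywords
# that express the same meaning (an assumption for this sketch).
CONCEPTS = {
    "cloud": {"cloud", "iaas", "paas"},
    "crawler": {"crawler", "spider", "bot"},
}

def concept_weights(text):
    """Weight each concept by the relative frequency of its keywords."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    weights = {}
    for concept, keywords in CONCEPTS.items():
        hits = sum(counts[k] for k in keywords)
        weights[concept] = hits / total if total else 0.0
    return weights

# 'spider' and 'crawler' both count toward the 'crawler' concept,
# and 'cloud' and 'PaaS' both count toward 'cloud'.
w = concept_weights("The cloud spider is a crawler running on PaaS")
```

Grouping synonyms under one concept before weighting is what lets a focused crawler rate a page as on-topic even when it never uses the exact query keyword.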