Author: Saleh, Ahmed Ibrahim Mohamed. / Title: Building of a domain specific search engine

Title

Building of a domain specific search engine

Author

Saleh, Ahmed Ibrahim Mohamed.

Prepared by

Researcher / أحمد ابراهيم محمد صالح

Supervisor / علي ابراهيم الدسوقي

Supervisor / هشام عرفات علي

Subject

Web search engines. Computer networks. Computer programs.

Publication date

2006.

Number of pages

247 p.

Language

English

Degree

Doctorate

Specialization

Systems and Control Engineering

Date of approval

1/1/2006

Place of approval

Mansoura University - Faculty of Engineering - Computers Engineering and Systems

Abstract

The World Wide Web (WWW), or simply the web, is a huge repository of globally accessible information in a variety of domains. Search engines are information retrieval systems that help users search the web efficiently. They rely on active agents, called crawlers, that continuously traverse the web for new pages. However, given the huge volume of the web and its speed of change, the coverage of modern search engines is relatively small. To work around this limitation, specialized search engines have been introduced. In contrast to traditional (general-purpose) search engines, vertical (domain-specific) ones use a special class of crawlers called "focused crawlers". This class of crawlers traverses the web to locate pages on a specific topic or related to a certain domain of interest, which increases retrieval precision and recall. Although many strategies have been introduced for focused crawling, some of them are still under development, while others have problems or exhibit degraded performance. The main objective of this thesis is to introduce an Intelligent-Adaptive focused crawling strategy. The proposed crawler is intelligent in that it can estimate the relevancy of a web page before actually visiting it. It is also adaptive in that it keeps track of any changes that may arise in its domain of interest. Moreover, the proposed focused crawling strategy integrates evidence from different disciplines. It uses novel techniques for web page weighting, classification, and segmentation. It also involves novel methodologies for link scoring and for constructing a domain thesaurus. In addition, the proposed strategy can boost crawling efficiency continuously, as it employs a novel machine learning technique called the Continuous Learning approach.
Experimental results presented in the thesis show that the proposed crawling strategy notably improves focused crawling effectiveness (higher precision and recall). It also overcomes various limitations of traditional focused crawling approaches.
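
The general idea of a focused crawler that estimates a page's relevancy before visiting it can be illustrated with a best-first frontier. The following is only a minimal sketch under stated assumptions, not the strategy proposed in the thesis: the scoring rule (overlap between anchor text and a small term set), the names `DOMAIN_TERMS`, `link_score`, `focused_crawl`, and `TOY_WEB`, and the in-memory link graph that stands in for the web are all hypothetical and chosen for illustration.

```python
import heapq

# Hypothetical domain vocabulary (assumption). The thesis constructs a
# domain thesaurus; how it does so is not described in the abstract.
DOMAIN_TERMS = {"crawler", "search", "index", "retrieval"}


def link_score(anchor_text):
    """Estimate the relevance of an unvisited page from its anchor text:
    the fraction of domain terms that occur in the anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & DOMAIN_TERMS) / len(DOMAIN_TERMS)


def focused_crawl(web, seed, max_pages):
    """Best-first traversal of `web`, a dict mapping each URL to a list of
    (anchor_text, target_url) pairs. Returns URLs in visiting order."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # in insertion order.
    frontier = [(-1.0, 0, seed)]
    counter = 1
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in web.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-link_score(anchor), counter, target))
                counter += 1
    return visited


# Toy in-memory link graph standing in for the web (illustrative data only).
TOY_WEB = {
    "seed": [("sports news", "a"),
             ("web crawler design", "b"),
             ("search index retrieval", "c")],
    "c": [("cooking tips", "d"),
          ("crawler search", "e")],
}

print(focused_crawl(TOY_WEB, "seed", 4))
```

In this sketch the links whose anchor text shares more terms with the domain vocabulary are visited first, so a limited page budget is spent on the links judged most promising. A real crawler would fetch pages over the network and would need a far richer relevance estimate than anchor-text overlap.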