Author: Halim, Dalia Waguih Helmy. / Title: Creating Facets Hierarchy for Unstructured Arabic Documents

Title

Creating Facets Hierarchy for Unstructured Arabic Documents

Author

Halim, Dalia Waguih Helmy.

Preparation Committee

Supervisor / داليا وجيه حلمي حليم

Supervisor / نهى عدلي عطية

Supervisor / خالد مجدي ناجي

Examiner / محمد عبد الحميد اسماعيل

Examiner / صالح عبد الشكور الشهابي

Subject

Computer Science.

Publication Date

2012.

Number of Pages

139 p.

Language

English

Degree

Master's

Specialization

Systems and Control Engineering

Date of Approval

1/11/2012

Place of Approval

Alexandria University - Faculty of Engineering - Computer and Systems Engineering

Abstract

Faceted search is a new paradigm in information retrieval systems. What distinguishes faceted search from ordinary category search is that the same result item can be assigned to many different facets at the same time, rather than being filed under only one category.
To implement a Faceted Search System, a well-defined metadata structure for the searched items must exist; the structure indicates the set of facets associated with the searched items. This makes structured data very well suited to being searched using Faceted Search.
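As an illustration of the metadata structure described above, the following is a minimal sketch, not the system built in the thesis. The item identifiers, facet names, and the helper functions `build_facet_index`, `refine`, and `facet_counts` are all hypothetical. It shows one item belonging to several facets at once and a result set being narrowed by facet selection.

```python
from collections import defaultdict

# Hypothetical items: each carries metadata naming the facets it belongs to.
# Note that "doc2" sits under two topic values at the same time.
ITEMS = {
    "doc1": {"topic": {"sports"}, "region": {"Egypt"}},
    "doc2": {"topic": {"sports", "economy"}, "region": {"Egypt"}},
    "doc3": {"topic": {"economy"}, "region": {"Lebanon"}},
}


def build_facet_index(items):
    """Map each (facet, value) pair to the set of item ids tagged with it."""
    index = defaultdict(set)
    for item_id, metadata in items.items():
        for facet, values in metadata.items():
            for value in values:
                index[(facet, value)].add(item_id)
    return index


def refine(index, result_set, selections):
    """Keep only the results that match every selected (facet, value) pair."""
    refined = set(result_set)
    for selection in selections:
        refined &= index.get(selection, set())
    return refined


def facet_counts(index, result_set):
    """Count how many of the current results fall under each facet value."""
    counts = {}
    for key, item_ids in index.items():
        hits = len(item_ids & set(result_set))
        if hits:
            counts[key] = hits
    return counts


index = build_facet_index(ITEMS)
print(refine(index, ITEMS.keys(), [("topic", "economy"), ("region", "Egypt")]))
```

Without such metadata there is nothing to index, which is why plain text needs a facet extraction step first.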
Text documents, such as news articles and scientific essays, are sometimes semi-structured. If the essay is written using a markup language, or the document is stored in a structured environment, such as a database containing enough metadata about the essay, then a straightforward Faceted Search System can be created for these documents.
Unfortunately, text documents are often simply plain text, without any metadata to describe their contents. Searching such documents means that the user must go through the whole result set until he/she finds the exact match for what he/she is looking for. This is the main motivation for seeking methods to make text documents faceted-search-capable.
A variety of methods for extracting plain facets and hierarchical facets from plain text have recently been introduced. Methods like data clustering and document classification are not preferred, as they do not return user-friendly results. Newer methods for facet extraction from text documents take advantage of external lexical hierarchies. Examples of the resources used as external lexical hierarchies in the extraction process are Wikipedia and the WordNet lexical database.
The Arabic language is not as well established on the web as the English language. Meanwhile, the volume of Arabic documents that can be accessed online is increasing every day, and the need to search such documents increases proportionally.
In our work, we introduce a Faceted Search System for unstructured Arabic text. The system searches for the user query and returns a result set as a Google search would. In addition, the tool returns a set of hierarchical facet terms to help the user narrow the returned result set and reach the required document more easily.
Because Arabic processing tools are not as widely available as English tools, we use two methods for building the facet hierarchy for the Arabic terms. The first method uses an Arabic resource, the Arabic Wikipedia hierarchy. The second method uses an English resource: the facets are translated into English and the hierarchy is built using the English WordNet IS-A hypernym structure. Then the whole facet hierarchy is translated back into Arabic.
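The second method can be pictured with a small sketch. This is an assumption-laden illustration, not the implementation from the thesis. The IS-A chains in `HYPERNYM_PATHS` are hand-written placeholders rather than real WordNet output, and the dictionary `EN_TO_AR` and the functions `build_hierarchy` and `translate_back` are hypothetical names. The idea shown is merging root-to-term hypernym chains into one tree and then mapping the tree labels back into Arabic.

```python
# Hypothetical IS-A (hypernym) chains, root first, one per translated facet
# term. In a real system these would come from a lexical database lookup.
HYPERNYM_PATHS = {
    "football": ["entity", "activity", "sport", "football"],
    "tennis": ["entity", "activity", "sport", "tennis"],
    "bank": ["entity", "organization", "institution", "bank"],
}

# Hypothetical English-to-Arabic dictionary for the back-translation step.
EN_TO_AR = {
    "sport": "رياضة",
    "football": "كرة القدم",
    "tennis": "تنس",
    "bank": "بنك",
}


def build_hierarchy(paths):
    """Merge root-to-term chains into a nested dict, sharing common prefixes."""
    root = {}
    for path in paths.values():
        node = root
        for term in path:
            node = node.setdefault(term, {})
    return root


def translate_back(tree, en_to_ar):
    """Relabel every node; terms missing from the dictionary stay in English."""
    return {
        en_to_ar.get(term, term): translate_back(children, en_to_ar)
        for term, children in tree.items()
    }


hierarchy = build_hierarchy(HYPERNYM_PATHS)
arabic_hierarchy = translate_back(hierarchy, EN_TO_AR)
print(arabic_hierarchy)
```

Terms that share a hypernym, such as the two sports here, end up as siblings under a common parent, which is what lets a facet hierarchy group related result documents.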