Title
Transfer learning for natural language processing in low-resource scenarios
Publisher
Muhammad Emadeldien Ahmed Mahmoud Khalifa
Author
Muhammad Emadeldien Ahmed Mahmoud Khalifa
Preparation Committee
Supervisor / Muhammad Emadeldien Ahmed Mahmoud Khalifa
Supervisor / Hesham Ahmed Hassan
Supervisor / Aly Aly Fahmy
Examiner / Hesham Ahmed Hassan
Publication Date
2021
Number of Pages
102 leaves
Language
English
Degree
Master's
Specialization
Computer Science (miscellaneous)
Approval Date
2/10/2020
Place of Approval
Cairo University - Faculty of Computers and Information - Computer Science
Table of Contents
Only 14 of 120 pages are available for public view.

Abstract

Annotated data is necessary for supervised machine learning approaches. Unfortunately, data annotation is expensive, time-consuming, and requires domain expertise from the human labeler. Therefore, it is essential to develop methods that can operate in zero- and low-resource settings, i.e., with little or no labeled data for the target task. In this work, we propose two transfer learning approaches based on inductive and transductive transfer.

The inductive transfer approach leverages raw unlabeled data through pre-trained language models and obtains substantial performance gains on three natural language processing tasks, namely named entity recognition (NER), part-of-speech (POS) tagging, and sarcasm detection (SRD). However, the proposed inductive approach is suitable only when labeled data is available in the target language variety. Therefore, we shift our focus to zero- and low-resource settings, where the goal is to build models that generalize to completely unseen language varieties, and we frame our work in the scope of the Arabic language and three of its varieties (dialects), namely Egyptian, Gulf, and Levantine.

We then develop a transductive transfer approach that allows transferring knowledge between different Arabic varieties without the need for labeled examples in the target variety. The proposed transductive approach enables knowledge transfer from resource-rich language varieties to resource-poor ones and is based on self-training with unlabeled examples from the target language variety.
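The self-training procedure underlying the transductive approach can be sketched generically: train on the labeled source variety, pseudo-label unlabeled examples from the target variety, keep only confident predictions, and retrain. The sketch below is a minimal illustration of that loop, not the thesis's actual implementation; the toy threshold classifier, the `threshold` value, and the function names are all assumptions made for the example.

```python
def self_train(train_fn, predict_fn, labeled, unlabeled,
               threshold=0.9, rounds=3):
    """Generic self-training loop (illustrative sketch).

    train_fn(data) -> model, where data is a list of (x, y) pairs.
    predict_fn(model, x) -> (label, confidence in [0, 1]).
    Confident pseudo-labeled target examples are added to the
    training set on each round; the rest stay unlabeled.
    """
    data = list(labeled)
    for _ in range(rounds):
        model = train_fn(data)
        remaining = []
        for x in unlabeled:
            label, conf = predict_fn(model, x)
            if conf >= threshold:
                data.append((x, label))   # adopt confident pseudo-label
            else:
                remaining.append(x)       # revisit next round
        unlabeled = remaining
    return train_fn(data)

# Toy 1-D classifier standing in for the real model: the "model" is a
# midpoint between class means; confidence grows with distance from it.
def train_fn(data):
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict_fn(mid, x):
    label = 1 if x >= mid else 0
    conf = min(1.0, abs(x - mid) / 5)
    return label, conf

# "Source variety" = labeled points; "target variety" = unlabeled points.
labeled = [(0, 0), (1, 0), (9, 1), (10, 1)]
unlabeled = [2, 8, -3, 12]
final = self_train(train_fn, predict_fn, labeled, unlabeled)
```

The confidence threshold is the key design choice: set too low, the model reinforces its own mistakes on the unfamiliar variety; set too high, few target examples are ever adopted and the transfer stalls.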