Abstract

Visual Question Answering (VQA) is a challenging research area in which a model must understand image semantics together with the asked question in order to infer the correct answer. The ability of a VQA model to generalize to new questions about images that were not seen during training is called zero-shot capability, and good evaluation metrics are needed to compensate for dataset bias. In this thesis, the TDIUC dataset is redistributed to test this capability, and suitable evaluation metrics are applied. In addition, since transformer models for the VQA task require long training times, substituting self-attention layers with FNet sublayers improves training speed by 24% and testing speed by 12.7%, at a limited accuracy cost of 5.61%.
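The FNet substitution mentioned above replaces each self-attention sublayer with a parameter-free Fourier mixing sublayer. As a minimal sketch (a NumPy illustration of the core FNet mixing operation, not the thesis implementation), the sublayer applies a 2D discrete Fourier transform over the sequence and hidden dimensions and keeps only the real part:

```python
import numpy as np

def fnet_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing for one sublayer.

    x: array of shape (seq_len, d_model) holding token embeddings.
    Applies an FFT along the hidden dimension and then along the
    sequence dimension, returning the real part, as in the FNet paper.
    No learned parameters are involved, which is why it trains faster
    than self-attention.
    """
    return np.fft.fft2(x).real

# Example: mix a toy sequence of 4 tokens with 8 hidden features.
tokens = np.random.default_rng(0).standard_normal((4, 8))
mixed = fnet_mixing(tokens)
# The output keeps the input shape and is purely real-valued.
```

In a full transformer block this mixing output would still pass through the usual residual connection, layer normalization, and feed-forward sublayer; only the attention computation is swapped out.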