Title
Semi-supervised Language-independent Sentiment Analysis
Author
Hanafy, Mohammad Hassan
Committee
Researcher / Mohammad Hassan Hanafy Mahmoud
Supervisor / Hazem Mahmoud Abbas
Supervisor / Mahmoud Ibrahim Khalil
Examiner / Hassan Taher Dorrah
Publication date
2019.
Number of pages
71 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Award date
1/1/2019
Award institution
Ain Shams University - Faculty of Engineering - Computer Engineering
Index
Only 14 of 96 pages are available for public view.

Abstract

Sentiment analysis plays an important role in research and industry, as extracting people's opinions can be beneficial in several domains. Millions of active users express their opinions and sentiments daily on blogs, social networks, and other platforms. Twitter allows users from all over the globe to express their feelings and opinions freely in a unit of text called a tweet. With millions of tweets published daily, Twitter has attracted many researchers and organizations seeking to exploit its data.
Early work on sentiment analysis used rule-based approaches; machine learning classifiers were then introduced and surpassed them. However, most of these systems were built for a specific language or domain. Being a global platform used in almost every country creates new challenges: users express their sentiments in different languages, tend not to use formal language, do not stick to grammar rules, use slang words, and continuously coin new expressions. This has kept the door open for further innovations and systems that address these problems.
In this thesis, we build a semi-supervised, language-independent technique that does not depend on any feature of a particular language. It uses emoticons, which are used heavily on Twitter, as heuristic labels to build the training set from raw tweets. Statistical and unsupervised approaches, i.e., bag-of-words and word2vec, are used as feature representations for the classifiers.
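The emoticon-based labeling idea can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the emoticon lists and the rule of discarding tweets with mixed or no emoticons are assumptions for the sketch, and the emoticons are stripped so a classifier cannot simply memorize the labels.

```python
# Hypothetical sketch of emoticon-based distant supervision:
# a tweet containing only positive (or only negative) emoticons is
# heuristically labeled; ambiguous or emoticon-free tweets are skipped.

POSITIVE = {":)", ":-)", ":D", "=)", ";)"}   # assumed emoticon lists
NEGATIVE = {":(", ":-(", ":'(", "=("}

def label_tweet(text):
    """Return ('pos'|'neg', cleaned_text), or None if unlabeled/ambiguous."""
    tokens = text.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:          # no emoticons, or conflicting signals
        return None
    cleaned = " ".join(t for t in tokens
                       if t not in POSITIVE and t not in NEGATIVE)
    return ("pos" if has_pos else "neg"), cleaned

# Usage: build a heuristically labeled training set from raw tweets.
raw = ["great match today :)", "missed the bus :(", "just landed", "ok :) :("]
train = [pair for pair in map(label_tweet, raw) if pair is not None]
```

The resulting `train` list keeps only the unambiguous tweets, which is what makes the approach semi-supervised: no human annotation is needed.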
Two main models are proposed in this work; both combine typical and deep learning classifiers, i.e., SVM, MaxEnt, CNN, and LSTM. The first model uses more core classifiers than the second and focuses on tuning their combination to overcome their individual limitations. The second model uses fewer classifiers but focuses more on the feature representation, especially word2vec and how to exploit its two models, i.e., skip-gram and continuous bag-of-words. The proposed models are efficient in terms of memory and time, using only 10% of the training dataset compared to other approaches on the same test dataset. The results also show that both approaches perform well, achieving state-of-the-art accuracy of 86.37%.
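The classifier-combination idea can be illustrated with scikit-learn. This is a sketch under stated assumptions, not the thesis's pipeline: only two of the named "typical" classifiers (SVM and MaxEnt, i.e., logistic regression) are combined by hard majority voting over bag-of-words features, and the CNN/LSTM members and word2vec features are omitted; the toy texts and labels are invented for illustration.

```python
# Sketch: majority-vote ensemble of an SVM and a MaxEnt classifier
# over bag-of-words features, in the spirit of the combination
# described above (deep learning members omitted for brevity).
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["love this phone", "worst service ever",
         "love the service", "worst phone"]          # toy labeled tweets
labels = ["pos", "neg", "pos", "neg"]

ensemble = make_pipeline(
    CountVectorizer(),                               # bag-of-words features
    VotingClassifier(
        estimators=[("svm", LinearSVC()),            # SVM member
                    ("maxent", LogisticRegression())],  # MaxEnt member
        voting="hard"))                              # majority vote on labels
ensemble.fit(texts, labels)
print(ensemble.predict(["love it", "worst ever"]))
```

Hard voting takes each member's predicted label and outputs the majority, which lets weaker members be compensated by stronger ones on inputs where they disagree.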