Title
Improving Opinion Mining in Social Networks Data
Author
Ismail, Shaimaa Mahmoud Mohammed.
Committee
Researcher / شيماء محمود محمد اسماعيل
Supervisor / عربى السيد كشك
Examiner / سامى عبد المنعم الضليل
Examiner / حاتم محمد السيد
Subject
Information Systems. Computers and Information. Computer security.
Publication date
2020.
Number of pages
97 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Approval date
1/1/2020
Place of approval
Menoufia University - Faculty of Computers and Information - Computer Science
Only 14 of the 97 pages are available for public view.

Abstract

Opinion mining is the process of extracting users' opinions from social network data such as Facebook and Twitter posts. Extracting these opinions is challenging because of the huge number of posts published on social media every day. Researchers therefore approach the problem with machine learning algorithms such as Support Vector Machines (for classification and regression), Naive Bayes, Random Forest, Logistic Regression, and Maximum Entropy. Twitter is a rich source for learning about people's opinions, so it is important to extract this data and use it as feedback on users' opinions about different published topics.
In existing work on both tasks (classification and prediction), researchers have used different datasets with different machine learning algorithms, such as Support Vector Machines, Naive Bayes, Random Forest, Logistic Regression, and Maximum Entropy, but their approaches have several weaknesses. First, they used small datasets in their experiments.
Second, in the preprocessing phase, they did not apply all normalization steps for cleaning the data. Third, in the feature selection phase, they did not use both unigrams and bigrams as features. These approaches therefore need improvement.
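To illustrate why combining unigrams and bigrams matters for opinion text, here is a minimal Python sketch using only the standard library. The function names and the sample sentence are hypothetical, chosen for illustration; this is not the feature extractor used in the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_bigram_features(text):
    """Count unigrams and bigrams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


# Hypothetical sample comment, not taken from the thesis dataset.
features = unigram_bigram_features("battery is not good")
print(sorted(features))
```

With unigrams alone, the word "good" would look like a positive signal; the bigram "not good" keeps the negation attached to it, which is one common motivation for using both feature types.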
This thesis proposes a new approach to improve opinion mining in social networks data. The work is done in two main steps: classification and prediction. The dataset covers four mobile phone categories (Blackberry, iPhone, Lenovo, and Samsung). Each category contains 34,000 comments expressing users' opinions of the product; each comment is classified as positive or negative and labeled with a rating from one to five based on the user feedback.
Our approach is implemented in several steps. First, machine learning algorithms for English text classification and prediction (Naïve Bayes, Maximum Entropy, Logistic Regression, Random Forest Regression, and Support Vector Machines) are used. Second, tokenization, stemming, and lemmatization are applied; all words are converted to lower case; and usernames, mentions, links, repeated characters, numbers, empty tweets, punctuation, stop words, and runs of more than two spaces between words are deleted. Third, contractions such as "isn't" are expanded to "is not" to clean the data, and both unigrams and bigrams are used to extract features from the data. In the classification process, our approach achieves an accuracy of 90%. In the prediction process, the Support Vector Regression (SVR) model predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Logistic Regression model with an MSE of 0.4986, and the Random Forest Regression model with an MSE of 0.4770. The thesis is organized as follows
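The normalization steps named in the abstract (lowercasing, removing mentions, links, numbers, punctuation, and stop words, handling repeated characters, expanding contractions, and dropping empty tweets) might look roughly like the Python sketch below. It is a simplified illustration, not the thesis's implementation. The contraction and stop-word lists are tiny placeholders, squeezing runs of three or more identical letters down to two is an assumed interpretation of "repeated characters", and stemming and lemmatization are omitted because they need an external NLP library.

```python
import re
import string

# Tiny illustrative lists; assumptions, not the resources used in the thesis.
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "can't": "can not"}
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of"}


def preprocess(tweet):
    """Normalize one tweet and return the cleaned text (possibly empty)."""
    text = tweet.lower().replace("\u2019", "'")          # unify apostrophes
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)                    # remove mentions
    for short, full in CONTRACTIONS.items():             # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)       # squeeze letter runs
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() also collapses any run of whitespace into single separators.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def clean_corpus(tweets):
    """Preprocess every tweet and drop those that end up empty."""
    cleaned = (preprocess(t) for t in tweets)
    return [c for c in cleaned if c]


# Hypothetical sample tweets, not taken from the thesis dataset.
print(preprocess("@bob The iPhone isn't goooood!!! http://t.co/x 123"))
print(clean_corpus(["@bob http://t.co/x", "Loooove it"]))
```

Note that the placeholder stop-word list deliberately leaves out "not", so the negation produced by expanding "isn't" survives into the bigram features.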