Author: Mohamed, Marwa Khairy Mohamed./ Title: Detecting and Filtering of Offensive Content in Online Social Networks /

Title

Detecting and Filtering of Offensive Content in Online Social Networks /

Author

Mohamed, Marwa Khairy Mohamed.

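Preparation committee

Researcher / Marwa Khairy Mohamed Mohamed (مروه خيرى محمد محمد)

Supervisor / Tarek Mostafa Mahmoud (طارق مصطفى محمود)

Supervisor / Tarek Abdel-Hafeez Abdel-Rahman (طارق عبد الحفيظ عبد الرحمن)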

Subject

Computer science.

Publication date

2021

Number of pages

144 p.

Language

English

Degree

Doctorate (PhD)

Specialization

Computer Science (miscellaneous)

Date of approval

23/9/2021

Place of approval

Minia University - Faculty of Science - Computer Science

Abstract

Social networking sites have become the most prevalent and interactive medium for many users. Millions of individuals can publish content openly on social media platforms such as Facebook, YouTube, Twitter, and Instagram, regardless of the material type: messages, images, videos, or events. Most people use social media without stopping to think about the effects of these networks on their lives, whether positive or negative. However, along with useful and engaging content, social media can also be a source of abusive content that harms others, such as insults and cyberbullying.
Incidents of bullying and the exchange of inappropriate content, which sometimes incites hatred and racism, have recently increased on social networks worldwide. This has prompted the operators of these networks to withhold some users' posts, which can sometimes cause confusion among users, as happened, for example, in the last US elections. These companies also let users report any content they deem inappropriate.
This thesis addresses the problem of detecting and filtering textual content, written in Arabic or English, that may contain offensive, obscene, contemptuous, bullying, or racist words in the posts and tweets of social network users, especially on Facebook, with a focus on cyberbullying.
Cyberbullying has been defined in a variety of ways, but there is widespread agreement among researchers that it involves intentional, aggressive, and repeated actions among peers via electronic means.
In this context, a survey of Facebook users was conducted to gather descriptive information about their usage and privacy practices, their knowledge of Facebook's system for reporting offensive content, and their satisfaction with that system.
In addition, we present an in-depth analysis of the cyberbullying problem on social networks (definition, impact, statistics, and detection techniques). We then review work on detecting abusive language and cyberbullying in Arabic content on social networks.
Because the datasets of abusive content used for training and classification are imbalanced, we studied the effect of rebalancing techniques on imbalanced cyberbullying datasets collected from social networks. To study classification performance on the rebalanced datasets, four resampling techniques (random under-sampling, random over-sampling, SMOTE, and SMOTE+Tomek) were used. The impact of each rebalancing technique on classification performance was examined using eight well-known classification algorithms. Our experiments showed that the performance of a resampling technique depends on the dataset size, the imbalance ratio, and the classifier used. They also showed that no single technique always performs better than the others.
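As a rough illustration of the resampling step described above, the following plain-Python sketch implements random under-sampling, random over-sampling, and a simplified SMOTE-style interpolation. The function names, the toy feature vectors, and the fixed seed are assumptions made for illustration; they are not taken from the thesis, whose experiments may have relied on an existing library. The Tomek-link cleaning step of SMOTE+Tomek is not shown.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature vectors by their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_under_sample(X, y, seed=0):
    """Drop samples at random until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Duplicate samples at random until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy imbalanced data: six "non-bullying" (0) and two "bullying" (1) vectors.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.2, 0.2], [0.4, 0.1], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print(Counter(random_under_sample(X, y)[1]))
print(Counter(random_over_sample(X, y)[1]))
print(smote_like([x for x, label in zip(X, y) if label == 1], n_new=4, k=1))
```

Under-sampling discards majority-class data, while over-sampling and SMOTE enlarge the minority class. This is one reason the best choice may depend on the dataset size and imbalance ratio, as the abstract reports.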
A new balanced Arabic dataset on social media bullying, built from bullying keywords, was also created for use in cyberbullying and offensive language detection. To verify the effectiveness of the proposed dataset, nine machine learning algorithms were used.
Finally, a filtering system based on ensemble machine learning techniques was proposed to improve the accuracy of bullying detection, and experiments were conducted to evaluate its effectiveness. It was applied to five datasets, three in Arabic and two in English. The results of these experiments showed the efficiency of the proposed filtering system.
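The abstract does not say which ensemble scheme the filtering system uses. As one possible reading, the sketch below combines three base classifiers by hard majority voting and filters out posts that the ensemble flags. Everything in it is hypothetical: the keyword-rule classifiers stand in for trained models, and the keyword lists and function names are invented for illustration.

```python
from collections import Counter


def keyword_rule(keywords):
    """Build a toy classifier: label 1 (offensive) if any keyword occurs
    as a whole word in the text, else 0. A stand-in for a trained model."""
    lowered = {k.lower() for k in keywords}

    def classify(text):
        return 1 if lowered & set(text.lower().split()) else 0

    return classify


def majority_vote(classifiers, text):
    """Hard voting: return the label predicted by most classifiers.
    An odd number of binary classifiers avoids ties."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


def filter_posts(posts, classifiers):
    """Keep only the posts the ensemble labels as non-offensive (0)."""
    return [p for p in posts if majority_vote(classifiers, p) == 0]


# Three hypothetical base classifiers with overlapping keyword lists.
classifiers = [
    keyword_rule(["idiot", "stupid"]),
    keyword_rule(["stupid", "loser"]),
    keyword_rule(["idiot", "loser", "ugly"]),
]

posts = [
    "you are a stupid idiot",
    "have a nice day",
    "that joke was stupid",
    "ugly weather today",
]

for post in filter_posts(posts, classifiers):
    print(post)
```

In this toy run, a post flagged by only one of the three classifiers is kept, so a single over-sensitive base model cannot block a post on its own. Whether the thesis uses voting, stacking, or another ensemble method cannot be determined from the abstract.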