Abstract

Textual data mining plays a valuable role in modern computing because of its substantial contributions to business and the digital economy. One of its important fields is Sentiment Analysis (SA), also called Opinion Mining (OM). SA benefits decision makers by extracting knowledge about user orientations and industry trends. It operates on web text, one of the most important forms of web content, drawn from sources such as social networks, weblogs, and business web portals. SA applies computational intelligence to determine text polarity, the implied attitude of the writer. Polarity is determined by whether the writer agrees or disagrees with the subject the text is written about. This agreement or disagreement is expressed as positive or negative, indicating whether the user accepts or does not accept the content.

This thesis studies work on SA for the Arabic language and presents four main contributions to Arabic sentiment analysis across two SA approaches. The first is the Machine Learning SA approach (MLSA), called corpus-based SA (CBSA) in some of the literature; the second is the Lexicon-based SA approach (LBSA). The labeled set of text records used in MLSA is called a corpus. The goal is to learn a model that achieves better accuracy and performance, judged by comparing human annotations with the results of an intelligent model built on sentiment-specific characteristics. A list of sentiment terms is called a sentiment lexicon. A sentiment lexicon is compiled either manually or automatically using a similarity measure technique.

First, we tested five states of a sentiment corpus, extracted according to several NLP aspects. The focus was on Arabic-specific characteristics, with the aim of improving Arabic sentiment analysis under the MLSA approach.
Then we developed an algorithm for enhancing the accuracy of the features selected from the corpus. These two contributions are applied at two different pre-processing steps in order to reduce the computational cost of handling a large number of selected text features. The third and fourth contributions target the LBSA approach. We first developed an adapted lexicon, where the adaptation is carried out between two lexicon domains. We then developed a method for constructing a new sentiment lexicon automatically. All of these steps rely on Arabic-specific aspects, which we exploit through a dynamic programming algorithm we named Root Based Words Find out (RBWF); this algorithm is used in building both sentiment lexicons.
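To make the lexicon-based idea concrete, the following is a minimal Python sketch, not the RBWF algorithm itself, whose details are not given here. It assumes that a word belongs to a root when the root letters occur in the word in order, and checks this with a standard longest-common-subsequence dynamic program. The names `lcs_length`, `matches_root`, `polarity`, and the toy `ROOT_LEXICON` are hypothetical and chosen only for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def matches_root(word: str, root: str) -> bool:
    """Assumption: a word derives from a root if all root letters appear in order."""
    return lcs_length(root, word) == len(root)


# Hypothetical toy lexicon mapping a root to a polarity weight
# (+1 positive, -1 negative). A real lexicon would be far larger.
ROOT_LEXICON = {"حسن": 1, "جمل": 1, "كره": -1}


def polarity(text: str) -> str:
    """Sum the weights of the first matching root for each token."""
    score = 0
    for token in text.split():
        for root, weight in ROOT_LEXICON.items():
            if matches_root(token, root):
                score += weight
                break
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("الفيلم جميل"))
    print(polarity("طعام كريه"))
```

Plain subsequence matching can produce false matches on longer words, so a practical system would also need normalization, affix handling, and a curated lexicon; the sketch only shows how a dynamic program can link surface words to lexicon roots.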