Abstract
This study investigates the lexical choices made by 500 Egyptian Twitter users (250 males and 250 females) writing in Modern Standard Arabic (MSA) and Egyptian Colloquial Arabic (ECA), using a selected corpus of 30,000 tweets posted between 2012 and 2013. It examines whether gender-based variation in computer-mediated discourse is valid and how such variation can support authorship studies. Users are identified as male or female according to their names, aliases, or bios. Gender-preferential features used in previous sociolinguistic and computational studies (e.g. function words, insults, taboo words, intensifiers, and interrogatives) are selected and applied to the tweets. The research covers selected morphological, stylometric, and sociolinguistic gender-based features. The Perl programming language is used to run the analysis code, and a bag-of-words (BoW) model is used to represent documents as sets of words. Finally, statistical analysis is performed. On the morphological level, results show that adding ta ta’aneeth (the gender inflectional suffix) to derived nouns and adjectives is a significant feature characterizing female authors. On the stylometric level, the repeated use of pronouns marks females’ style, while the recurrent use of demonstratives and prepositions marks males’ style. On the sociolinguistic level, females use insults and interrogatives more frequently, whereas males use taboo words and intensifiers more often than females do. Concerning authors’ choice of domains, females prefer to talk about their bodies and life partners, while males prefer to discuss sports, the economy, and politics, and also use more loanwords.
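As a rough illustration of the bag-of-words representation mentioned above, the sketch below counts tokens in a set of tweets and computes how often each feature category occurs per token. It is written in Python rather than the Perl used in the study. The names `FEATURES`, `bag_of_words`, and `feature_rates`, the English placeholder word lists, and the two sample tweets are all assumptions made for illustration; none of them come from the thesis, whose actual lexicons and code are not given here.

```python
from collections import Counter

# Hypothetical feature lexicon (English placeholders); the study's actual
# Arabic word lists are not given in the abstract.
FEATURES = {
    "pronoun": {"i", "you", "she"},
    "preposition": {"in", "on", "to"},
}


def bag_of_words(text):
    """Represent a document as counts of lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def feature_rates(tweets):
    """Return, for each feature category, its occurrences per token over all tweets."""
    total = Counter()
    for tweet in tweets:
        total.update(bag_of_words(tweet))
    n_tokens = sum(total.values())
    if n_tokens == 0:
        # No tokens at all: report a rate of zero for every category.
        return {name: 0.0 for name in FEATURES}
    return {
        name: sum(total[word] for word in words) / n_tokens
        for name, words in FEATURES.items()
    }


# Toy placeholder tweets, not data from the study.
sample = ["I saw you in Cairo", "she went to Cairo"]
rates = feature_rates(sample)
```

In a setup like this, the rates would be computed separately for the male and female sub-corpora and then compared with a statistical test, which corresponds to the final analysis step the abstract describes.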