Title
Automatic Arabic essay assessment /
Author
Gomaa, Wael Hassan.
Subject
Data structures (Computer science)
Publication date
2014.
Number of pages
157 p.
Contents
Abstract

The objective of this thesis was to discuss the nature and advantages of automatic scoring systems and to propose techniques capable of handling Arabic short answers. Results showed that the proposed techniques can be applied in a real educational environment. Briefly, these techniques calculated student marks automatically by measuring the lexical and semantic similarity between student and model answers. The thesis presents three case studies: the first deals with student answers from a Data Structures course in English, while the other two deal with Environmental Science and Philosophy answers in Arabic.
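The general scheme described above can be sketched minimally as follows. This is an illustrative assumption about how a similarity value turns into a mark, not the thesis's exact formula: a similarity score in [0, 1] between the student answer and the model answer is scaled to the question's full mark.

```python
# Illustrative sketch only: scale a [0, 1] similarity value between
# a student answer and a model answer to a mark out of full_mark.
# The scaling scheme is an assumption, not the thesis's exact method.

def automatic_mark(similarity, full_mark):
    """Scale a similarity value in [0, 1] to a mark out of full_mark."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return round(similarity * full_mark, 2)
```

For example, a student answer judged 80% similar to the model answer on a 5-mark question would receive 4.0 marks.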
The Texas short answer grading system started with an unsupervised approach that depended on bag-of-words and text-to-text similarity. It used a data set covering a data structures course at the University of North Texas, containing 81 questions and 2,273 student answers. The proposed system improved on the Texas work by measuring the similarity between the model answer and the student answer using String-Based and Corpus-Based similarity measures, applied both separately and in combination. The best correlation value, 0.504, was obtained by mixing N-gram similarity with DISCO1 similarity values; the proposed model achieved considerably better results than the original Texas work.
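The String-Based side of this can be illustrated with a word n-gram overlap measure, combined with a second score (such as a corpus-based value) by simple averaging. This is a hedged sketch of the general technique, not the thesis's exact N-gram or DISCO1 implementation, and the averaging combiner is an assumption:

```python
# Sketch of a String-Based n-gram similarity between a model answer
# and a student answer, plus a simple averaging combiner. The exact
# n-gram variant and combination scheme in the thesis may differ.

def ngrams(text, n=2):
    """Return the set of word n-grams of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_similarity(model_answer, student_answer, n=2):
    """Jaccard overlap of word n-grams, in [0, 1]."""
    a, b = ngrams(model_answer, n), ngrams(student_answer, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def combine(scores):
    """Combine several similarity values by simple averaging."""
    return sum(scores) / len(scores)
```

In use, `combine([ngram_similarity(m, s), corpus_score])` would merge the string-based value with a corpus-based one (here `corpus_score` stands in for an externally computed value such as DISCO1's output).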
The Environmental Science data set was created as part of this research; it contained 61 questions with 10 answers each, a total of 610 answers. Several techniques that depend on translation were introduced to overcome the lack of Arabic text-processing resources, such as extracting model answers automatically from an existing database and applying K-means clustering to scale the obtained similarity values. The system scored each student answer in 536 different automatic runs: 256 runs used String-Based similarity, 64 used Corpus-Based similarity, and the remaining 216 used Knowledge-Based similarity measures. For each run, the Pearson correlation coefficient (r) and the root mean square error (RMSE) were computed. Combining measures from different categories achieved r = 0.83 and RMSE = 0.75, values very close to the manual scores assigned by two annotators.
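The two evaluation metrics used across all runs can be computed as follows; this is a standard-definition sketch of Pearson's r and RMSE over automatic scores versus gold (manually assigned) scores, not code from the thesis:

```python
# Standard definitions of the two metrics reported per run:
# Pearson correlation coefficient (r) and root mean square error
# (RMSE) between automatic scores and manually assigned scores.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(predicted, gold):
    """Root mean square error between predicted and gold scores."""
    n = len(predicted)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predicted, gold)) / n)
```

A high r with a low RMSE, as in the reported r = 0.83 and RMSE = 0.75, indicates automatic scores that both track and stay numerically close to the annotators' marks.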
The Philosophy data set was also created as part of this research. It contained 50 questions with 12 answers each, a total of 600 answers. The model answer for each question was divided into a set of elements; each element may contain section(s) and subsection(s), with a certain mark assigned to each. Fourteen String-Based and two Corpus-Based similarity algorithms were evaluated through two models. The first model (the Holistic Model) measures the similarity between the complete student answer and the complete model answer, without dividing the student answer and ignoring the partition scheme of the model answer. The second model (the Partitioning Model) automatically divides the student answer into a set of sentences using sentence boundary detection templates based on regular expressions, then maps each sentence to the model-answer element with the highest similarity. The Partitioning Model achieved better results than the Holistic Model in all cases, even though only simple sentence boundary detection templates were used. Combining multiple similarity measures improved both the correlation and the error values. An interesting finding was that combining the different similarity algorithms reduced the total time required to compute the automatic score to one sixth, a considerable achievement; this combination also paved the way for a multithreading approach, which further decreased the elapsed time. Finally, a method for providing students with useful feedback was introduced.
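The Partitioning Model's pipeline can be sketched as follows. The boundary pattern and the token-overlap measure here are illustrative assumptions (the thesis uses its own regular-expression templates and its fourteen String-Based and two Corpus-Based algorithms), but the structure matches the description above: split the student answer into sentences, then map each sentence to its most similar model-answer element.

```python
# Hypothetical sketch of the Partitioning Model: regex-based sentence
# boundary detection, then mapping each student sentence to the
# model-answer element with the highest similarity. The boundary
# template and the overlap measure are illustrative stand-ins.
import re

# Split after a terminator followed by whitespace; '؟' is the Arabic
# question mark, relevant since two data sets are in Arabic.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?؟])\s+")

def split_sentences(answer):
    """Divide a student answer into sentences."""
    return [s for s in SENTENCE_BOUNDARY.split(answer.strip()) if s]

def token_overlap(a, b):
    """Illustrative similarity: Jaccard overlap of lowercased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

def map_to_elements(student_answer, elements):
    """elements: dict of element id -> model-answer element text.
    Returns (element id, sentence, similarity) for each sentence."""
    mapping = []
    for sent in split_sentences(student_answer):
        best = max(elements, key=lambda e: token_overlap(sent, elements[e]))
        mapping.append((best, sent, token_overlap(sent, elements[best])))
    return mapping
```

Each mapped element's mark can then be awarded in proportion to the similarity of the sentence assigned to it, which is what lets the Partitioning Model respect the per-element mark scheme that the Holistic Model ignores.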