Abstract
Educational Data Mining (EDM) is a data mining field that aims to evaluate raw data obtained from educational systems and derive information from it. Like other data mining systems, an EDM system cleans and integrates raw data from different sources, choosing appropriate techniques to transform and analyze it. EDM systems apply a multitude of techniques and tools to predict and evaluate student performance. Educational data pose an additional challenge in their distribution, known as the class imbalance problem, because most datasets collected from educational records are imbalanced by nature. In this thesis, we handle the class imbalance problem with SMOTE (Synthetic Minority Oversampling Technique). Student dropout is a concern in some courses at higher education institutions, so we need to predict student performance in order to increase student success rates. The main goal of this thesis is therefore to develop two models: one that predicts student performance and one that recommends a department for the student. We used a real dataset of student records from the Giza Higher Institute for Management Sciences. The institute has three departments: Information Systems, Management, and Accounting. The Management department is further divided into a Marketing department and a Finance department, so we recommend either Marketing or Finance to students. We recommend the student's department using classification techniques, namely the J48, Random Forest, Random Tree, SVM, and Logistic Regression classifiers. To help achieve higher student success rates, we predict the student's GPA using regression techniques, namely K-Nearest Neighbor, Linear Regression, and Random Forest. The dataset contains 2869 student records. We used all of these records, with 14 selected features, to predict student performance, and 750 of these records, with 12 selected features, to recommend the student's department. We present a comparative analysis of the classification and regression techniques before and after applying SMOTE. The experimental results show that Random Forest performed better than the other classifiers.
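To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority record is an interpolation between a minority record and one of its nearest minority neighbours. The abstract does not show the thesis implementation, so the function name `smote`, its parameters, and the toy data are all assumptions made for illustration, not the tooling or records used in the thesis.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data (not from the thesis): 8 majority vs 3 minority records.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "8 8"
```

In a comparison like the one described above, oversampling of this kind would normally be applied to the training split only, so that synthetic records do not leak into the evaluation data.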