Author: Zayed, Yahia Mohmoud Ibrahim./ Title: Integrating Of statistical method with ordinal association rules for effective error identification in data sets /

Title

Integrating Of statistical method with ordinal association rules for effective error identification in data sets /

Author

Zayed, Yahia Mohmoud Ibrahim.

Preparation committee

Researcher / Yahia Mohmoud Ibrahim Zayed

Supervisor / Moawwd El-Mikkawy

Supervisor / Hesham Arafat Ali

Subject

Statistics.

Publication date

2012.

Number of pages

85 p.

Language

English

Degree

Doctorate

Specialization

Mathematics

Date of approval

1/1/2012

Place of approval

Mansoura University - Faculty of Science - Mathematics

Abstract

Traditionally, the use of data sets has fallen largely into two categories: retrieval of individual data elements, or analysis of the data as a whole (mainly using statistical methods). Advances in computer technology have opened new horizons for the use of data. Not only can we now handle volumes of data that were unthinkable in the past, but new aspects and requirements of data analysis have emerged as well. This new direction of data analysis is called data mining, which can detect hidden regularities in a data set.
Data mining has many real-world applications. For example, in basic science research, data mining has helped to rediscover scientific laws, and in applied research such as bioinformatics, data mining techniques have been used to analyze human genes and discover hidden factors that cause cancer. Data mining can therefore be defined as the process of identifying valid, novel, potentially useful, and understandable patterns in data. This complex process involves five phases: data selection, data preparation, data transformation, data analysis (data mining), and interpretation and evaluation. While most attention previously went to the analysis phase, which mines patterns from the data, the other phases are now also recognized as important to the success of the mining process.
Most mining algorithms presume that a cleaned and appropriately transformed data set is already available. This assumption is unrealistic in real-world applications, where data are corrupted, noisy, and stored in incompatible formats. Data preparation plays an extremely important role in data mining: it not only supports the mining phase but also significantly influences the quality of the mined patterns. Researchers have reported that 75% of the total data mining life cycle is spent on data preparation. The focus of this thesis is therefore the preparation phase of knowledge discovery in databases (KDD), and especially the problems of outlier detection and dimension reduction.
Two algorithms are proposed. The first addresses the outlier detection problem. Its main idea is to combine a statistical method with the ordinal association rules method. The algorithm was run on three test data sets, and the results indicate that it is more effective than either the statistical algorithm or the ordinal association rules algorithm alone. The second algorithm addresses the dimension reduction problem. It introduces a new idea: combining principal component analysis (PCA) with the discrete wavelet transform. Test results show that the proposed algorithm is 40-72% more effective than the algorithms in current use.
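The abstract does not say how the statistical method and the ordinal association rules are combined, so the following Python sketch is only an illustration under stated assumptions, not the thesis's algorithm. It assumes the statistical test is a simple "more than k standard deviations from the column mean" check, that an ordinal rule has the form "column a <= column b" and is accepted when it holds in a minimum fraction of records, and that a cell counts as a suspected error only when both methods flag it. The function names, thresholds, and sample data are all hypothetical.

```python
from itertools import permutations
from statistics import mean, pstdev


def statistical_flags(rows, k=2.0):
    """Flag (row, column) cells more than k standard deviations from the column mean."""
    flags = set()
    for c in range(len(rows[0])):
        column = [r[c] for r in rows]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # constant column: nothing can deviate
        for i, value in enumerate(column):
            if abs(value - mu) > k * sigma:
                flags.add((i, c))
    return flags


def ordinal_rule_flags(rows, min_confidence=0.8):
    """Accept rules 'column a <= column b' that hold in at least min_confidence
    of the rows, then flag both cells of every row that violates such a rule."""
    flags = set()
    for a, b in permutations(range(len(rows[0])), 2):
        holds = [r[a] <= r[b] for r in rows]
        if sum(holds) / len(rows) >= min_confidence:
            for i, ok in enumerate(holds):
                if not ok:
                    flags.update({(i, a), (i, b)})
    return flags


def combined_flags(rows, k=2.0, min_confidence=0.8):
    """Assumed combination: keep only cells flagged by both methods."""
    return statistical_flags(rows, k) & ordinal_rule_flags(rows, min_confidence)


# Hypothetical data: column 0 is normally smaller than column 1.
rows = [
    (1, 5), (2, 6), (3, 7), (4, 8), (5, 9),
    (6, 10), (7, 11), (8, 12), (9, 13),
    (50, 14),  # the first value breaks the usual ordering and is far from the mean
]

print(combined_flags(rows))  # {(9, 0)}
```

In this sample, the ordinal rule alone flags both cells of the last record, while the statistical check narrows the suspicion to the first column. Using the intersection is one design choice among several; a union or a weighted score would be equally plausible readings of "combining" the two methods.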