Abstract The overall objective of this study is to maximize the accuracy of disease susceptibility prediction, given a training set S and a test case T that does not occur in S. T is represented as a tuple of (known single nucleotide polymorphisms (SNPs), unknown disease). DisGeNET is a prominent dataset in disease susceptibility research. This work reviews the comprehensive information in DisGeNET before introducing a proposed system that operates on top of it. First, the dataset is vetted by consolidation, removing genes with effects beyond a certain threshold. Second, the Empirical Cumulative Distribution Function (ECDF) is computed and used for plotting and printing gene associations for many diseases, including but not limited to Alzheimer's disease, anemia, brain cancer, and breast cancer. The operation of the proposed classification system is then explained in terms of the general operational framework, which combines C4.5 and Decision Tree. For Crohn's disease, for instance, the accuracy is 81.7% for the decision tree and 88% for Naive Bayes, compared with 63.6% for the Support Vector Machine (SVM).
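The two preprocessing steps named in the abstract (threshold-based vetting and the ECDF) can be illustrated with a minimal sketch. It is not the study's implementation: the gene names, the scores, the cutoff value, and the helper names `filter_by_threshold` and `ecdf` are all hypothetical, and the filtering rule (keep scores at or below the cutoff) is only one possible reading of "beyond a certain threshold". The ECDF itself follows the standard definition, F_n(x) = (number of sample values less than or equal to x) / n.

```python
from bisect import bisect_right


def filter_by_threshold(scores, max_score):
    """Keep genes whose score does not exceed max_score.

    Assumption: "beyond a certain threshold" is read as "greater than the cutoff".
    """
    return {gene: s for gene, s in scores.items() if s <= max_score}


def ecdf(values):
    """Return a function F with F(x) = fraction of sample values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ECDF is undefined for an empty sample")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Made-up gene-disease association scores, for illustration only.
scores = {"GENE_A": 0.10, "GENE_B": 0.30, "GENE_C": 0.30, "GENE_D": 0.95}

kept = filter_by_threshold(scores, max_score=0.9)  # hypothetical cutoff
F = ecdf(kept.values())

for x in (0.05, 0.10, 0.30, 1.0):
    print(f"F({x}) = {F(x):.3f}")
```

Sorting the sample once and using binary search keeps each evaluation of F at logarithmic cost, which is convenient when the same ECDF is queried at many points for plotting.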