Abstract
With the continuous growth in the number of XML documents, clustering these documents has become an active research area. This thesis proposes a novel technique that uses both the content and the structure of XML documents to determine the similarity among them. Because content and structure play different roles, and their relative importance depends on the use and purpose of a dataset, the proposed technique separates the content similarity process from the structure similarity process. It then combines the two similarity values using weights chosen according to the type of XML documents. The technique can be configured to target both rigorously structured, fine-grained XML documents and loosely structured, coarse-grained XML documents. It can also be configured to target both homogeneous and heterogeneous XML documents. Several experiments were conducted to evaluate the accuracy and scalability of the proposed technique and to compare it with state-of-the-art techniques. The results show the effectiveness of the proposed technique.
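The weighted combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measures used in the thesis, so everything concrete here is an assumption made for illustration only: content similarity is taken to be the cosine similarity of word counts, structure similarity is taken to be the Jaccard overlap of root-to-element tag paths, and the names `combined_similarity` and `content_weight` are hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Collect the set of root-to-element tag paths, e.g. 'book/title'."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def term_counts(root):
    """Count lowercase word tokens in all text found under the root."""
    text = " ".join(root.itertext())
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def structure_similarity(a, b):
    """Assumed measure: Jaccard overlap of the two documents' tag paths."""
    paths_a, paths_b = tag_paths(a), tag_paths(b)
    return len(paths_a & paths_b) / len(paths_a | paths_b)


def content_similarity(a, b):
    """Assumed measure: cosine similarity of the two documents' word counts."""
    counts_a, counts_b = term_counts(a), term_counts(b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(xml_a, xml_b, content_weight=0.5):
    """Weighted sum of the separately computed content and structure scores."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must lie in [0, 1]")
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    return (content_weight * content_similarity(a, b)
            + (1.0 - content_weight) * structure_similarity(a, b))
```

Under these assumptions, a high `content_weight` would suit loosely structured, text-heavy documents, while a low value would suit rigorously structured documents in which the markup carries most of the distinguishing information.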