The popularity of XML has led to the production of large numbers of XML documents, and native XML databases have been developed to store them. However, most association rule mining approaches can be applied only to transaction databases. Developing an association rule mining approach for native XML databases is therefore an important research problem.
Currently, the FP-growth algorithm, which is based on the FP-tree, performs more efficiently than other association rule mining methods, but it cannot be applied to native XML databases. Hence, we adapt an improved FP-tree algorithm, called the Frequent Pattern Split method (FP-split for short), for fast association rule mining from native XML databases.
The proposed FP-split method explores association rules among character data and tags in XML documents by parsing the Document Type Definition (DTD) or XML Schema. Unlike XQuery, the FP-split method helps users extract important and complete information from XML documents without requiring them to understand either the structure of the documents or the corresponding query syntax.
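To illustrate what treating tags and character data as minable items can look like, the following minimal Python sketch flattens each XML record into a transaction of (tag, text) pairs and counts item support. It is not the FP-split algorithm. The sample document, the `order` record tag, and the helper names `xml_to_transactions` and `frequent_items` are assumptions made for illustration, and the record tag is hard-coded here rather than derived from a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; not taken from the paper's experiments.
SAMPLE = """<orders>
  <order><customer>Ann</customer><item>milk</item><item>bread</item></order>
  <order><customer>Bob</customer><item>milk</item><item>eggs</item></order>
  <order><customer>Ann</customer><item>bread</item></order>
</orders>"""


def xml_to_transactions(xml_text, record_tag):
    """Turn every <record_tag> element into a set of (tag, text) items."""
    root = ET.fromstring(xml_text)
    transactions = []
    for record in root.iter(record_tag):
        items = set()
        for elem in record.iter():
            if elem is record:
                continue  # skip the record element itself
            text = (elem.text or "").strip()
            if text:  # keep only elements that carry character data
                items.add((elem.tag, text))
        transactions.append(items)
    return transactions


def frequent_items(transactions, min_support):
    """Count how many transactions contain each item; keep the frequent ones."""
    counts = Counter(item for t in transactions for item in t)
    return {item: n for item, n in counts.items() if n >= min_support}


transactions = xml_to_transactions(SAMPLE, "order")
frequent = frequent_items(transactions, min_support=2)
```

Transactions in this form could then be passed to any frequent-pattern miner. In a setting like the one described above, the record boundaries and item tags would presumably come from the parsed DTD or schema instead of a fixed tag name.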
In this paper, we show experimentally that the FP-split method is time-efficient for mining association rules from native XML databases under various parameters, including different minimum supports, different numbers of items, and large amounts of data. In addition, we conduct extensive experiments showing that the proposed method performs better than the FP-tree construction algorithm on transaction databases.