HI-Tree: Mining High Influence Patterns Using External and Internal Utility Values
We propose an efficient algorithm, called HI-Tree, for mining high influence patterns for an incremental dataset. In traditional pattern mining, one would find the complete set of patterns and then apply a post-pruning step to it. The size of the complete mining results is typically prohibitively large, despite the fact that only a small percentage of high utility patterns are interesting. Thus it is inefficient to wait for the mining algorithm to complete and then apply feature selection to post-process the large number of resulting patterns. Instead of generating the complete set of frequent patterns we are able to directly mine patterns with high utility values in an incremental manner. In this paper we propose a novel utility measure called an influence factor using the concepts of external utility and internal utility of an item. The influence factor for an item takes into consideration its connectivity with its neighborhood as well as its importance within a transaction. The measure is especially useful in problem domains utilizing network or interaction characteristics amongst items such as in a social network or web click-stream data. We compared our technique against state of the art incremental mining techniques and show that our technique has better rule generation and runtime performance.