Approximate pattern matching with gap constraints

Approximate pattern matching with gap constraints

Youxi Wu, Zhiqiang Tang, He Jiang, Xindong Wu. Approximate Pattern Matching with Gap Constraints. Journal of Information Science.2016, 42(5): 639–658  (PDF)  (Source codes)

 

Abstract. Pattern matching is a key issue in sequential pattern mining, with many researchers now focussing on pattern matching with gap constraints. However, most of these studies involve exact pattern matching problems, a special case of approximate pattern matching and a more challenging task. In this study, we introduce an approximate pattern matching problem with Hamming distance. Its objective is to compute the number of approximate occurrences of pattern P with gap constraints in sequence S under similarity constraint d. We propose an efficient algorithm named SONG (Single-rOot Nettree for approximate pattern matchinG with gap constraints) and based on the new nonlinear data structure Single-root Nettree. SONG uses some new concepts of Single-root Nettrees to solve the problem effectively. Theoretical analysis and experiments demonstrate an interesting law that the ratio M(P,S,d)/N(P,S,m) approximately follows a binomial distribution, where M(P,S,d) and N(P,S,m) are the numbers of the approximate occurrences whose distances to pattern P are d (0≤dm) and no more than m (the length of pattern P), respectively. Experimental results for real biological data validate the efficiency and correctness of SONG.

发表评论