NetNDP: Nonoverlapping (δ, γ)-approximate Pattern matching

NetNDP: Nonoverlapping (δ, γ)-approximate Pattern matching

Astract:
Pattern matching can calculate the support of patterns, which is a key issue of sequential pattern mining (or sequence pattern mining). Nonoverlapping pattern matching means that any two occurrences cannot use the same character of the sequence at the same position. Approximate pattern matching allows for some data noise, which is more general than exact pattern matching. At present, nonoverlapping approximate pattern matching employs Hamming distance, which cannot measure the local approximation between subsequence and pattern, resulting in large deviations in matching results. To tackle the aforementioned issue, this paper addresses Nonoverlapping Delta and gamma approximate Pattern matching (NDP), i.e. employ (δ, γ)-distance to study an approximate pattern matching, where the local distance and the global distance do not exceed δ and γ, respectively. We first transform the NDP problem into a local approximate Nettree. Then we construct an efficient algorithm, named local approximate Nettree for NDP (NetNDP). This paper proposes Minimal Root Distance (MRD). According to MRD, we can know whether a node has root paths satisfy the global constraint or not and prune invalid nodes and parent-child relationships. NetNDP finds the rightmost absolute leaf of the max root, searches the rightmost occurrence from the rightmost absolute leaf , and deletes the occurrence. We iterate the above steps until there is no new occurrence. Numerous experiments verify the correctness and effectiveness of the proposed algorithm.

Code
NetNDP
INSGrow-appro
NETLAP-(δ,γ)
NETASPNO-(δ,γ)
NetNDP-nonp

dataset
dataset