## Subnettrees for Strict Pattern Matching with General Gaps and Length Constraints

### Subnettrees for Strict Pattern Matching with General Gaps and Length Constraints

Abstract: Pattern matching with gaps constraints has very important application in many fields such as information retrieval, computational biology, sequential pattern mining etc. This paper proposes a new pattern matching problem named Strict Pattern mAtching with geNeral Gaps and Length cOnstraints (SPANGLO) which has four aspects: it is a kind of strict exact pattern matching, any position in sequence can be used more than once, the pattern can have more than one general gap constraint and the occurrence has the length constraints. An instance of SPANGLO can be transformed into exponential solved instances with non-negative gaps in the worst case. In order to solve the problem effectively, an algorithm named SubnEtTreeSpanglo (SETS) is proposed based on Subnettree and its special concepts and properties. The correctness and completeness of the algorithm are given and the space and time complexities of the algorithm are O(m×MaxLen×W) and O(MaxLen×W×m2×n) respectively, where n, m, MaxLen and W are the sequence length, the pattern length, the maximum length of the occurrence and the maximum gap of the pattern respectively. Experimental results validate the correctness of the transforming method of SPANGLO and the efficientiveness and correctness of SETS.

### 子网树求解一般间隙和长度约束严格模式匹配

Problem: Given P=a[0,1]b[-1,1]c, S=s0s1s2s3s4s5=acbacbc and length constraints MinLen=3, MaxLen=4.
Without length constraints, T(S,P)= {<0,2,1>,<0,2,4>,<3,5,4>, <3,5,6>} and |T(S,P)|=4.
With length constraints, T(S,P,LEN)={<0,2,1>,<3,5,4>, <3,5,6>} and |T(S,P,LEN)|=3.

Algorithm :sets