ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.012
The Optimization of Aho-Corasick Automaton with Large-scale Pattern Sets
Abstract— Multi-pattern matching is widely used in text retrieval, intrusion detection, information retrieval
and many other areas. Aho-Corasick algorithm is a main method for multi-pattern matching for its high
efficiency and stability. However, with the continuous expansion of the scale of the pattern set, Aho-Corasick
algorithm faces the double test of memory consumption and the processing time. This paper summarizes
several frequently used optimized methods of Aho–Corasick automaton, and proposes a new algorithm called
BIT-AVL-AC, which is fused with the AVL tree and the bit vector. The experiments show that this algorithm
provides a good trade-off between memory usage and processing speed with large-scale pattern sets.
Index Terms— multi-pattern matching, Aho-Corasick algorithm, large-scale pattern set.
Bo Sun, Yifu Li, Shan Yao
National Computer Network Emergency Response Technical Team/Coordination Center of China, CHINA
School of Computer Science and Technology Harbin Institute of Technology Harbin, CHINA
Cite: Bo Sun, Yifu Li, Xiangzhan Yu, Shan Yao, "The Optimization of Aho-Corasick Automaton with Large-scale Pattern Sets," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 73-80, Beijing, 25-27 June, 2017.