WCSE 2017
ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.012

The Optimization of Aho-Corasick Automaton with Large-scale Pattern Sets

Bo Sun, Yifu Li, Xiangzhan Yu, Shan Yao

Abstract— Multi-pattern matching is widely used in text retrieval, intrusion detection, information retrieval and many other areas. Aho-Corasick algorithm is a main method for multi-pattern matching for its high efficiency and stability. However, with the continuous expansion of the scale of the pattern set, Aho-Corasick algorithm faces the double test of memory consumption and the processing time. This paper summarizes several frequently used optimized methods of Aho–Corasick automaton, and proposes a new algorithm called BIT-AVL-AC, which is fused with the AVL tree and the bit vector. The experiments show that this algorithm provides a good trade-off between memory usage and processing speed with large-scale pattern sets.

Index Terms— multi-pattern matching, Aho-Corasick algorithm, large-scale pattern set.

Bo Sun, Yifu Li, Shan Yao
National Computer Network Emergency Response Technical Team/Coordination Center of China, CHINA
Xiangzhan Yu
School of Computer Science and Technology Harbin Institute of Technology Harbin, CHINA

[Download]


Cite: Bo Sun, Yifu Li, Xiangzhan Yu, Shan Yao, "The Optimization of Aho-Corasick Automaton with Large-scale Pattern Sets," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 73-80, Beijing, 25-27 June, 2017.