ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.044
A New Multilingual Stemmer Based on the Extraction of the Root
Abstract— Stemming is a technique that reduces inflected and derived words to a base form
(stem or root). It is an important pre-processing step in text mining and is widely used in
research areas such as Natural Language Processing (NLP), Text Categorization (TC), Text
Summarization (TS), and Information Retrieval (IR). In text categorization, stemming
reduces the size of the term vocabulary; in information retrieval, it improves search
effectiveness and so yields more relevant results.
In this paper, we propose a new multilingual stemmer that extracts the root of a word using
the n-gram technique. We validated our stemmer on three languages: Arabic, French, and
English.
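To make the n-gram idea concrete, the following is a minimal sketch of how character bigrams can be used to match a word against candidate roots. It is an illustration under stated assumptions, not the procedure of the paper: the abstract does not specify the similarity measure or the root list, so the Dice coefficient, the function names (bigrams, dice, closest_root), and the candidate roots are all hypothetical choices.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings.

    Assumption: Dice is used here only as a common bigram similarity
    measure; the abstract does not say which measure the stemmer uses.
    """
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def closest_root(word, roots):
    """Pick the candidate root with the highest bigram overlap with `word`."""
    return max(roots, key=lambda r: dice(word, r))


# Hypothetical candidate roots, chosen only for illustration.
candidates = ["read", "write", "run"]
print(closest_root("writing", candidates))
```

Because the comparison works on raw character pairs rather than language-specific affix rules, the same functions can in principle be applied to Arabic, French, or English input, given a suitable list of candidate roots for each language.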
Index Terms— Root extraction, stemming, information retrieval, bigrams technique, text mining, machine learning, natural language processing.
Department of ICST, University of M’sila, ALGERIA
Department of Computer Sciences, University Farhat Abbes of Setif, ALGERIA
Cite: Said Gadri, Abdelouahab Moussaoui, "A New Multilingual Stemmer Based on the Extraction of the Root," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 266-272, Moscow, Russia, April 15-17, 2015.