WCSE 2015
A New Multilingual Stemmer Based on the Extraction of the Root

Said Gadri, Abdelouahab Moussaoui

Abstract— Stemming is a technique that reduces inflected and derived words to a base form (stem or root). It is an important pre-processing step in text mining and is used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), and Information Retrieval (IR). In text categorization, stemming reduces the size of the term vocabulary; in information retrieval, it improves search effectiveness and thus yields more relevant results. In this paper, we propose a new multilingual stemmer based on word-root extraction that uses the n-grams technique. We validated our stemmer on three languages: Arabic, French, and English.

Index Terms— Root extraction, stemming, information retrieval, bigrams technique, text mining, machine learning, natural language processing.

