WCSE 2015
ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.044

A New Multilingual Stemmer Based on the Extraction of the Root

Said Gadri, Abdelouahab Moussaoui

Abstract— Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is widely used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text mining tasks. Stemming is often useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and return more relevant results. In this paper, we propose a new multilingual stemmer based on the extraction of the word root, which uses the n-gram technique. We validated our stemmer on three languages: Arabic, French, and English.
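The abstract does not detail how the n-grams drive root extraction. One common n-gram approach is sketched below to make the idea concrete. It is a minimal illustration under stated assumptions, not the authors' method: it assumes a small hypothetical list of known roots and maps a word to the root whose character-bigram set is most similar according to the Dice coefficient. The function names (`bigrams`, `dice`, `extract_root`), the 0.5 threshold, and the sample roots are all hypothetical.

```python
def bigrams(word: str) -> list[str]:
    """Return the character bigrams of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams of a and b."""
    ba, bb = set(bigrams(a)), set(bigrams(b))
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def extract_root(word: str, roots: list[str], threshold: float = 0.5) -> str:
    """Map word to the most bigram-similar root (hypothetical sketch).

    If no root reaches the (assumed) similarity threshold, the word is
    returned unchanged, so unknown words are never forced onto a root.
    """
    best = max(roots, key=lambda r: dice(word, r), default=None)
    if best is None or dice(word, best) < threshold:
        return word
    return best


# Hypothetical root list mixing English and French entries.
ROOTS = ["connect", "relate", "compute", "manger"]

if __name__ == "__main__":
    for w in ["connected", "computing", "relations", "mangeons", "apple"]:
        print(w, "->", extract_root(w, ROOTS))
```

Because bigram overlap needs no language-specific affix rules, the same function applies to any language given a root list, which is what makes this family of techniques attractive for a multilingual stemmer. Plain bigram overlap may struggle with Arabic, however, where root letters are often interleaved with pattern letters, so a practical system would likely need extra preprocessing there.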

Index Terms— Root extraction, stemming, information retrieval, bigram technique, text mining, machine learning, natural language processing.

Said Gadri
Department of ICST, University of M’sila, ALGERIA
Abdelouahab Moussaoui
Department of Computer Sciences, University Farhat Abbes of Setif, ALGERIA


Cite: Said Gadri, Abdelouahab Moussaoui, "A New Multilingual Stemmer Based on the Extraction of the Root," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 266-272, Moscow, Russia, April 15-17, 2015.