ISBN: 978-981-18-7950-0 DOI: 10.18178/wcse.2023.06.004
Generating GAN-based fake embedding samples for Recommender Systems
Abstract—This paper proposes using embedding-based architectures to generate synthetic collaborative filtering datasets in the Recommender Systems area; such datasets are useful for both research and commercial purposes. Because raw collaborative filtering data are extremely sparse, it makes sense to embed them before the generative process. This compression stage is expected to improve the performance and accuracy of the Generative Adversarial Network (GAN), but it also requires measuring the quality of the results both before and after the subsequent decompression stage. Between embedding and unembedding, the 'intermediate' sample representations should be tested to ensure that they preserve relevant distributions, such as the proportions of ratings from one to five stars, as well as their expected latent space layouts. Experiments have been run on two representative datasets using two generative models: a regular GAN and a Wasserstein GAN. The results show an appropriate distribution of the generated fake profiles, opening the door to future work in which the short, dense, and continuous generated embeddings will be translated into the long, sparse, and discrete fake samples that will populate the generated synthetic datasets.
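To make the abstract's pipeline more concrete, the sketch below illustrates two of its ideas on a toy users x items ratings matrix: compressing sparse rating profiles into short, dense, continuous embeddings, and comparing the one-to-five-star rating proportions of real and fake profiles. It is a hypothetical illustration, not the authors' method. The paper's embedding architecture and quality tests are not specified here, so truncated SVD is swapped in as a simple stand-in embedder, total variation distance is an assumed comparison metric, 0 is assumed to encode a missing rating, and the function names (`embed`, `unembed`, `rating_proportions`, `total_variation`) are hypothetical.

```python
import numpy as np

STAR_LEVELS = (1, 2, 3, 4, 5)


def sparsity(ratings: np.ndarray) -> float:
    """Fraction of missing entries (assumption: 0 encodes 'not rated')."""
    return float((ratings == 0).mean())


def embed(ratings: np.ndarray, k: int):
    """Compress each user's long, sparse profile into a short, dense vector.

    Truncated SVD is a stand-in for the paper's embedding architecture.
    Returns the k-dimensional user embeddings and the item factors
    needed to map embeddings back to rating space.
    """
    u, s, vt = np.linalg.svd(ratings.astype(float), full_matrices=False)
    return u[:, :k] * s[:k], vt[:k]


def unembed(embeddings: np.ndarray, item_factors: np.ndarray) -> np.ndarray:
    """Map dense embeddings back to continuous (not yet discrete) ratings."""
    return embeddings @ item_factors


def rating_proportions(ratings: np.ndarray) -> np.ndarray:
    """Proportion of observed ratings at each star level (1..5)."""
    observed = ratings[ratings > 0]
    counts = np.array([(observed == r).sum() for r in STAR_LEVELS], dtype=float)
    return counts / counts.sum()


def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(p - q).sum())


if __name__ == "__main__":
    real = np.array([[5, 0, 3], [0, 4, 0], [1, 0, 5]])
    fake = np.array([[4, 0, 5], [0, 3, 0], [5, 0, 2]])  # e.g. decoded GAN output
    emb, items = embed(real, k=2)
    print("sparsity:", sparsity(real))
    print("embedding shape:", emb.shape)
    print("real proportions:", rating_proportions(real))
    print("fake proportions:", rating_proportions(fake))
    print("TV distance:", total_variation(rating_proportions(real),
                                          rating_proportions(fake)))
```

In this toy example both matrices contain five observed ratings, and the fake profiles differ from the real ones only in swapping a one-star rating for a two-star rating, so the total variation distance is 0.2. With real datasets, the same proportion check could be applied to decoded fake profiles; checking the intermediate embeddings themselves, as the abstract describes, would need a decoder or classifier that this sketch does not model.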
Index Terms—synthetic datasets, generative adversarial networks, recommender systems, collaborative filtering, embedding
Jesús Bobadilla, Abraham Gutiérrez
Technical University of Madrid, ETSISI, SPAIN
Cite: Jesús Bobadilla, Abraham Gutiérrez, "Generating GAN-based fake embedding samples for Recommender Systems" Proceedings of 2023 the 13th International Workshop on Computer Science and Engineering (WCSE 2023), pp. 20-25, June 16-18, 2023.