In recent years, e-commerce platforms and online stores have become
increasingly popular, since they let customers compare the available products
with the help of reviews written by other users who have some knowledge of the
product’s quality. Companies and service providers, in turn, can use their
customers' feedback to identify their strengths and weaknesses and thereby
increase customer satisfaction. However, some people misuse this opportunity
and intentionally post spam reviews to promote or demote particular products.
Timely detection of these reviews and their writers is therefore essential to
prevent harm.
A great deal of work has been done on spam and spammer detection, but it should
not be overlooked that spammers have recently begun collaborating in spammer
groups so that they can manipulate the sentiment around a particular product in
a way that appears more natural. In this dissertation, we therefore address the
problem of malicious group detection, which has received less attention than
spam and spammer detection even though it is the more challenging problem.
The purpose of this study is to design a spammer group detection system that
uses the content of user reviews and their metadata. The proposed system uses
the SCAN clustering method, which is suited to graphs and networks. Before
clustering, it applies preprocessing steps that distinguish genuine users from
fake ones by their behavioral characteristics, including review timestamp and
average rating, in order to improve the quality of the detected groups and to
reduce the time complexity of the main steps. To rank the candidate spammer
groups and identify the real ones, the cosine similarity of reviews among group
members is considered.
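The ranking step can be illustrated with a minimal sketch. It is not the
dissertation's implementation: the raw term-frequency representation, the
averaging over pairs of reviews from different members, and the names
cosine_similarity, group_score and rank_groups are all assumptions made for
illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts under a raw term-frequency model
    (assumed representation; the source does not specify one)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_score(reviews_by_member):
    """Mean cosine similarity over all pairs of reviews written by
    different members of one group (assumed aggregation)."""
    sims = []
    for (_, revs_a), (_, revs_b) in combinations(reviews_by_member.items(), 2):
        for a in revs_a:
            for b in revs_b:
                sims.append(cosine_similarity(a, b))
    return sum(sims) / len(sims) if sims else 0.0


def rank_groups(groups):
    """Return group ids ordered from most to least suspicious."""
    return sorted(groups, key=lambda g: group_score(groups[g]), reverse=True)


# Hypothetical toy data: group id -> member id -> list of review texts.
groups = {
    "g1": {"u1": ["great food great service"],
           "u2": ["great food great service"]},
    "g2": {"u3": ["slow kitchen tonight"],
           "u4": ["lovely patio and music"]},
}
ranking = rank_groups(groups)
```

In this toy input the members of g1 post identical text, so g1 is ranked ahead
of g2, whose members share no terms. A real system would likely need
tokenization, stop-word handling and a threshold for separating real spammer
groups from benign ones, none of which is specified here.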
Experiments on three real-world datasets extracted from the Yelp.com website
indicate that the proposed method outperforms GSBC, one of the most recent
methods in this domain, in terms of precision and F1 measure.