
Dominik Schlomo Moog:

Spam detection in crowdsourced ideation

Requirements

  • Basic knowledge of machine learning
Academic Advisor
Discipline
Crowdsourcing, Software Development
Degree
Bachelor of Science (B.Sc.)

Contents

Context

Crowd ideation is considered a promising way to collect creative ideas because it involves participants from different backgrounds and generates a large number of ideas. However, the main challenge is finding the useful and innovative ideas among them. Moreover, allowing the crowd to generate ideas freely gives some participants the opportunity to submit dummy text. To tackle this problem, our model defines a number of quality gates that improve the ideation output. One of these gates is spam and duplicate detection.

Problem

During idea generation, some MTurk workers submit text copied and pasted from Wikipedia, enter a single word, or combine random words.
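To make these cases concrete, the following is a minimal sketch of two rule-based checks, assuming Python and only the standard library. It flags very short submissions and submissions that overlap heavily with a known text, using Jaccard similarity on word shingles. The function names, the thresholds (`min_words`, `dup_threshold`), and the sample texts are hypothetical choices for illustration, not part of the project's model. Random-word input is not covered here and would likely need a statistical approach.

```python
import re

def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shingles(words, k=3):
    """Set of overlapping k-word sequences (word shingles)."""
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def flag_submission(text, known_texts, min_words=3, dup_threshold=0.5):
    """Return the reasons why a submission looks like dummy text.

    min_words and dup_threshold are illustrative values (assumptions).
    """
    reasons = []
    words = tokens(text)
    if len(words) < min_words:
        reasons.append("too_short")
    submitted = shingles(words)
    for known in known_texts:
        if jaccard(submitted, shingles(tokens(known))) >= dup_threshold:
            reasons.append("near_duplicate")
            break
    return reasons

# Hypothetical reference text standing in for a copied encyclopedia passage.
known = ["Crowdsourcing is a sourcing model in which individuals "
         "or organizations obtain goods and services"]

print(flag_submission("idea", known))
print(flag_submission(known[0], known))
print(flag_submission("Use solar panels to charge public benches", known))
```

Jaccard similarity penalizes a short excerpt copied from a long article; a containment measure (shared shingles divided by the submission's shingles) may suit that case better.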

Objectives

Carry out a study of the algorithms used in the literature to detect spam and nonsense text.

Possible procedure

Look for algorithms used to detect spam (e.g. in email).

Adapt an existing algorithm or propose a new one to detect such dummy text.
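As one possible starting point, email spam filtering is often approached with a Naive Bayes classifier, one of the approaches named in the references below. The following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written from scratch in Python. The class name, the labels `"spam"` and `"idea"`, and the toy training data are assumptions made for illustration; they are not taken from the project or from the cited papers.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, hand-made training data (hypothetical, for illustration only).
train_texts = [
    "asdf qwer lorem ipsum",
    "lorem ipsum dolor sit",
    "test test test",
    "a solar powered bench that charges phones",
    "an app that matches neighbours to share tools",
    "reusable packaging for food delivery",
]
train_labels = ["spam", "spam", "spam", "idea", "idea", "idea"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("lorem ipsum test"))
print(model.predict("an app that charges phones"))
```

A real study would need labelled ideation data and a proper evaluation; six sentences only show the mechanics.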

References

Androutsopoulos, Ion, Georgios Paliouras, Vangelis Karkaletsis, Constantine D Spyropoulos and Panagiotis Stamatopoulos: Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach. March 2013.

Blanzieri, Enrico and Anton Bryl: A Survey of Learning-Based Techniques of Email Spam Filtering. Artificial Intelligence Review, 29(1):63–92, March 2008, ISSN 0269-2821, 1573-7462. http://link.springer.com/10.1007/s10462-009-9109-6 (accessed 2020-01-15).